1st SAFARI Conference 2025
Workshop Description
SAFARI Conference is a full-day workshop where former SAFARI researchers reunite and share their cutting-edge research in critical domains, including but not limited to Memory Robustness, Processing-in-Memory, Storage Systems, their intersection with AI applications and accelerators, and Health Analytics.
The workshop is held in-person at ETH Zurich and livestreamed online.
Join us in-person or online to learn, share, and contribute to shaping the future of computer architecture, data-centric systems and specialized architectures for AI and health analytics.
- Date: December 15, 2025
- Location: HG D16.2, ETZ E7, ETH Zurich
- Livestream: YouTube
Program at a Glance
| Time | Speaker | Affiliation | Title | Location | Livestream |
|---|---|---|---|---|---|
| 08:30 | Minesh Patel | Rutgers University | Split Write DRAM: Reducing DRAM Access Latency by Mitigating Read-Write Interference | HG D16.2 | YouTube Slides [pdf] Slides [ppt] |
| 09:05 | Abdullah Giray Yaglikci | CISPA | Making DRAM Available Again: Safely Mitigating RowHammer at Low Performance and Energy Overheads in Modern DRAM-based Systems | HG D16.2 | YouTube Slides [pdf] Slides [ppt] |
| 09:40 | Yu (Lenny) Liang | Inria Paris | Co-Designing Operating Systems and Processing-in-Memory for Next-Generation Mobile Devices | HG D16.2 | YouTube Slides [ppt] |
| 10:15 | Coffee Break | ||||
| 10:45 | Can Firtina | University of Maryland, College Park | Enabling Real-Time Analysis of Human Genomes via New Algorithms and Architectures | HG D16.2 | YouTube Slides [pdf] Slides [ppt] |
| 11:20 | Jawad Haj-Yahya | Rivos Inc. | Feeding the Beast: The Critical Challenges of Powering Next-Gen AI Accelerators | HG D16.2 | YouTube Slides [ppt] |
| 11:55 | Haiyu Mao | King’s College London | From Microarchitectures to Systems: Designing Holistic Data-Centric Architectures for AI Applications | HG D16.2 | YouTube Slides [ppt] |
| 12:30 | Lunch Break | ||||
| 14:30 | Juan Gómez Luna | NVIDIA | Accelerating Access to Data Storage and Services in the Age of AI | ETZ E7 | YouTube |
| 15:05 | Lois Orosa | Galicia Supercomputing Center (CESGA) | Computer Architecture Research in the Galicia Supercomputing Center: Challenges and Opportunities | ETZ E7 | YouTube |
| 15:30 | Christina Giannoula | Max Planck Institute for Software Systems (MPI-SWS) | Building the Machine Learning Software Stack for Processing-In-Memory Architectures | ETZ E7 | YouTube Slides [ppt] |
| 16:15 | Coffee Break | ||||
| 16:30 | Geraldo F. Oliveira | SAFARI Research Group, ETH Zurich | New Tools, Programming Models, and System Support for Processing-in-Memory Architectures | ETZ E7 | YouTube |
| 17:05 | Saugata Ghose | University of Illinois Urbana-Champaign | Multi-Domain Architectures for Processing-Using-Memory | ETZ E7 | YouTube Slides [pdf] |
| 17:40 | Nastaran Hajinazar | Intel Labs | Virtual Memory Management: The XPU perspective | ETZ E7 | YouTube |
| 18:05 | Gagandeep Singh | AMD | The Convergence of AI, Supercomputing, and Biology: A View from AMD | ETZ E7 | YouTube |
Program Details
08:30 | Split Write DRAM: Reducing DRAM Access Latency by Mitigating Read-Write Interference
Speaker: Minesh Patel, Rutgers University
Abstract
Short Bio
Minesh Patel is an assistant professor in the Computer Science Department at Rutgers University. He holds a D.Sc. in electrical and computer engineering from ETH Zürich, where he was supervised by Prof. Onur Mutlu, and dual B.S. degrees in ECE and physics from the University of Texas at Austin. He has been recognized with an ETH Medal, the William C. Carter dissertation award, and induction into the ISCA Hall of Fame. Several of his works have been winners or finalists for best paper awards. His current research interests span the intersection of computer architecture and dependability, focusing on memory/processor reliability, fairness, and quality of service. https://www.mineshp.com/
09:05 | Making DRAM Available Again: Safely Mitigating RowHammer at Low Performance and Energy Overheads in Modern DRAM-based Systems
Speaker: Abdullah Giray Yaglikci, CISPA
Abstract
Read disturbance in modern DRAM is an important robustness (security, safety, and reliability) problem, where repeatedly accessing (hammering) a row of DRAM cells (a DRAM row) induces bitflips in other physically nearby DRAM rows. To make matters worse, shrinking technology node sizes exacerbate DRAM read disturbance at the circuit level across generations. Worsening DRAM read disturbance leads to data integrity issues, and existing read disturbance mitigations greatly reduce the availability of DRAM chips. In this talk, I will cover two of our recent works, Chronus and BreakHammer, which tackle the data integrity issues caused by DRAM read disturbance at low cost to DRAM chips' availability.
Chronus is an in-DRAM RowHammer mitigation mechanism that provides a better implementation of PRAC, the RowHammer mitigation approach in the latest DDR5 specifications. Chronus identifies two major shortcomings of PRAC and addresses them by leveraging subarray-level parallelism and dynamically increasing the refresh count when needed. Our evaluation shows that Chronus significantly reduces PRAC's performance and energy overheads and achieves near-zero performance overhead for modern DRAM chips, outperforming three variants of PRAC and three other state-of-the-art read disturbance solutions. To aid future research, we open-source our Chronus implementation at https://github.com/CMU-SAFARI/Chronus.
BreakHammer is a throttling mechanism that works with existing RowHammer mitigation mechanisms and reduces the number of RowHammer-preventive actions they need to perform. To achieve this, BreakHammer identifies hardware threads that trigger abnormally many RowHammer-preventive actions and reduces their memory bandwidth usage by limiting the number of miss status holding registers (MSHRs) they can allocate at the last-level cache. By doing so, BreakHammer reduces the memory traffic created by malicious threads, resulting in higher system throughput, fairness, and energy efficiency. BreakHammer has a simple implementation in the memory controller with near-zero area overhead. To foster further research, we open-source our BreakHammer implementation and scripts at https://github.com/CMU-SAFARI/BreakHammer.
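As an illustrative sketch of the throttling idea described above (the quotas, threshold, and data structures below are hypothetical, not BreakHammer's actual implementation), one can express the core policy as: count RowHammer-preventive actions per hardware thread, and shrink the memory-level parallelism budget of outlier threads.

```python
# Hypothetical sketch of a BreakHammer-style throttling policy.
# All constants and structures are illustrative assumptions.

FULL_MSHR_QUOTA = 16   # assumed normal per-thread MSHR budget
THROTTLED_QUOTA = 2    # assumed reduced budget for suspect threads

def mshr_quotas(preventive_actions, threshold_factor=2.0):
    """Map per-thread counts of RowHammer-preventive actions to
    MSHR quotas. A thread whose count exceeds
    threshold_factor * average is treated as a likely attacker
    and receives a reduced quota, lowering its memory traffic."""
    if not preventive_actions:
        return {}
    avg = sum(preventive_actions.values()) / len(preventive_actions)
    threshold = threshold_factor * avg
    return {tid: (THROTTLED_QUOTA if count > threshold else FULL_MSHR_QUOTA)
            for tid, count in preventive_actions.items()}

# Thread 3 triggers far more preventive actions than its peers,
# so only it is throttled.
quotas = mshr_quotas({0: 4, 1: 5, 2: 3, 3: 120})
```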
Short Bio
Giray is a tenure-track faculty member at CISPA and looks forward to hiring young researchers at various levels. His broader research interests span high-performance, energy-efficient, and secure computer architectures, aiming for robustly and sustainably scalable systems. His research is published in major venues, including HPCA, MICRO, DSN, ISCA, and USENIX Security. Giray received his PhD from the SAFARI Research Group at ETH Zurich in 2024, advised by Professor Onur Mutlu. His PhD research builds 1) a detailed understanding of DRAM read disturbance, a major limitation of main memory density scaling, and 2) mechanisms that efficiently and scalably mitigate DRAM read disturbance. His PhD dissertation received the ACM SIGMICRO Dissertation Award and the William C. Carter PhD Dissertation Award in Dependability in 2025, and received an honorable mention for the ACM SIGARCH/IEEE CS TCCA Outstanding Dissertation Award. His PhD research was in part 1) supported by a Google Security and Privacy Research Award and the Microsoft Swiss Joint Research Center and 2) recognized by an ETH Medal nomination, the Intel Hardware Security Academic Award 2022 (as a finalist), and first place at the ACM PACT Student Research Competition 2023. https://agyaglikci.github.io
09:40 | Co-Designing Operating Systems and Processing-in-Memory for Next-Generation Mobile Devices
Speaker: Yu (Lenny) Liang, Inria Paris
Abstract
Modern mobile devices increasingly execute data-intensive workloads such as generative AI, large language models (LLMs), augmented reality, and high-definition multimedia processing. These workloads induce substantial data movement across memory, storage, and heterogeneous processors, often dominating system energy consumption and limiting responsiveness. At the same time, long-standing inefficiencies in the Linux kernel, including memory fragmentation and hardware-agnostic OS abstractions, prevent current systems from fully exploiting emerging accelerators such as Processing-in-Memory (PIM). This talk presents a research direction that rethinks the boundary between operating systems and hardware to address these challenges. The central idea is to co-design OS mechanisms with architectural capabilities, aligning memory allocation, scheduling, and data-placement policies with low-level features such as TLB mappings, DRAM-controller behavior, and near-memory execution units. Such cross-layer co-design can greatly reduce unnecessary data movement, improve energy efficiency, and unlock the potential of accelerators for mobile AI workloads. By integrating systems and architecture perspectives, our research aims to establish general principles for future mobile and edge systems: ones that are efficient, scalable, and capable of fully leveraging next-generation hardware.
Short Bio
Yu Liang is a researcher in the Whisper team at Inria Paris, specializing in operating systems and hardware–software co-design. Her work focuses on optimizing mobile systems across the OS, memory, file system, and storage stack, with an emphasis on reducing data movement and improving energy efficiency for data-intensive workloads. Before joining Inria Paris, she was a Senior Researcher and Lecturer in the SAFARI group supervised by Prof. Onur Mutlu at ETH Zurich. She received her PhD from the City University of Hong Kong. She has published in top-tier venues such as HPCA, ASPLOS, EuroSys, USENIX ATC, and FAST, and her research has influenced commercial systems, including contributions integrated into the Linux kernel and deployed across millions of mobile devices. Her recent work explores OS–architecture co-design for emerging technologies such as Processing-in-Memory to support generative AI on mobile platforms. https://yulennyliang.github.io
10:15 | Coffee Break
10:45 | Enabling Real-Time Analysis of Human Genomes via New Algorithms and Architectures
Speaker: Can Firtina, University of Maryland, College Park
Abstract
Analyzing biological data provides critical insights for understanding and treating diseases, personalized medicine, outbreak tracing, evolutionary studies, and agriculture. Modern genome sequencing devices can rapidly generate large amounts of genomic data at low cost. However, genome analysis is significantly hampered by the computational and data movement overheads of existing computing systems and algorithms, which limit the speed, accuracy, application scope, and energy efficiency of the analysis.
This talk focuses on designing algorithms and hardware to address these computational limitations in biological data analysis. First, we discuss how to take a fundamentally different approach to genomic data analysis by directly analyzing electrical signals generated by sequencing devices, without converting them into DNA characters. Second, we show how direct analysis of these electrical signals provides us with opportunities to exploit emerging computing paradigms, such as in-memory computing, to perform real-time and energy-efficient analysis directly on edge devices. We conclude by touching on the potential for biological data analysis that can be performed anywhere, anytime, and by anyone to enable fundamentally new applications in medicine and genomics.
Short Bio
Can Firtina is an Assistant Professor in the Department of Computer Science at the University of Maryland, College Park (UMD). He leads the STORM Research Group at UMD. His research focuses on algorithms and computing systems for bioinformatics.
His interests span bioinformatics, artificial intelligence, and computer architecture. He works on a broad range of problems to address fundamental challenges in computer science and computational biology. To this end, he designs methods and builds systems that use emerging computing paradigms and memory technologies to enable fast, accurate, energy-efficient, and real-time analysis of biological data. He is also interested in developing solutions for applications such as genome editing. Can Firtina has been awarded the ETH Doctoral Medal Prize in 2025. His research has been published in major bioinformatics and computer architecture venues.
Previously, Can Firtina was a Senior Researcher and Lecturer at ETH Zurich, where he taught courses on accelerating genome analysis with hardware–algorithm co-design. His lecture videos and materials are available on YouTube. He received his PhD from ETH Zurich, advised by Prof. Onur Mutlu in the SAFARI Research Group. https://www.cs.umd.edu/~firtina
11:20 | Feeding the Beast: The Critical Challenges of Powering Next-Gen AI Accelerators
Speaker: Jawad Haj-Yahya, Rivos Inc.
Abstract
Modern CPUs, GPUs, and AI accelerators are no longer limited solely by transistor density; they are increasingly bottlenecked by the physics of power delivery. With current demands in high-end AI accelerators already exceeding 1,000 Amperes (1kA) — a figure set to rise significantly in future generations — the ability to deliver clean, stable power has become a paramount challenge. This talk traces the journey of energy from the datacenter’s infrastructure down to the final metal layers of a chip, focusing on the “last few centimeters” of the power delivery network (PDN) where the most significant efficiency and performance losses occur. We will examine package-level resistance, on-die IR drop, and fast di/dt transients to understand how voltage droop directly impacts system performance. Finally, the lecture highlights the importance of co-design between microarchitects and power engineers, exploring how adaptive circuits, predictive mechanisms, and emerging technologies such as backside power delivery (PowerVia/BPR) are reshaping the future of silicon PDN.
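To make the scale of the problem concrete, a back-of-the-envelope calculation (all numbers below are illustrative assumptions, not measured values from the talk) shows how resistive IR drop and inductive L·di/dt droop combine at kiloamp current levels:

```python
# Back-of-the-envelope voltage-droop arithmetic for an on-die PDN.
# All values are illustrative assumptions for a ~1 kA accelerator.

I_load = 1000.0   # steady-state current draw (A)
R_pdn  = 50e-6    # effective PDN resistance (ohms), i.e. 50 uOhm
L_pdn  = 1e-12    # effective PDN inductance (henries), i.e. 1 pH
di_dt  = 100e9    # current transient (A/s), i.e. 100 A per ns
V_dd   = 0.75     # nominal supply voltage (V)

ir_drop     = I_load * R_pdn    # resistive (steady-state) drop
di_dt_droop = L_pdn * di_dt     # inductive (transient) droop
total_droop = ir_drop + di_dt_droop
droop_pct   = 100.0 * total_droop / V_dd

print(f"IR drop:       {ir_drop * 1e3:.0f} mV")
print(f"L*di/dt droop: {di_dt_droop * 1e3:.0f} mV")
print(f"Total droop:   {total_droop * 1e3:.0f} mV "
      f"({droop_pct:.0f}% of a {V_dd} V supply)")
```

Even with these modest assumed parasitics, the transient droop dwarfs the steady-state drop, which is why fast di/dt events, not just average current, drive PDN design.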
Short Bio
Dr. Jawad Haj-Yahya is a Principal Architect at Rivos Inc. with over two decades of experience in computer architecture and power management. A recipient of the prestigious Intel Achievement Award, he has held senior research and architectural roles at Intel, Huawei, ETH Zurich, and NTU Singapore. Dr. Haj-Yahya specializes in energy-efficient systems and hardware security, holds over 10 US patents, and is a published author in leading journals and conferences, including ISCA, MICRO, and HPCA.
11:55 | From Microarchitectures to Systems: Designing Holistic Data-Centric Architectures for AI Applications
Speaker: Haiyu Mao, King’s College London
Abstract
As data-intensive applications, such as the large language models that increasingly power everyday services, continue to proliferate, they demand massive data processing capabilities. However, executing these workloads on traditional von Neumann architectures incurs substantial data movement between processors (CPUs, GPUs) and memory, leading to high latency and significant energy consumption. Data-centric architectures, such as processing-in-memory (PIM), offer a promising solution by integrating processing units closer to memory (processing near memory) or enabling memory arrays to directly perform computation (processing using memory). Despite PIM's potential, much existing research has focused on accelerating isolated memory-intensive operators, leaving the broader challenge of supporting end-to-end, real-world AI applications with PIM architectures largely open.
This talk explores how to efficiently leverage PIM for practical applications through the design of holistic data-centric architectures. It begins by introducing PIM and outlining the challenges of its real-world adoption, then presents a novel PIM-enabled system for large-language-model decoding that features dynamic parallelism, and concludes with a discussion of future research directions for data-centric architectures and their application to real-world AI challenges.
Short Bio
Dr. Haiyu Mao is an Assistant Professor in the Department of Engineering at King’s College London, where she leads the SAIL research group. The mission of SAIL is to co-design high-performance, energy-efficient software and hardware to power data-intensive AI and bioinformatics, ultimately fostering a healthier, simpler everyday life. Her core research expertise lies at the intersection of computer architecture, processing-in-memory/storage, machine learning acceleration, bioinformatics, non-volatile memory, and secure memory. Prior to joining KCL, Dr. Mao was a Postdoctoral Researcher in the SAFARI Research Group at ETH Zurich, supervised by Prof. Onur Mutlu. She earned her Ph.D. in Computer Science from Tsinghua University under the supervision of Prof. Jiwu Shu. https://hybol1993.github.io
12:30 | Lunch Break
14:30 | Accelerating Access to Data Storage and Services in the Age of AI
Speaker: Juan Gómez Luna, NVIDIA
Abstract
Access to data storage and data services for GPUs has traditionally relied on the host CPU. While this approach might still work efficiently for workloads with predictable data access and datasets that can be evenly partitioned, there are emerging applications (e.g., graph and data analytics, graph neural networks, recommender systems) that have a more irregular and data-dependent behavior. With the traditional approach, these workloads suffer from CPU-GPU synchronization, I/O traffic amplification, and long CPU latencies. In this talk, we will introduce novel approaches to GPU-initiated access to storage and services, which can efficiently support these emerging applications.
Short Bio
Juan Gómez Luna has been a senior research scientist at NVIDIA since 2023. He received the BS and MS degrees in Telecommunication Engineering from the University of Sevilla, Spain, in 2001, and the PhD degree in Computer Science from the University of Córdoba, Spain, in 2012. Between 2005 and 2017, he was a faculty member of the University of Córdoba. Between 2017 and 2023, he worked as a senior researcher and lecturer in Professor Onur Mutlu's SAFARI Research Group at ETH Zürich. His research interests focus on GPU and heterogeneous computing, networking, memory and storage systems, processing-in-memory, and hardware and software optimization. He is the lead author of PrIM (https://github.com/CMU-SAFARI/prim-benchmarks), the first publicly available benchmark suite for a real-world processing-in-memory architecture, and Chai (https://github.com/chai-benchmarks/chai), a benchmark suite for heterogeneous systems with CPU/GPU/FPGA.
15:05 | Computer Architecture Research in the Galicia Supercomputing Center: Challenges and Opportunities
Speaker: Lois Orosa, Galicia Supercomputing Center (CESGA)
Abstract
European Supercomputing centers contribute to the advancement of science and technology by providing large, high-performance supercomputing infrastructure to a wide range of researchers in many areas. Many of these supercomputers are not specialised, which might lead to inefficient execution of important workloads. In addition, there is increasing demand for more trustworthy and reliable infrastructure, especially for applications that handle sensitive or private data. In this talk, we introduce the vision, research plans, projects, and opportunities of the Galicia Supercomputing Center (CESGA) to build high-performance, energy-efficient, secure, and reliable computer architectures that will run some of the emerging workloads that may drive science breakthroughs in the coming years.
Short Bio
Lois Orosa is the Scientific Director of the Galicia Supercomputing Center (CESGA), Spain, where he holds the prestigious Ramón y Cajal tenure-track contract for developing his research. Before that, he was a senior researcher in the SAFARI Research Group at ETH Zürich, Switzerland. He received his BS and MS degrees in Telecommunication Engineering from the University of Vigo, Spain, and his PhD degree from the University of Santiago de Compostela, Spain, and he held a postdoc position at the University of Campinas, Brazil. He was a visiting researcher at multiple companies (IBM, Recore Systems, Xilinx, and Huawei) and universities (UIUC and Universidade Nova de Lisboa). His current research interests are in computer architecture, hardware security, reliability, memory systems, and AI accelerators. For more information, please see his webpage at https://loisorosa.github.io/
15:30 | Building the Machine Learning Software Stack for Processing-In-Memory Architectures
Speaker: Christina Giannoula, Max Planck Institute for Software Systems (MPI-SWS)
Abstract
Processing-in-Memory (PIM) architectures integrate compute cores close to or within memory arrays, emerging as a promising paradigm to accelerate memory-intensive kernels in modern Machine Learning (ML) models. While ML models contain both compute-intensive and memory-intensive kernels, the latter are often bottlenecked by limited memory bandwidth in modern CPU and GPU systems. Therefore, industry manufacturers and researchers have extensively explored PIM devices and their integration with host CPU/GPU systems to enable efficient end-to-end ML model execution. However, fully leveraging PIM's benefits for ML applications requires designing an effective system software stack specifically tailored for PIM architectures.
This talk explores how specialized libraries, system software, and compilers can unlock the potential of PIM architectures for machine learning workloads. First, I will present PyGim, a novel Graph Neural Network (GNN) library designed specifically for PIM systems, which optimizes memory-intensive GNN kernels through intelligent parallelization strategies. To aid future research, we open-source our PyGim implementation at https://github.com/CMU-SAFARI/PyGim. Second, I will introduce DCC, the first data-centric ML compiler for PIM architectures that supports diverse ML kernels across different PIM backends.
Our research aims to design a complete system software stack that is fundamentally data-centric—treating data movement as a first-class concern to achieve optimal performance—while being tailored for emerging ML models and supporting various PIM backends. The PIM software stack remains in its early stages, with critical components such as programming languages, runtime engines, and optimization frameworks still requiring substantial research.
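To give a flavor of the kind of parallelization such a software stack must handle (a conceptual sketch only; the function, data layout, and strategy below are illustrative assumptions, not PyGim's actual API or design), consider distributing a memory-intensive sparse matrix-vector product, the core of GNN aggregation, across PIM cores:

```python
# Conceptual sketch: partitioning a sparse matrix-vector product
# (the heart of GNN aggregation kernels) across PIM cores so each
# core computes on rows stored near it, avoiding bulk CPU transfers.
# Illustrative only; real PIM libraries use different strategies.

def spmv_on_pim(csr_rows, x, num_pim_cores=4):
    """csr_rows: list of sparse rows, each a list of (col, value)
    pairs. Rows are assigned round-robin to PIM cores; each core
    computes its assigned rows locally."""
    n = len(csr_rows)
    y = [0.0] * n
    for core in range(num_pim_cores):
        # rows core, core + num_pim_cores, ... belong to this core
        for i in range(core, n, num_pim_cores):
            y[i] = sum(v * x[c] for c, v in csr_rows[i])
    return y

# 3x3 sparse matrix [[2,0,0],[0,0,1],[4,0,3]] times x = [1, 2, 3]
rows = [[(0, 2.0)], [(2, 1.0)], [(0, 4.0), (2, 3.0)]]
result = spmv_on_pim(rows, [1.0, 2.0, 3.0])
```

The interesting systems questions, which a PIM-tailored library must answer, are hidden in the assignment loop: how to balance nonzeros (not just rows) across cores, and how to place `x` so each core's accesses stay local.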
Short Bio
Christina Giannoula is starting as a Tenure-Track Faculty member at the Max Planck Institute for Software Systems (MPI-SWS). She leads the SPIN research group at MPI-SWS. She is actively seeking motivated students and researchers to join her team.
Her research interests lie at the intersection of computer architecture, computer systems, high-performance computing, and sustainable computing. Her current research focuses on the hardware/software co-design of emerging applications, particularly AI/ML, with modern computing systems. She designs solutions across the entire system stack, from software down to hardware—including algorithms, compilers, runtime systems, programming frameworks, and hardware engines—leveraging cutting-edge technologies such as processing-in-memory and resource disaggregation. Her work targets improvements in performance, scalability, programmability, and sustainability.
Before joining MPI-SWS, she was a Postdoctoral Researcher at the University of Toronto, where she received several research distinctions, including postdoctoral research awards from the Vector Institute for Artificial Intelligence. She was selected as a 2024 MLSys Rising Star and a 2024 EECS Rising Star. She received her Ph.D. from the School of Electrical and Computer Engineering (ECE) at the National Technical University of Athens (NTUA) in Greece, where she was a member of the Computing Systems Laboratory. During her PhD studies, she worked in the SAFARI Research Group for more than two years (one year in Zurich), and Onur Mutlu was her co-advisor. Her Ph.D. thesis received the 2022 Iakovos Gurounian Award for the Ph.D. thesis with the highest industrial impact. During her Ph.D. studies, she also received a Ph.D. award from the Foundation for Education and European Culture from 2020 to 2021, and a Ph.D. Fellowship from the General Secretariat for Research and Technology (GSRT) and the Hellenic Foundation for Research and Innovation (HFRI) from 2017 to 2020. She holds an M.Eng. equivalent degree from ECE NTUA, where she graduated in the top 2% of her class. https://people.mpi-sws.org/~cgiannoula/pages/index.php
16:15 | Coffee Break
16:30 | New Tools, Programming Models, and System Support for Processing-in-Memory Architectures
Speaker: Geraldo F. Oliveira, SAFARI Research Group, ETH Zurich
Abstract
The continuously increasing data intensiveness of modern applications has led to high performance and energy costs for data movement in traditional processor-centric computing systems. To mitigate these costs, the processing-in-memory (PIM) paradigm moves computation closer to where the data resides, reducing (and sometimes eliminating) the need to move data between memory and the processor. There are two main approaches to PIM: (i) processing-near-memory (PNM), where PIM logic is added to the same die as memory or to the logic layer of 3D-stacked memory, and (ii) processing-using-memory (PUM), which uses the operational principles of memory cells and memory circuitry to perform computation. Many works from academia and industry have shown the benefits of PNM and PUM for a wide range of workloads from different domains. However, fully adopting PIM in commercial systems is still very challenging due to the lack of tools as well as programming and system support for PIM architectures across the computer architecture stack.
To ease the adoption of PIM in current and future systems, we provide tools, programming models, and system support for PIM architectures (with a focus on DRAM-based solutions). To this end, we make four new major contributions. First, we introduce DAMOV, the first rigorous methodology to characterize memory-related data movement bottlenecks in modern workloads, and the first data movement benchmark suite. Second, we introduce MIMDRAM, a new hardware/software co-designed substrate that addresses the major programmability and flexibility limitations of the bulk bitwise execution model of processing-using-DRAM (PUD) architectures. MIMDRAM enables the allocation and control of only the needed computing resources inside DRAM for PUD computing. MIMDRAM implements a series of compiler passes that automatically identify and map code regions to the underlying PUD substrate, alongside system support for data mapping and allocation of PUD memory objects. Third, we introduce Proteus, the first hardware framework that addresses the high execution latency of bulk bitwise PUD operations in state-of-the-art PUD architectures by implementing a data-aware runtime engine for PUD. Fourth, we introduce DaPPA (data-parallel processing-in-memory architecture), a new programming framework that eases programmability for general-purpose PNM architectures by allowing the programmer to write efficient PIM-friendly code without the need to manage hardware resources explicitly.
Overall, our four major contributions demonstrate that we can effectively exploit the inherent parallelism of PIM architectures and facilitate their adoption across a broad spectrum of workloads through end-to-end design of hardware and software support (i.e., workload characterization methodologies and benchmark suites, execution and programming models, compiler support and programming frameworks, and adaptive data-aware runtime mechanisms) for PIM, thereby enabling orders of magnitude improvements in performance and energy efficiency across a wide variety of modern workloads.
Short Bio
Geraldo F. Oliveira earned his Ph.D. in Computer Science from ETH Zurich, where he was advised by Prof. Onur Mutlu. His work sits at the intersection of computer architecture and systems, with a special focus on memory-centric designs that push the limits of performance and energy efficiency. During his doctoral studies, he leveraged emerging memory technologies to speed up diverse workloads and to build system-level support for next-generation memory hierarchies. Geraldo's research has been published in leading venues such as HPCA, ASPLOS, ISCA, MICRO, ICS, and IEEE Micro. In the last two years, he has also organized multiple workshops and tutorials at top computer-architecture conferences, helping shape the community's dialogue on memory-centric computing systems. For more information, please visit his website at https://geraldofojunior.github.io
17:05 | Multi-Domain Architectures for Processing-Using-Memory
Speaker: Saugata Ghose, University of Illinois Urbana-Champaign
Abstract
With the explosion of data processing in modern applications, there has been a concerted effort to design new computer architectures that overcome the limitations of von Neumann computing. An emerging focus of these efforts is processing-using-memory (PUM), or in-memory computing, which attempts to eliminate most CPU–memory data movement bottlenecks by repurposing memory arrays to directly perform computational operations. Perhaps unsurprisingly, the majority of recent work on PUM has focused on accelerating machine learning, but this concentration has neglected the many other application domains that demonstrate the potential for large benefits with PUM.
In this talk, I will discuss my research group's efforts on designing PUM-based systems with multi-domain applicability. Starting from an overview of how to perform application-driven PUM design, I will discuss two of our recent architectures. The RACER architecture shows how Boolean-based wide-vector PUM operations designed for general-purpose computing can deliver over 100x improvements in both performance and energy across many application domains, compared to a state-of-the-art CPU. Notably, RACER's architecture can be used on top of a wide range of memory technologies. The ANVIL architecture shows how bulk associative-search PUM operations can enable PUM for large data sets that exceed the capacities of most main memories, by performing PUM in solid-state drives. ANVIL's search-based PUM can accelerate any name–value pair data structure, making it highly versatile. With both RACER and ANVIL, I will illustrate why cross-stack integration is essential to the future of PUM architectures.
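A toy model helps convey what "Boolean-based wide vector PUM operations" means (a conceptual sketch in the spirit of such architectures, not RACER's actual design; the row width and primitive set are assumptions): each memory row acts as a wide bit vector, and a single in-memory operation touches every column at once.

```python
# Toy model of bulk bitwise processing-using-memory (PUM).
# Conceptual sketch only; row width and primitives are assumed.
# Each "row" is modeled as a Python int holding ROW_BITS bits;
# one PUM primitive operates on all columns of a row in parallel,
# instead of streaming individual words through a CPU.

ROW_BITS = 512                 # assumed memory-row width in bits
MASK = (1 << ROW_BITS) - 1     # keeps results within the row width

def pum_and(row_a, row_b):     # one row-wide AND "inside memory"
    return (row_a & row_b) & MASK

def pum_or(row_a, row_b):      # one row-wide OR
    return (row_a | row_b) & MASK

def pum_not(row_a):            # one row-wide NOT
    return (~row_a) & MASK

# Richer operations compose the primitives: a row-wide XOR built
# from AND/OR/NOT, still touching all 512 columns per primitive.
def pum_xor(row_a, row_b):
    return pum_or(pum_and(row_a, pum_not(row_b)),
                  pum_and(pum_not(row_a), row_b))

a, b = 0b1100, 0b1010
assert pum_xor(a, b) == 0b0110
```

The appeal for multi-domain workloads is visible even in this sketch: any computation expressible as Boolean logic over wide vectors maps onto the same few primitives, regardless of the underlying memory technology.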
Short Bio
Saugata Ghose is an assistant professor in the Siebel School of Computing and Data Science at the University of Illinois Urbana-Champaign, where he leads the ARCANA Research Group. He holds M.S. and Ph.D. degrees in electrical and computer engineering from Cornell University, and dual B.S. degrees in computer science and in computer engineering from Binghamton University, State University of New York. He has been recognized with an Intel Rising Star Faculty Award, induction into the ISCA and HPCA Halls of Fame, and a Wimmer Faculty Fellowship at Carnegie Mellon University. Several of his works have been winners or finalists for best-of-venue awards, and one was a finalist for the 2022 Intel Hardware Security Academic Award. His current research interests include data-oriented computer architectures and systems, new interfaces between systems software and hardware, energy-efficient memory and storage, and architectures for emerging platforms and domains. For more information, please visit his website at https://ghose.cs.illinois.edu/
17:40 | Virtual Memory Management: The XPU perspective
Speaker: Nastaran Hajinazar, Intel Labs
Abstract
As XPUs (GPUs, DPUs, NPUs, and other accelerators) become first-class compute resources, virtual memory can no longer be treated as a CPU-only concern. This talk provides a whole-system view of virtual memory management spanning CPUs, IOMMUs, device page tables, and hypervisors/VMMs, with a clear primer on PCIe Address Translation Services (ATS) and how it enables devices to participate in shared virtual addressing.
Building on that foundation, we examine why virtual memory for XPUs is poised to “blow up” in complexity: modern deployments increasingly stack multiple translation domains, leading to as many as four layers of address translation along a single access path. We then focus on the next pressure point, PCIe ATS itself. As a decade-plus-old standard that effectively funnels many virtual memory interactions through a limited set of resources in the IOMMU, ATS is at risk of becoming a scalability and performance bottleneck. We will discuss why simply adding more translation queues is a tempting but potentially fragile solution, raising questions about sizing, contention, coordination, and system-level scalability. We will also look at alternative approaches to the conventional ATS.
Finally, we argue that invalidation latency will have a first-order impact on end-to-end performance for XPUs. We outline a research agenda for industry and academia: quantify where invalidation time is actually spent (in the VM subsystem, the VMM, or hardware), identify the dominant contributors under realistic workloads, and use that data to guide architectural evolution. This is not a competitive moat; it is shared infrastructure. We close the talk with a call for broader, cross-industry collaboration to ensure the virtual memory stack scales with the next generation of XPUs.
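The "four layers of translation along a single access path" can be sketched as a chain of page-table lookups; the following toy model (hypothetical page tables and mappings, not a real IOMMU or ATS model) shows how one device access crosses several translation domains, e.g. device VA → guest PA → host VA → host PA.

```python
# Toy model of stacked address translation domains (all mappings
# hypothetical). Each dict stands in for one page table; walking all
# of them models multiple translation layers on one access path.

PAGE = 0x1000  # 4 KiB pages

def translate(page_table, addr):
    """One translation layer: remap the page, keep the page offset."""
    page, offset = addr & ~(PAGE - 1), addr & (PAGE - 1)
    return page_table[page] + offset

# One hypothetical single-page mapping per translation domain.
device_va_to_guest_pa = {0x0000: 0x4000}
guest_pa_to_host_va   = {0x4000: 0x8000}
host_va_to_host_pa    = {0x8000: 0xC000}

addr = 0x0123  # device virtual address
for table in (device_va_to_guest_pa, guest_pa_to_host_va, host_va_to_host_pa):
    addr = translate(table, addr)
print(hex(addr))  # 0xc123
```

Each added layer is another table walk (or cached translation) on the critical path, and each mapping change must be invalidated in every layer that cached it, which is why the talk treats invalidation latency as a first-order concern.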
Short Bio
Dr. Nastaran Hajinazar is a Platform Architect at Intel, where she has worked since 2022 driving innovations in scaling the virtual memory stack for the next generation of XPUs. Before joining Intel, Dr. Hajinazar was a Senior Researcher in the SAFARI Research Group led by Professor Onur Mutlu at ETH Zürich. Her Ph.D. research explored the intersection of data-centric and data-aware computing architectures, introducing a novel data-aware virtual memory framework and a data-centric processing-using-memory framework. Her research spans multiple dimensions of data-centric and data-aware computing systems, hardware-software interfaces, memory hierarchies, and memory management techniques that enable efficient data storage, transfer, and processing. Dr. Hajinazar’s contributions have been published in premier venues such as ISCA, MICRO, and ASPLOS.
17:45 | The Convergence of AI, Supercomputing, and Biology: A View from AMD
Speaker: Gagandeep Singh, AMD
Abstract
Biology is entering a new computational era where advances in AI, supercomputing, and data generation are converging to transform how we understand, model, and engineer living systems. As sequencing throughput accelerates and spatial and multimodal omics expand, biology has become a fundamentally compute-bound discipline. At the same time, foundation models, generative AI, and large-scale simulations are reshaping the scientific workflow, driving a shift from experiment-first discovery to data- and model-driven biological insight.
In this talk, we explore how this transformation is unfolding and why classical HPC and modern AI workloads are now inseparable in the life sciences. We will highlight AMD’s technology stack, from Instinct GPUs and the ROCm open ecosystem to exascale-class supercomputers such as Frontier, LUMI, and El Capitan, and show how this foundation enables breakthroughs across genomics, molecular simulation, drug discovery, and biological foundation models.
Short Bio
Gagan is a Senior Member of the Technical Staff in AMD's Research and Advanced Development group in Switzerland, working on application acceleration, design space exploration, and performance modeling, and leading research in healthcare and the life sciences. Prior to joining AMD, he was a Postdoctoral Researcher in the SAFARI Research Group at ETH Zürich. He received his Ph.D. from TU Eindhoven, in collaboration with IBM Research Zürich, in 2021. In 2017, Gagan received a joint M.Sc. degree with distinction in Integrated Circuit Design from TUM, Germany, and NTU, Singapore. Gagan was also an R&D Software Developer at Oracle, India. He is passionate about computer architecture, AI, and bioinformatics.