MICRO 2024 Tutorial on
Memory-Centric Computing Systems (Half Day)
Tutorial Description
Processing-in-Memory (PIM) is a computing paradigm that aims at overcoming the data movement bottleneck (i.e., the waste of execution cycles and energy resulting from the back-and-forth data movement between memory units and compute units) by making memory (and storage) systems compute-capable.
Explored over several decades since the 1960s, PIM systems are now becoming a reality with the advent of the first commercial products and prototypes.
Several startups (e.g., UPMEM, NeuroBlade, Mythic, Syntiant, Aizip, Axelera, d-Matrix, Gyrfalcon Technology, MemComputing, SEMRON, SureCore, Synthara, TetraMem, EnCharge AI) are already commercializing real PIM hardware, each with its own design approach and target applications. Major vendors (e.g., Samsung, SK Hynix, Micron, Alibaba) have presented real PIM chip and system prototypes in the past several years.
Recent PIM products and prototypes place compute units near the memory arrays. New memory interfaces such as CXL (Compute Express Link) help enable compute-capable memories. At the same time, academia and industry are actively exploring other types of PIM, e.g., by exploiting the analog operation of DRAM, SRAM, flash memory, and emerging non-volatile memories, and by designing hybrid PIM architectures that combine different types of processing capabilities at different parts of the memory/storage hierarchy.
PIM can improve performance and energy efficiency for many modern applications, offering a commercially viable way to handle the huge amounts of data that bottleneck our computing systems, a problem especially exacerbated by workloads such as AI/ML and genomics. In fact, large language model training and inference could become “killer applications” for PIM.
However, there are many open questions spanning the entire computing stack and many challenges to widespread adoption. For example, it is critical to (1) develop programming frameworks and tools that lower the learning curve and ease the adoption of PIM systems, (2) develop methods to identify which type of PIM is useful for which workload, and (3) design system and security mechanisms that enable PIM at a wider scale. Understanding the implications of PIM for all aspects of computing systems and workloads is a challenging and exciting field of study.
This tutorial focuses on the latest advances in PIM technology, spanning both hardware and software, including novel PIM ideas, different tools and frameworks to conduct PIM research, and programming techniques and optimization strategies for PIM kernels. We will (1) provide an introduction to PIM and the taxonomy of PIM systems, (2) give an overview and a rigorous analysis of existing PIM hardware from industry and academia, (3) provide and describe hardware and software infrastructures that can enable new and experienced researchers to conduct research in PIM systems, and (4) shed light on how to improve future PIM systems for emerging memory-bound workloads. The tutorial will also incorporate invited talks from leading industry and academic researchers in PIM systems.
Livestream
Organizers
Agenda
Lectures (tentative schedule)
- Introduction: PIM as a paradigm to overcome the data movement bottleneck.
- Workload analysis and system bottlenecks.
- PIM taxonomy: technology, location, and nature of computation (e.g., processing-near-memory (PNM) and processing-using-memory (PUM)).
- Advances in different types of PIM at different parts of the memory/storage hierarchy.
- Example real-world PNM systems: UPMEM PIM, Samsung HBM-PIM & CXL-PNM, SK Hynix AiM & CMS 2.0, Samsung AxDIMM, Alibaba PNM, Mythic.
- PUM systems for bulk bitwise operations in simulated and off-the-shelf memory technologies (DRAM, SRAM, and NVM).
- Programming techniques and tools for PIM systems.
- Infrastructures for PIM research (simulation, real systems, FPGA prototypes).
- Research challenges and opportunities in PIM systems, with a focus on enabling adoption in the real world.
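To make the processing-using-memory topics above more concrete, the following is a minimal functional sketch of the idea behind Ambit-style bulk bitwise operations: activating three DRAM rows at once leaves each bitline at the bitwise majority (MAJ) of the three cells, so MAJ(A, B, 0) yields A AND B and MAJ(A, B, 1) yields A OR B. It also composes a SIMDRAM-style bit-serial adder from MAJ and NOT alone, with operands stored vertically (row i holds bit i of every lane). This is an assumption-laden model, not either system's implementation: a DRAM row is represented as a Python integer bit-vector, the 64-bit row width and all function names are hypothetical, and row copies, timing, and analog effects are ignored.

```python
# Functional model only: a DRAM "row" is a Python int used as a bit-vector.
ROW_BITS = 64                    # hypothetical row width (real rows are far wider)
MASK = (1 << ROW_BITS) - 1       # all-ones row


def maj(a: int, b: int, c: int) -> int:
    """Bitwise majority of three rows, i.e. what a triple-row activation computes."""
    return (a & b) | (b & c) | (a & c)


def bnot(a: int) -> int:
    """Bitwise NOT confined to the row width."""
    return ~a & MASK


def bulk_and(a: int, b: int) -> int:
    return maj(a, b, 0)          # third row is an all-zeros control row


def bulk_or(a: int, b: int) -> int:
    return maj(a, b, MASK)       # third row is an all-ones control row


def to_vertical(values: list[int], width: int) -> list[int]:
    """Vertical layout: row i holds bit i of every lane (one lane per bit column)."""
    rows = [0] * width
    for lane, v in enumerate(values):
        for i in range(width):
            if (v >> i) & 1:
                rows[i] |= 1 << lane
    return rows


def from_vertical(rows: list[int], lanes: int) -> list[int]:
    """Inverse of to_vertical for the first `lanes` bit columns."""
    return [sum(((row >> lane) & 1) << i for i, row in enumerate(rows))
            for lane in range(lanes)]


def bit_serial_add(a_rows: list[int], b_rows: list[int]) -> list[int]:
    """Add all lanes at once, one bit position per step, using only MAJ and NOT."""
    carry = 0                                   # carry row, initially all zeros
    out = []
    for a, b in zip(a_rows, b_rows):            # least significant bit first
        carry_out = maj(a, b, carry)
        # sum = MAJ(NOT carry_out, carry_in, MAJ(a, b, NOT carry_in)) == a ^ b ^ carry_in
        out.append(maj(bnot(carry_out), carry, maj(a, b, bnot(carry))))
        carry = carry_out
    return out                                  # final carry dropped: addition mod 2**width


print(bin(bulk_and(0b1100, 0b1010)), bin(bulk_or(0b1100, 0b1010)))  # 0b1000 0b1110
x, y = [3, 10, 200, 255], [4, 20, 100, 1]
print(from_vertical(bit_serial_add(to_vertical(x, 8), to_vertical(y, 8)), len(x)))
# [7, 30, 44, 0]  (8-bit lanes wrap around modulo 256)
```

The number of lanes must not exceed ROW_BITS in this model. The appeal of the vertical layout is that each MAJ or NOT step processes every lane in the row simultaneously, so the cost of an addition grows with the operand width rather than with the number of elements.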
Tutorial Materials
Time | Speaker | Title | Materials |
---|---|---|---|
01:00pm-01:30pm | Prof. Onur Mutlu / Geraldo F. Oliveira | Memory-Centric Computing | (PDF) (PPT) |
01:30pm-02:00pm | Geraldo F. Oliveira | Processing-Near-Memory Systems: Academia & Industry Developments | (PDF) (PPT) |
02:00pm-02:30pm | Dr. Brian Schwedock | Architectures and Programming Models for General-Purpose Near-Data Computing | (PDF) (PPT) |
02:30pm-03:00pm | Geraldo F. Oliveira | Processing-Using-Memory Systems for Bulk Bitwise Operations | (PDF) (PPT) |
03:00pm-03:30pm | N/A | Coffee Break | |
03:30pm-04:00pm | Ataberk Olgun | Infrastructure for Processing-Using-Memory Research | (PDF) (PPT) |
04:00pm-04:30pm | Dr. Christina Giannoula | System Software and Libraries for Sparse Computational Kernels in PIM Architectures | (PDF) (PPT) |
04:30pm-05:00pm | Nika Mansouri Ghiasi | Storage-Centric Computing for Genomics and Metagenomics | (PDF) (PPT) |
05:00pm | Geraldo F. Oliveira | Research Challenges for PIM & Closing Remarks | (PDF) (PPT) |
Invited Speakers
Dr. Brian C. Schwedock
Talk Title: Architectures and Programming Models for General-Purpose Near-Data Computing
Talk Abstract: As computer systems are increasingly bottlenecked by data movement, traditional CPU scaling can no longer meet processing demands. To continue improving performance and energy efficiency, novel data-centric architectures move compute closer to data, typically by adding compute resources near data storage. Although these near-data computing (NDC) architectures promise significant gains in performance and energy efficiency, they are often limited by targeting a narrow range of application domains. In this talk, we present two architectures, täkō and Leviathan, that generalize NDC by adding programmable compute resources within the memory hierarchy and providing flexible, easy-to-use programming interfaces. By enabling architectures to implement a wide range of data-centric optimizations, täkō and Leviathan provide a path toward practical NDC.
Bio: Brian Schwedock is an SoC architect at Samsung SARC/ACL. He earned his PhD in Electrical and Computer Engineering at Carnegie Mellon University in 2023. His research tackles the ever-growing data-movement challenge by introducing programmable, data-centric architectures. He currently develops the memory-hierarchy architecture for Samsung’s Exynos SoCs.
Dr. Christina Giannoula
Talk Title: System Software and Libraries for Sparse Computational Kernels in PIM Architectures
Talk Abstract: Processing-In-Memory (PIM) offers a promising solution to alleviate the data movement bottleneck between memory and processors. Several manufacturers have already started to commercialize PIM architectures, providing significant performance and energy improvements for memory-intensive workloads. This talk will explore how specialized libraries and system software can unlock the potential of PIM architectures. I will first present SparseP, the first comprehensive Sparse Matrix Vector Multiplication (SpMV) library for real-world PIM systems. SparseP explores various parallelization strategies, load balancing, and synchronization techniques across thousands of PIM cores, offering insights into performance and energy efficiency benefits. Then, I will briefly introduce PyGim, a novel Graph Neural Network (GNN) library tailored for PIM systems, which optimizes memory-intensive GNN kernels through intelligent parallelization strategies. Our evaluations demonstrate that PyGim provides significant performance and energy improvements over prior state-of-the-art approaches.
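The load-balancing question raised in the abstract can be illustrated with a small sketch: a CSR matrix is split row-wise into contiguous blocks so that each (simulated) PIM core receives roughly the same number of nonzeros rather than the same number of rows. Each core then multiplies its block by the input vector, and the host concatenates the partial outputs. This shows only one simple 1D partitioning scheme in plain Python, assuming every core holds a full copy of the input vector; it is not SparseP's API or its exact algorithm (SparseP studies several 1D and 2D schemes), and all function names here are hypothetical.

```python
from bisect import bisect_left


def partition_rows_by_nnz(row_ptr: list[int], n_cores: int) -> list[tuple[int, int]]:
    """Split CSR rows into contiguous [start, end) blocks with ~equal nonzeros each."""
    n_rows, total = len(row_ptr) - 1, row_ptr[-1]
    bounds = [0]
    for k in range(1, n_cores):
        target = total * k // n_cores            # ideal cumulative nnz at this cut
        r = bisect_left(row_ptr, target, lo=bounds[-1])
        # step back one row if that boundary lands closer to the target
        if r > bounds[-1] and target - row_ptr[r - 1] < row_ptr[r] - target:
            r -= 1
        bounds.append(r)
    bounds.append(n_rows)
    return [(bounds[k], bounds[k + 1]) for k in range(n_cores)]


def spmv_block(row_ptr: list[int], col_idx: list[int], vals: list[float],
               x: list[float], start: int, end: int) -> list[float]:
    """The work one PIM core would do: y[start:end] = A[start:end, :] @ x."""
    y = []
    for r in range(start, end):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += vals[k] * x[col_idx[k]]
        y.append(acc)
    return y


def pim_spmv(row_ptr: list[int], col_idx: list[int], vals: list[float],
             x: list[float], n_cores: int) -> tuple[list[float], list[tuple[int, int]]]:
    """Host side: partition, run one block per core, gather the partial outputs."""
    parts = partition_rows_by_nnz(row_ptr, n_cores)
    y: list[float] = []
    for start, end in parts:             # sequential here; parallel on real PIM cores
        y.extend(spmv_block(row_ptr, col_idx, vals, x, start, end))
    return y, parts


# 4x4 example in CSR form; row 0 is much denser than the others.
row_ptr = [0, 4, 5, 6, 8]
col_idx = [0, 1, 2, 3, 1, 2, 0, 3]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y, parts = pim_spmv(row_ptr, col_idx, vals, [1.0, 1.0, 1.0, 1.0], n_cores=2)
print(y)      # [10.0, 5.0, 6.0, 15.0]
print(parts)  # [(0, 1), (1, 4)]
```

In this example, an even split by row count would give the two cores 5 and 3 nonzeros, whereas the nonzero-based cut gives each core 4. On real PIM systems with thousands of cores and skewed matrices, this kind of imbalance determines how long the slowest core takes.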
Bio: Christina Giannoula received the Ph.D. degree from the School of Electrical and Computer Engineering, National Technical University of Athens, advised by Prof. Georgios Goumas, Prof. Nectarios Koziris, and Prof. Onur Mutlu, in October 2022. She is currently a Postdoctoral Researcher at the University of Toronto, working with Prof. Gennady Pekhimenko and his research group. She is also affiliated with Prof. Onur Mutlu's SAFARI Research Group. Her research interests lie at the intersection of computer architecture, computer systems, and high-performance computing. Specifically, her research focuses on the hardware/software co-design of emerging applications, including graph processing, pointer-chasing data structures, machine learning workloads, and sparse linear algebra, with modern computing paradigms, such as large-scale multicore systems, disaggregated memory systems, and near-data processing architectures. She has several publications and awards for her research on these topics. She is a member of ACM, ACM-W, and the Technical Chamber of Greece.
Learning Materials
Recommended Materials
- Mutlu, O., Ghose, S., Gómez-Luna, J., and Ausavarungnirun, R., “A Modern Primer on Processing in Memory.” In Emerging Computing: From Devices to Systems, 2023.
- Gómez-Luna, J., El Hajj, I., Fernandez, I., Giannoula, C., Oliveira, G. F., and Mutlu, O., “Benchmarking a New Paradigm: Experimental Analysis and Characterization of a Real Processing-in-Memory System.” IEEE Access, 2022.
- Giannoula, C., Fernandez, I., Gómez-Luna, J., Koziris, N., Goumas, G., and Mutlu, O., “SparseP: Towards Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Architectures,” in SIGMETRICS 2022.
- Olgun, A., Gómez-Luna, J., Kanellopoulos, K., Salami, B., Hassan, H., Ergin, O., and Mutlu, O., “PiDRAM: A Holistic End-to-End FPGA-Based Framework for Processing-in-DRAM.” ACM TACO, 2022.
- Oliveira, G. F., Gómez-Luna, J., Orosa, L., Ghose, S., Vijaykumar, N., Fernandez, I., Sadrosadati, M., Mutlu, O., “DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks.” IEEE Access, 2021.
- Luo, H., Tuğrul, Y. C., Bostancı, F. N., Olgun, A., Yağlıkçı, A. G., Mutlu, O., “Ramulator 2.0: A Modern, Modular, and Extensible DRAM Simulator.” IEEE CAL, 2023.
- Olgun, A., Hassan, H., Yağlıkçı, A. G., Tuğrul, Y. C., Orosa, L., Luo, H., Patel, M., Ergin, O., Mutlu, O., “DRAM Bender: An Extensible and Versatile FPGA-Based Infrastructure to Easily Test State-of-the-Art DRAM Chips.” IEEE TCAD, 2023.
- Oliveira, G. F., Olgun, A., Yağlıkçı, A. G., Bostancı, F. N., Gómez-Luna, J., Ghose, S., Mutlu, O., “MIMDRAM: An End-to-End Processing-Using-DRAM System for High-Throughput, Energy-Efficient and Programmer-Transparent Multiple-Instruction Multiple-Data Computing,” in HPCA, 2024.
- Hajinazar, N., Oliveira, G. F., Gregorio, S., Ferreira, J. D., Ghiasi, N. M., Patel, M., Alser, M., Ghose, S., Gómez-Luna, J., Mutlu, O., “SIMDRAM: An End-to-End Framework for Bit-Serial SIMD Computing in DRAM,” in ASPLOS, 2021.
- Seshadri, V., Lee, D., Mullins, T., Hassan, H., Boroumand, A., Kim, J., Kozuch, M. A., Mutlu, O., Gibbons, P. B., Mowry, T. C., “Ambit: In-Memory Accelerator for Bulk Bitwise Operations Using Commodity DRAM Technology,” in MICRO, 2017.
- Schwedock, B. C., Yoovidhya, P., Seibert, J., and Beckmann, N., “Täkō: A Polymorphic Cache Hierarchy for General-Purpose Optimization of Data Movement,” in ISCA, 2022.
- Schwedock, B. C., and Beckmann, N., “Leviathan: A Unified System for General-Purpose Near-Data Computing,” in MICRO, 2024.
More Learning Materials
- Mutlu, O., Memory-Centric Computing (IMACAW Keynote Talk at DAC 2023), July 2023.
- Processing-in-Memory: A Workload-Driven Perspective (summary paper about recent research in PIM).
- Processing Data Where It Makes Sense: Enabling In-Memory Computation (summary paper about recent research in PIM).
- Processing-in-Memory course (Spring 2022).
- Gómez-Luna, J., and Mutlu, O., Data-Centric Architectures: Fundamentally Improving Performance and Energy (227-0085-37L), ETH Zürich, Fall 2022.