Real-world Processing-in-Memory Systems for Modern Workloads

Tutorial Description

Processing-in-Memory (PIM) is a computing paradigm that aims at overcoming the data movement bottleneck (i.e., the waste of execution cycles and energy resulting from the back-and-forth data movement between memory units and compute units) by making memory compute-capable.

Explored over several decades since the 1960s, PIM systems are becoming a reality with the advent of the first commercial products and prototypes.

A number of startups (e.g., UPMEM, Neuroblade) are already commercializing real PIM hardware, each with its own design approach and target applications. Several major vendors (e.g., Samsung, SK Hynix, Alibaba) have presented real PIM chip prototypes in the last two years. Most of these architectures have in common that they place compute units near the memory arrays. This type of PIM is called processing near memory (PNM).

PIM can provide large improvements in both performance and energy consumption for many modern applications, thereby offering a commercially viable way of dealing with the huge amounts of data that are bottlenecking our computing systems. Yet, it is critical to (1) study and understand the characteristics that make a workload suitable for a PIM architecture, (2) propose optimization strategies for PIM kernels, and (3) develop programming frameworks and tools that can lower the learning curve and ease the adoption of PIM.
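One characteristic commonly used to judge whether a workload is limited by data movement is its arithmetic intensity, i.e., the number of operations it performs per byte moved to or from main memory. The following is a minimal sketch of that reasoning, not a method taken from the tutorial: the helper names, the operation and byte counts (which assume 4-byte elements and ideal reuse for the matrix multiplication), and the host peak numbers are all illustrative assumptions.

```python
def arithmetic_intensity(operations: float, bytes_moved: float) -> float:
    """Operations performed per byte transferred to or from main memory."""
    return operations / bytes_moved


def is_memory_bound(intensity: float, peak_ops_per_s: float,
                    peak_bytes_per_s: float) -> bool:
    """A kernel is memory-bound when its intensity is below the machine balance."""
    machine_balance = peak_ops_per_s / peak_bytes_per_s
    return intensity < machine_balance


# Hypothetical host system (assumed numbers, not measurements):
# 1 Tera-op/s of compute and 100 GB/s of memory bandwidth.
PEAK_OPS = 1e12
PEAK_BW = 1e11

# Vector addition of n 4-byte elements: n additions,
# two input reads and one output write per element (12 bytes).
n = 1_000_000
vec_add = arithmetic_intensity(n, 12 * n)

# Dense m x m matrix multiplication with 4-byte elements: 2*m^3 operations,
# and (assuming ideal on-chip reuse) each of the three matrices moved once.
m = 1024
matmul = arithmetic_intensity(2 * m**3, 12 * m**2)

for name, ai in [("vector add", vec_add), ("matrix multiply", matmul)]:
    bound = "memory-bound" if is_memory_bound(ai, PEAK_OPS, PEAK_BW) else "compute-bound"
    print(f"{name}: {ai:.3f} ops/byte -> {bound}")
```

Under these assumptions, the vector addition falls well below the machine balance and is memory-bound, which is the kind of kernel usually considered a PIM candidate, while the matrix multiplication is compute-bound. A real characterization would rely on measured data movement rather than ideal counts.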

This tutorial focuses on the latest advances in PIM technology, workload characterization for PIM, and programming and optimizing PIM kernels. We will (1) provide an introduction to PIM and a taxonomy of PIM systems, (2) give an overview and a rigorous analysis of existing real-world PIM hardware, (3) conduct hands-on labs about important workloads (machine learning, sparse linear algebra, bioinformatics, etc.) using real PIM systems, and (4) shed light on how to improve future PIM systems for such workloads.


YouTube livestream


Agenda (March 26, 2023)

Lectures (tentative)

  • 9:00am-10:20am, Prof. Onur Mutlu, “Memory-centric computing: Introduction to PIM as a paradigm to overcome the data movement bottleneck”.
    • PIM taxonomy: PNM (processing near memory) and PUM (processing using memory).
  • Coffee break (10:20am-10:40am)
  • 10:40am-12:00pm, Dr. Juan Gómez Luna, “Processing-Near-Memory: Real PNM Architectures Programming General-purpose PIM”.
    • PNM prototypes: Samsung HBM-PIM, SK Hynix AiM, Samsung AxDIMM.
    • UPMEM PIM: Architecture and Programming.
  • Lunch break (12:00pm-1:40pm)
  • 1:40pm-2:20pm, Prof. Alexandra (Sasha) Fedorova (UBC), “Processing in Memory in the Wild”.
  • 2:20pm-3:20pm, Dr. Juan Gómez Luna & Ataberk Olgun, “Processing-Using-Memory: Exploiting the Analog Operational Properties of Memory Components”.
  • Coffee break (3:20pm-3:40pm)
  • 3:40pm-4:10pm, Dr. Juan Gómez Luna, “Adoption issues: How to enable PIM?”
  • 4:10pm-4:50pm, Dr. Yongkee Kwon & Eddy (Chanwook) Park (SK Hynix), “System Architecture and Software Stack for GDDR6-AiM”.
  • 4:50pm-5:00pm, Dr. Juan Gómez Luna, “Introduction/preparation for hands-on lab”.
  • Hands-on Lab (Optional)
    • Microbenchmarking of UPMEM PIM.
    • Accelerating Real-world Workloads with UPMEM PIM.

Tutorial Materials

Time | Speaker | Title | Materials
9:00am-10:20am | Prof. Onur Mutlu | Memory-Centric Computing | (PDF) (PPT)
10:40am-12:00pm | Dr. Juan Gómez Luna | Processing-Near-Memory: Real PNM Architectures Programming General-purpose PIM | (PDF) (PPT)
1:40pm-2:20pm | Prof. Alexandra (Sasha) Fedorova (UBC) | Processing in Memory in the Wild | (PDF) (PPT)
2:20pm-3:20pm | Dr. Juan Gómez Luna & Ataberk Olgun | Processing-Using-Memory: Exploiting the Analog Operational Properties of Memory Components | (PDF) (PPT)
3:40pm-4:10pm | Dr. Juan Gómez Luna | Adoption issues: How to enable PIM? / Accelerating Modern Workloads on a General-purpose PIM System |
4:10pm-4:50pm | Dr. Yongkee Kwon & Eddy (Chanwook) Park (SK Hynix) | System Architecture and Software Stack for GDDR6-AiM | (PDF) (PPT)
4:50pm-5:00pm | Dr. Juan Gómez Luna | Hands-on Lab: Programming and Understanding a Real Processing-in-Memory Architecture | (Handout)

Learning Materials

  • Gómez-Luna, J., and Mutlu, O., Data-Centric Architectures: Fundamentally Improving Performance and Energy (227-0085-37L), ETH Zürich, Fall 2022.
  • Mutlu, O., Ghose, S., Gómez-Luna, J., and Ausavarungnirun, R. A Modern Primer on Processing in Memory. In Emerging Computing: From Devices to Systems, 2023.
  • Gómez-Luna, J., El Hajj, I., Fernandez, I., Giannoula, C., Oliveira, G. F., and Mutlu, O. Benchmarking a New Paradigm: Experimental Analysis and Characterization of a Real Processing-in-Memory System. IEEE Access, 2022.
  • Giannoula, C., Fernandez, I., Gómez-Luna, J., Koziris, N., Goumas, G., and Mutlu, O. SparseP: Towards Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Architectures. SIGMETRICS 2022.
  • Olgun, A., Gómez-Luna, J., Kanellopoulos, K., Salami, B., Hassan, H., Ergin, O., and Mutlu, O. PiDRAM: A Holistic End-to-end FPGA-based Framework for Processing-in-DRAM. ACM TACO, 2022.

More Learning Materials


When you register for ASPLOS 2023, make sure that you mark this tutorial in the registration form.

start.txt · Last modified: 2023/08/16 14:17 by ewent
