===== 1st Workshop on =====
===== Memory-Centric Computing Systems (MCCSys) - 30 March 2025 =====

==== Workshop Description ====
|[[https:// |
===== Agenda =====

^ Time ^ Speaker ^ Title ^ Materials ^
| 09: |
| 09: |
| 10:30am-11:00am |
| 11:00am-11:30am | Geraldo F. Oliveira | Processing-Near-Memory Systems: Developments from Academia & Industry |
| 11:30am-12:00pm | Geraldo F. Oliveira | Processing-Using-Memory Systems for Bulk Bitwise Operations | {{geraldo-asplos25-lecture4-processing-using-memory-beforelecture.pdf|(PDF)}} {{geraldo-asplos25-lecture4-processing-using-memory-beforelecture.pptx|(PPT)}} |
| 12:00pm-12:30pm | Dr. Mohammad Sadr | Processing-Near-Storage & Processing-Using-Storage | {{mohammad-mcc-asplos-memorycentriccomputing-30-march-2025.pdf|(PDF)}} {{mohammad-mcc-asplos-memorycentriccomputing-30-march-2025.pptx|(PPT)}} |
| 12: |
| 12: |
| 02: |
| 02: |
| 03: |
| 03: |
| 04: |
| 04: |
| 05: |
**Bio:** Krystian Chmielewski is a software engineer with 8 years of experience in emerging computing architectures and low-level performance optimizations. Since 2023, he has been working at Huawei Warsaw Research Center in Poland, focusing on the enablement of novel Processing-In-Memory architectures and optimizing the JVM's just-in-time compiler. Prior to this, Krystian spent 6 years at Intel, where he specialized in compute runtimes and worked on features such as Mutable Command Lists.
+ | |||
+ | === Yintao He (UCAS) === | ||
+ | **Talk Title:** PAPI: Exploiting Dynamic Parallelism in Large Language Model Decoding with a Processing-In-Memory-Enabled Computing System {{ :: | ||
+ | |||
**Talk Abstract:** Large language models (LLMs) are widely used for natural language understanding and text generation. An LLM relies on a time-consuming step called LLM decoding to generate output tokens. Several prior works improve the performance of LLM decoding using parallelism techniques, such as batching and speculative decoding. State-of-the-art LLM decoding has both compute-bound and memory-bound kernels. Some prior works statically identify and map these different kernels to a heterogeneous architecture consisting of both processing-in-memory (PIM) units and computation-centric accelerators. We observe that the characteristics of LLM decoding kernels (e.g., whether or not a kernel is memory-bound) can change dynamically due to parameter changes made to meet user and/or system demands, making (1) static kernel mapping to PIM units and computation-centric accelerators suboptimal, and (2) a one-size-fits-all approach to designing PIM units inefficient, due to a large degree of heterogeneity even among memory-bound kernels.

In this paper, we aim to accelerate LLM decoding while considering the dynamically changing characteristics of the kernels involved. We propose PAPI (PArallel Decoding with PIM), a PIM-enabled heterogeneous architecture that exploits dynamic scheduling of compute-bound or memory-bound kernels to suitable hardware units. PAPI has two key mechanisms: (1) online kernel characterization to dynamically schedule kernels to the most suitable hardware units at runtime, and (2) a PIM-enabled heterogeneous computing system that harmoniously orchestrates both computation-centric processing units and hybrid PIM units with different computing capabilities. Our experimental results on three broadly used LLMs show that PAPI achieves 1.8× and 11.1× speedups over a state-of-the-art heterogeneous LLM accelerator and a state-of-the-art PIM-only LLM accelerator, respectively.
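The dispatch idea in the abstract can be illustrated with a minimal sketch: classify each kernel by its arithmetic intensity (flops per byte of DRAM traffic) and route memory-bound kernels to PIM units and compute-bound ones to the accelerator. This is not PAPI's actual implementation; all names, numbers, and the threshold are hypothetical, chosen only to show how a batch-size change can flip the same kernel between regimes.

```python
# Hypothetical sketch of roofline-style dynamic kernel dispatch, in the spirit
# of PAPI's online kernel characterization. All names and thresholds are
# illustrative, not taken from the paper.

from dataclasses import dataclass


@dataclass
class Kernel:
    name: str
    flops: float        # floating-point operations per invocation
    bytes_moved: float  # DRAM traffic per invocation, in bytes

    @property
    def arithmetic_intensity(self) -> float:
        return self.flops / self.bytes_moved


def schedule(kernel: Kernel, machine_balance: float = 10.0) -> str:
    """Route memory-bound kernels to PIM, compute-bound kernels to the
    accelerator. `machine_balance` (flops/byte) is an assumed threshold."""
    if kernel.arithmetic_intensity < machine_balance:
        return "pim"
    return "accelerator"


# During decoding, a batch-size change can flip a kernel between regimes:
attn_small_batch = Kernel("attention", flops=1e9, bytes_moved=1e9)   # AI = 1
attn_large_batch = Kernel("attention", flops=64e9, bytes_moved=2e9)  # AI = 32
assert schedule(attn_small_batch) == "pim"
assert schedule(attn_large_batch) == "accelerator"
```

The point of the sketch is the *online* part: because the same kernel's intensity depends on runtime parameters (batch size, sequence length), the mapping decision must be re-evaluated per invocation rather than fixed at design time.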
**Bio:** Yintao He received the BE degree in electronic science and technology from Nankai University, Tianjin, China, in 2019. She is currently working toward the PhD degree at the University of Chinese Academy of Sciences, Beijing, China. Her research interests include processing-in-memory and energy-efficient accelerators.
start.1742899828.txt.gz · Last modified: 2025/03/25 10:50 by geraldod