===== 1st Workshop on =====
===== Memory-Centric Computing Systems (MCCSys) - 30 March 2025 =====

==== Workshop Description ====

^ Time ^ Speaker ^ Title ^ Materials ^
| 09:00am | Geraldo F. Oliveira | Logistics | {{geraldo-asplos25-lecture0-introduction-beforelecture.pdf|(PDF)}} {{geraldo-asplos25-lecture0-introduction-beforelecture.pptx|(PPT)}}|
| 09:00am-09:30am | Prof. Onur Mutlu | Recent Advances in Processing-in-DRAM | {{onur-MCCSys-ASPLOS-MemoryCentricComputing-30-March-2025.pdf|(PDF)}} {{onur-MCCSys-ASPLOS-MemoryCentricComputing-30-March-2025.pptx|(PPT)}}|
| 10:30am-11:00am | N/A | **Coffee Break** | |
| 11:00am-11:30am | Geraldo F. Oliveira | Processing-Near-Memory Systems: Developments from Academia & Industry | {{geraldo-asplos25-lecture2-processing-near-memory-beforelecture.pdf|(PDF)}} {{geraldo-asplos25-lecture2-processing-near-memory-beforelecture.pptx|(PPT)}}|
| 11:30am-12:00pm | Geraldo F. Oliveira | Processing-Using-Memory Systems for Bulk Bitwise Operations | {{geraldo-asplos25-lecture4-processing-using-memory-beforelecture.pdf|(PDF)}} {{geraldo-asplos25-lecture4-processing-using-memory-beforelecture.pptx|(PPT)}}|
| 12:00pm-12:30pm | Dr. Mohammad Sadr | Processing-Near-Storage & Processing-Using-Storage | {{mohammad-mcc-asplos-memorycentriccomputing-30-march-2025.pdf|(PDF)}} {{mohammad-mcc-asplos-memorycentriccomputing-30-march-2025.pptx|(PPT)}}|
| 12:30pm | Geraldo F. Oliveira | Infrastructure for PIM Research & Research Challenges | {{geraldo-asplos25-lecture6-adoption-programmability-beforelecture.pdf|(PDF)}} {{geraldo-asplos25-lecture6-adoption-programmability-beforelecture.pptx|(PPT)}} |
| 12:30pm-02:00pm | N/A | **Lunch Break** | |
| 02:00pm-02:30pm | [[https://cfaed.tu-dresden.de/ccc-staff/hamid-farzaneh|Hamid Farzaneh]] | CINM (Cinnamon): A Compilation Infrastructure for Heterogeneous Compute In-Memory and Compute Near-Memory Paradigms | {{hamid_farzaneh.pdf|(PDF)}} {{hamid_farzaneh.pptx|(PPT)}}|
| 02:30pm-03:00pm | Theocharis Diamantidis | Harnessing PIM Techniques for Accelerating Sum Operations in FPGA-DRAM Architectures | {{theocharis_diamantidis.pdf|(PDF)}} {{theocharis_diamantidis.pptx|(PPT)}}|
| 03:00pm-03:30pm | Krystian Chmielewski | Pitfalls of UPMEM Kernel Development | {{Pitfalls of UPMEM kernel development.pdf|(PDF)}} {{Pitfalls of UPMEM kernel development.pptx|(PPT)}}|
| 03:30pm-04:00pm | N/A | **Coffee Break** | |
| 04:00pm-04:30pm | Yintao He | PAPI: Exploiting Dynamic Parallelism in Large Language Model Decoding with a Processing-In-Memory-Enabled Computing System | {{PAPI-afterDR2.pdf|(PDF)}} {{PAPI-afterDR2.pptx|(PPT)}}|
| 04:30pm-05:00pm | [[https://web.eecs.umich.edu/~yufenggu/|Yufeng Gu]] | PIM Is All You Need: A CXL-Enabled GPU-Free System for Large Language Model Inference | {{CENT-Talk-20min.pdf|(PDF)}} |
| 05:00pm-05:30pm | [[https://cgiannoula.github.io/|Dr. Christina Giannoula]] | PyGim: An Efficient Graph Neural Network Library for Real Processing-In-Memory Architectures | {{PyGIM_PIMWorkshop_ASPLOS25.pdf|(PDF)}} {{PyGIM_PIMWorkshop_ASPLOS25.pptx|(PPT)}}|
=== Yintao He (UCAS) ===
**Talk Title:** PAPI: Exploiting Dynamic Parallelism in Large Language Model Decoding with a Processing-In-Memory-Enabled Computing System {{ ::yintao_headshot.jpeg?nolink&200|}}

**Talk Abstract:** Large language models (LLMs) are widely used for natural language understanding and text generation. An LLM relies on a time-consuming step called LLM decoding to generate output tokens. Several prior works focus on improving the performance of LLM decoding using parallelism techniques, such as batching and speculative decoding. State-of-the-art LLM decoding has both compute-bound and memory-bound kernels. Some prior works statically identify and map these different kernels to a heterogeneous architecture consisting of both processing-in-memory (PIM) units and computation-centric accelerators.

We observe that the characteristics of LLM decoding kernels (e.g., whether or not a kernel is memory-bound) can change dynamically due to parameter changes to meet user and/or system demands, making (1) static kernel mapping to PIM units and computation-centric accelerators suboptimal, and (2) a one-size-fits-all approach to designing PIM units inefficient due to a large degree of heterogeneity even in memory-bound kernels.