===== 1st Workshop on =====
===== Memory-Centric Computing Systems (MCCSys) - 30 March 2025 =====

==== Workshop Description ====

^ Time ^ Speaker ^ Title ^ Materials ^
| 09:00am | Geraldo F. Oliveira | Logistics | {{geraldo-asplos25-lecture0-introduction-beforelecture.pdf|(PDF)}} {{geraldo-asplos25-lecture0-introduction-beforelecture.pptx|(PPT)}}|
| 09:00am-09:30am | Prof. Onur Mutlu | Recent Advances in Processing-in-DRAM | {{onur-MCCSys-ASPLOS-MemoryCentricComputing-30-March-2025.pdf|(PDF)}} {{onur-MCCSys-ASPLOS-MemoryCentricComputing-30-March-2025.pptx|(PPT)}}|
| 10:30am-11:00am | N/A | **Coffee Break** | |
| 11:00am-11:30am | Geraldo F. Oliveira | Processing-Near-Memory Systems: Developments from Academia & Industry | {{geraldo-asplos25-lecture2-processing-near-memory-beforelecture.pdf|(PDF)}} {{geraldo-asplos25-lecture2-processing-near-memory-beforelecture.pptx|(PPT)}}|
| 11:30am-12:00pm | Geraldo F. Oliveira | Processing-Using-Memory Systems for Bulk Bitwise Operations | {{geraldo-asplos25-lecture4-processing-using-memory-beforelecture.pdf|(PDF)}} {{geraldo-asplos25-lecture4-processing-using-memory-beforelecture.pptx|(PPT)}}|
| 12:00pm-12:30pm | Dr. Mohammad Sadr | Processing-Near-Storage & Processing-Using-Storage | {{mohammad-mcc-asplos-memorycentriccomputing-30-march-2025.pdf|(PDF)}} {{mohammad-mcc-asplos-memorycentriccomputing-30-march-2025.pptx|(PPT)}}|
| 12:30pm | Geraldo F. Oliveira | Infrastructure for PIM Research & Research Challenges | {{geraldo-asplos25-lecture6-adoption-programmability-beforelecture.pdf|(PDF)}} {{geraldo-asplos25-lecture6-adoption-programmability-beforelecture.pptx|(PPT)}} |
| 12:30pm-02:00pm | N/A | **Lunch Break** | |
| 02:00pm-02:30pm | [[https://cfaed.tu-dresden.de/ccc-staff/hamid-farzaneh|Hamid Farzaneh]] | CINM (Cinnamon): A Compilation Infrastructure for Heterogeneous Compute In-Memory and Compute Near-Memory Paradigms | {{hamid_farzaneh.pdf|(PDF)}} {{hamid_farzaneh.pptx|(PPT)}}|
| 02:30pm-03:00pm | Theocharis Diamantidis | Harnessing PIM Techniques for Accelerating Sum Operations in FPGA-DRAM Architectures | {{theocharis_diamantidis.pdf|(PDF)}} {{theocharis_diamantidis.pptx|(PPT)}}|
| 03:00pm-03:30pm | Krystian Chmielewski | Pitfalls of UPMEM Kernel Development | {{Pitfalls of UPMEM kernel development.pdf|(PDF)}} {{Pitfalls of UPMEM kernel development.pptx|(PPT)}}|
| 03:30pm-04:00pm | N/A | **Coffee Break** | |
| 04:00pm-04:30pm | Yintao He | PAPI: Exploiting Dynamic Parallelism in Large Language Model Decoding with a Processing-In-Memory-Enabled Computing System | {{PAPI-afterDR2.pdf|(PDF)}} {{PAPI-afterDR2.pptx|(PPT)}}|
| 04:30pm-05:00pm | [[https://web.eecs.umich.edu/~yufenggu/|Yufeng Gu]] | PIM Is All You Need: A CXL-Enabled GPU-Free System for Large Language Model Inference | {{CENT-Talk-20min.pdf|(PDF)}} |
| 05:00pm-05:30pm | [[https://cgiannoula.github.io/|Dr. Christina Giannoula]] | PyGim: An Efficient Graph Neural Network Library for Real Processing-In-Memory Architectures | {{PyGIM_PIMWorkshop_ASPLOS25.pdf|(PDF)}} {{PyGIM_PIMWorkshop_ASPLOS25.pptx|(PPT)}}|
=== Yintao He (UCAS) ===
**Talk Title:** PAPI: Exploiting Dynamic Parallelism in Large Language Model Decoding with a Processing-In-Memory-Enabled Computing System {{ ::yintao_headshot.jpeg?nolink&200|}}

**Talk Abstract:** Large language models (LLMs) are widely used for natural language understanding and text generation. An LLM relies on a time-consuming step called LLM decoding to generate output tokens. Several prior works focus on improving the performance of LLM decoding using parallelism techniques, such as batching and speculative decoding. State-of-the-art LLM decoding has both compute-bound and memory-bound kernels. Some prior works statically identify and map these different kernels to a heterogeneous architecture consisting of both processing-in-memory (PIM) units and computation-centric accelerators.

We observe that the characteristics of LLM decoding kernels (e.g., whether or not a kernel is memory-bound) can change dynamically due to parameter changes to meet user and/or system demands, making (1) static kernel mapping to PIM units and computation-centric accelerators suboptimal, and (2) a one-size-fits-all approach to designing PIM units inefficient due to a large degree of heterogeneity even in memory-bound kernels.