Differences

This shows you the differences between two versions of the page.

start [2025/03/25 13:56] geraldod
start [2025/04/24 21:32] (current) – [Memory-Centric Computing Systems (MCCSys) - March 30th] omutlu
Line 1: Line 1:
 ===== 1st Workshop on =====
-===== Memory-Centric Computing Systems (MCCSys) - March 30th  =====
+===== Memory-Centric Computing Systems (MCCSys) - 30 March 2025  =====
 
 ==== Workshop Description ====
Line 40: Line 40:
  
 ^ Time ^ Speaker ^ Title ^ Materials ^
-| 09:00am   | Geraldo F. Oliveira | Logistics |{{|(PDF)}} {{|(PPT)}}|
-| 09:00am-09:30am   | Prof. Onur Mutlu | Memory-Centric Computing Systems |{{|(PDF)}} {{|(PPT)}}|
-| 09:30am-10:00am   | Geraldo F. Oliveira | Processing-Near-Memory Systems: Developments from Academia & Industry | {{|(PDF)}} {{|(PPT)}}|
-| 10:00am-10:30am   | Geraldo F. Oliveira | Programming Processing-Near-Memory Systems |{{|(PDF)}} {{|(PPT)}}|
+| 09:00am   | Geraldo F. Oliveira | Logistics |{{geraldo-asplos25-lecture0-introduction-beforelecture.pdf|(PDF)}} {{geraldo-asplos25-lecture0-introduction-beforelecture.pptx|(PPT)}}|
+| 09:00am-09:30am   | Prof. Onur Mutlu | Recent Advances in Processing-in-DRAM |{{onur-MCCSys-ASPLOS-MemoryCentricComputing-30-March-2025.pdf|(PDF)}} {{onur-MCCSys-ASPLOS-MemoryCentricComputing-30-March-2025.pptx|(PPT)}}|
 |10:30am-11:00am   | N/A | **Coffee Break** | |
-| 11:00am-11:30am   | Geraldo F. Oliveira | Processing-Using-Memory Systems for Bulk Bitwise Operations | {{|(PDF)}} {{|(PPT)}}|
-| 11:30am-12:00pm   | Dr. Mohammad Sadr | Processing-Near-Storage & Processing-Using-Storage | {{|(PDF)}} {{|(PPT)}}|
-| 12:00pm-12:30pm  | Geraldo F. Oliveira | Infrastructure for PIM Research & Research Challenges | {{|(PDF)}} |
+|11:00am-11:30am   | Geraldo F. Oliveira | Processing-Near-Memory Systems: Developments from Academia & Industry | {{geraldo-asplos25-lecture2-processing-near-memory-beforelecture.pdf|(PDF)}} {{geraldo-asplos25-lecture2-processing-near-memory-beforelecture.pptx|(PPT)}}|
+| 11:30am-12:00pm   | Geraldo F. Oliveira | Processing-Using-Memory Systems for Bulk Bitwise Operations | {{geraldo-asplos25-lecture4-processing-using-memory-beforelecture.pdf|(PDF)}} {{geraldo-asplos25-lecture4-processing-using-memory-beforelecture.pptx|(PPT)}}|
+| 12:00am-12:30pm   | Dr. Mohammad Sadr | Processing-Near-Storage & Processing-Using-Storage | {{mohammad-mcc-asplos-memorycentriccomputing-30-march-2025.pdf|(PDF)}} {{mohammad-mcc-asplos-memorycentriccomputing-30-march-2025.pptx|(PPT)}}|
+| 12:30pm  | Geraldo F. Oliveira | Infrastructure for PIM Research & Research Challenges | {{geraldo-asplos25-lecture6-adoption-programmability-beforelecture.pdf|(PDF)}}{{geraldo-asplos25-lecture6-adoption-programmability-beforelecture.pptx|(PPT)}} |
 | 12:30pm-02:00pm   | N/A | **Lunch Break** | |
-| 02:00pm-02:30pm   | [[https://cfaed.tu-dresden.de/ccc-staff/hamid-farzaneh|Hamid Farzaneh]]  | CINM (Cinnamon): A Compilation Infrastructure for Heterogeneous Compute In-Memory and Compute Near-Memory Paradigms |{{|(PDF)}} {{|(PPT)}}|
-| 02:30pm-03:00pm   | Theocharis Diamantidis | Harnessing PIM Techniques for Accelerating Sum Operations in FPGA-DRAM Architectures| {{|(PDF)}} {{|(PPT)}}|
-| 03:00pm-03:30pm   | Krystian Chmielewski | Pitfalls of UPMEM Kernel Development |{{|(PDF)}} {{|(PPT)}}|
+| 02:00pm-02:30pm   | [[https://cfaed.tu-dresden.de/ccc-staff/hamid-farzaneh|Hamid Farzaneh]]  | CINM (Cinnamon): A Compilation Infrastructure for Heterogeneous Compute In-Memory and Compute Near-Memory Paradigms |{{hamid_farzaneh.pdf|(PDF)}} {{hamid_farzaneh.pptx|(PPT)}}|
+| 02:30pm-03:00pm   | Theocharis Diamantidis | Harnessing PIM Techniques for Accelerating Sum Operations in FPGA-DRAM Architectures| {{theocharis_diamantidis.pdf|(PDF)}} {{theocharis_diamantidis.pptx|(PPT)}}|
+| 03:00pm-03:30pm   | Krystian Chmielewski | Pitfalls of UPMEM Kernel Development |{{Pitfalls of UPMEM kernel development.pdf|(PDF)}} {{Pitfalls of UPMEM kernel development.pptx|(PPT)}}|
 | 03:30pm-04:00pm   | N/A | **Coffee Break** | |
-| 04:00pm-04:30pm   | Yintao He | PAPI: Exploiting Dynamic Parallelism in Large Language Model Decoding with a Processing-In-Memory-Enabled Computing System | {{|(PDF)}} {{|(PPT)}}|
-| 04:30pm-05:00pm   | [[https://web.eecs.umich.edu/~yufenggu/|Yufeng Gu]]  | PIM Is All You Need: A CXL-Enabled GPU-Free System for Large Language Model Inference | {{|(PDF)}} {{|(PPT)}}|
-| 05:00pm-05:30pm   | [[https://cgiannoula.github.io/|Dr. Christina Giannoula]]  | PyGim: An Efficient Graph Neural Network Library for Real Processing-In-Memory Architectures | {{|(PDF)}} |
+| 04:00pm-04:30pm   | Yintao He | PAPI: Exploiting Dynamic Parallelism in Large Language Model Decoding with a Processing-In-Memory-Enabled Computing System | {{PAPI-afterDR2.pdf|(PDF)}} {{PAPI-afterDR2.pptx|(PPT)}}|
+| 04:30pm-05:00pm   | [[https://web.eecs.umich.edu/~yufenggu/|Yufeng Gu]]  | PIM Is All You Need: A CXL-Enabled GPU-Free System for Large Language Model Inference | {{CENT-Talk-20min.pdf|(PDF)}} |
+| 05:00pm-05:30pm   | [[https://cgiannoula.github.io/|Dr. Christina Giannoula]]  | PyGim: An Efficient Graph Neural Network Library for Real Processing-In-Memory Architectures | {{PyGIM_PIMWorkshop_ASPLOS25.pdf|(PDF)}} {{PyGIM_PIMWorkshop_ASPLOS25.pptx|(PPT)}}|
  
  
Line 85: Line 84:
  
 === Yintao He (UCAS) ===
-**Talk Title:** PAPI: Exploiting Dynamic Parallelism in Large Language Model Decoding with a Processing-In-Memory-Enabled Computing System {{ ::yintao_headshot.jpg?nolink&200|}}
+**Talk Title:** PAPI: Exploiting Dynamic Parallelism in Large Language Model Decoding with a Processing-In-Memory-Enabled Computing System {{ ::yintao_headshot.jpeg?nolink&200|}}
 
 **Talk Abstract:** Large language models (LLMs) are widely used for natural language understanding and text generation. An LLM model relies on a time-consuming step called LLM decoding to generate output tokens. Several prior works focus on improving the performance of LLM decoding using parallelism techniques, such as batching and speculative decoding. State-of-the-art LLM decoding has both compute-bound and memory-bound kernels. Some prior works statically identify and map these different kernels to a heterogeneous architecture consisting of both processing-in-memory (PIM) units and computation-centric accelerators. We observe that characteristics of LLM decoding kernels (e.g., whether or not a kernel is memory-bound) can change dynamically due to parameter changes to meet user and/or system demands, making (1) static kernel mapping to PIM units and computation-centric accelerators suboptimal, and (2) one-size-fits-all approach of designing PIM units inefficient due to a large degree of heterogeneity even in memory-bound kernels.
start.1742911002.txt.gz · Last modified: 2025/03/25 13:56 by geraldod
