==== Agenda & Workshop Materials (Tentative) ====
| 10:00 AM | Dr. Geraldo F. Oliveira | Processing-Using-Memory (PUM) Systems - Part I | {{geraldo-ics25-lecture2-PUM-part-I-beforelecture.pdf|(PDF)}} {{geraldo-ics25-lecture2-PUM-part-I-beforelecture.pptx|(PPT)}} |
| 10:30 AM | N/A | **Coffee Break** | |
| 10:45 AM | Ismail E. Yüksel | Functionally-Complete Boolean Logic in Real DRAM Chips | {{ics25_mccsys_fcdram_ismail_talk.pdf|(PDF)}} {{ics25_mccsys_fcdram_ismail_talk.pptx|(PPT)}} |
| 11:15 AM | Dr. Geraldo F. Oliveira | Processing-Using-Memory (PUM) Systems - Part II | {{geraldo-ics25-lecture3-PUM-part-II-beforelecture.pdf|(PDF)}} {{geraldo-ics25-lecture3-PUM-part-II-beforelecture.pptx|(PPT)}} |
| 11:45 AM | Dr. Geraldo F. Oliveira | Processing-Near-Memory (PNM) Systems: Academia & Industry Developments - Part I | {{geraldo-ics25-lecture4-PNM-part-I-beforelecture.pdf|(PDF)}} {{geraldo-ics25-lecture4-PNM-part-I-beforelecture.pptx|(PPT)}} |
| 12:00 PM | N/A | **Lunch** | |
| 01:00 PM | Dr. Geraldo F. Oliveira | Processing-Near-Memory (PNM) Systems: Academia & Industry Developments - Part II | {{geraldo-ics25-lecture5-PNM-part-II-beforelecture.pdf|(PDF)}} {{geraldo-ics25-lecture5-PNM-part-II-beforelecture.pptx|(PPT)}} |
| 01:30 PM | Dr. Konstantina Koliogeorgi | PIM Architectures for Bioinformatics | |
| 02:00 PM | Dr. Geraldo F. Oliveira | PIM Adoption & Programmability | {{geraldo-ics25-lecture6-adoption-beforelecture.pdf|(PDF)}} {{geraldo-ics25-lecture6-adoption-beforelecture.pptx|(PPT)}} |
| 02:30 PM | Dr. Geraldo F. Oliveira | // | |
| 03:00 PM | N/A | **Coffee Break** | |
| 03:15 PM | Taewoon Kang | SparsePIM: An Efficient HBM-Based PIM Architecture for Sparse Matrix-Vector Multiplications | |

==== Invited Speakers ====

=== Ismail E. Yüksel ===
**Talk Title:** Functionally-Complete Boolean Logic in Real DRAM Chips {{ ::

**Bio:** [[https://

=== Konstantina Koliogeorgi ===
**Talk Title:** PIM Architectures for Bioinformatics

**Talk Abstract:**

**Bio:** [[https://

=== Prof. Elaheh Sadredini (University of California, Riverside) ===
**Talk Title:** Keep it Close, Keep it Secure! Towards Efficient, Secure, and Programmable

**Talk Abstract:** Processing-in-memory

**Bio:** [[https://

=== Taewoon Kang (Korea University) ===
**Talk Title:** SparsePIM: An Efficient HBM-Based PIM Architecture for Sparse Matrix-Vector Multiplications {{ ::

**Talk Abstract:**
In order to address these challenges, we propose SparsePIM, a novel PIM architecture designed to accelerate SpMV computations efficiently. SparsePIM introduces a DRAM row-aligned format (DRAF) to optimize memory access patterns. SparsePIM exploits K-means-based column group partitioning to achieve a balanced load distribution across memory banks. Furthermore,
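
The abstract names K-means-based column grouping as SparsePIM's load-balancing step. As a rough illustration only — this is not the authors' code, and the clustering features, the DRAF layout, and all names here (e.g. ''balance_columns_across_banks'', ''NUM_BANKS'') are assumptions — the sketch below groups the columns of a sparse matrix by nonzero count with K-means and then deals each group out greedily so every bank receives a similar total number of nonzeros:

<code python>
# Hypothetical sketch of K-means-based column-group partitioning for
# SpMV load balancing, in the spirit of the SparsePIM abstract above.
# Not the authors' implementation; names and parameters are assumed.
import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.cluster import KMeans

NUM_BANKS = 16  # assumed number of PIM-enabled HBM banks


def balance_columns_across_banks(mat_csc, num_banks=NUM_BANKS, k=8):
    """Assign each column of a CSC sparse matrix to a memory bank."""
    # Per-column nonzero count: a simple proxy for that column's work.
    nnz_per_col = np.diff(mat_csc.indptr).astype(float).reshape(-1, 1)
    # Group columns whose nonzero counts are similar.
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(nnz_per_col)
    bank_of_col = np.empty(mat_csc.shape[1], dtype=int)
    bank_load = np.zeros(num_banks)
    # Within each group, place the next column on the currently
    # least-loaded bank, keeping total nonzeros per bank even.
    for g in range(k):
        for col in np.flatnonzero(labels == g):
            b = int(np.argmin(bank_load))
            bank_of_col[col] = b
            bank_load[b] += nnz_per_col[col, 0]
    return bank_of_col, bank_load


if __name__ == "__main__":
    A = sparse_random(1024, 1024, density=0.02, format="csc", random_state=1)
    _, load = balance_columns_across_banks(A)
    print("nnz per bank:", load.astype(int))  # roughly uniform
</code>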

**Bio:** [[https://