
CIM for Transformer Models: Enhancing Large Language Model Inference Efficiency

  • Meng Syuan Li
  • Jung Fang Ke
  • En Ming Huang
  • Zhi Wei Liu
  • Yu Guang Chen
  • Chun Yi Lee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

Abstract

In large language model (LLM) inference, the high computational demand and the large memory footprint of the weights and the key-value (KV) cache present significant challenges. The problem is especially acute when relying exclusively on GPUs, which often lack the capacity to hold the entire KV cache, particularly for larger LLMs. Without direct GPU-to-GPU interconnects such as NVLink, LLM inference typically offloads the KV cache to the CPU for storage and computation, then transfers the multi-head attention results back to the GPU for the subsequent transformer computations. Because attention score computation is computationally demanding on the CPU and requires substantial data movement between the KV cache and memory, computing attention scores, and even the feed-forward layers, directly on Compute-in-Memory (CIM) systems emerges as a viable alternative. This paper is at the forefront of integrating CIM technology into LLM inference and proposes an architecture that leverages this emerging technology to improve inference efficiency. Specifically, we present a tailored CIM-based dataflow and hierarchy design to optimize the computation of attention scores and feed-forward layers using CIMs. Compared with a CPU-based implementation, the results show 0.026× the inference latency and 1.199 × 10⁻³× the energy.
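To make the capacity and data-movement argument concrete, here is a minimal Python sketch; it is an illustration, not the authors' design. The model dimensions used in the example (80 layers, 64 key/value heads, head dimension 128, FP16 storage, a 4,096-token context) are hypothetical and assume standard multi-head attention without key/value head sharing. The first function estimates the KV cache footprint; the second computes single-head attention for one newly decoded token over the cached keys and values, which is the step the abstract describes being offloaded to the CPU.

```python
import math


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch=1, bytes_per_elem=2):
    """Estimate KV cache size in bytes (assumes plain multi-head attention, FP16 by default)."""
    # Factor 2: each layer stores one key vector and one value vector per head per token.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem


def decode_attention(q, keys, values):
    """Single-head attention for one new token over the cached keys and values.

    q: query vector of length d.
    keys: list of length-d key vectors, one per cached token.
    values: list of value vectors, one per cached token.
    """
    d = len(q)
    scale = 1.0 / math.sqrt(d)
    # Attention scores: scaled dot product of the query with every cached key.
    # Every decode step reads the whole key cache, so traffic grows with context length.
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    # Numerically stable softmax over the scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Output: attention-weighted sum of the cached value vectors.
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]


if __name__ == "__main__":
    # Hypothetical model dimensions, chosen only for illustration.
    size = kv_cache_bytes(n_layers=80, n_kv_heads=64, head_dim=128, seq_len=4096)
    print(f"{size / 2**30:.1f} GiB")  # 10.0 GiB
```

With these hypothetical dimensions a single 4,096-token sequence already needs 10 GiB of KV cache, and a batch of 8 needs 80 GiB. Because every decode step also reads all cached keys and values, the per-token data movement grows linearly with context length, which is the cost the abstract attributes to CPU-side attention.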

Original language: English
Host publication title: IEEE Computer Society Annual Symposium on VLSI, ISVLSI 2025 - Conference Proceedings
Publisher: IEEE Computer Society
ISBN (Electronic): 9798331534776
DOIs
Publication status: Published - 2025
Event: 28th IEEE Computer Society Annual Symposium on VLSI, ISVLSI 2025 - Kalamata, Greece
Duration: 6 Jul 2025 → 9 Jul 2025

Publication series

Name: Proceedings of IEEE Computer Society Annual Symposium on VLSI, ISVLSI
ISSN (Print): 2159-3469
ISSN (Electronic): 2159-3477

Conference

Conference: 28th IEEE Computer Society Annual Symposium on VLSI, ISVLSI 2025
Country/Territory: Greece
City: Kalamata
Period: 6/07/25 → 9/07/25
