Design and Analysis of L3 Cache Embedded-GPU-High Bandwidth Memory Architecture with Reduced Energy and Latency for AI Computing
2024 IEEE 33rd Conference on Electrical Performance of Electronic Packaging and Systems (EPEPS)
Abstract
For the first time, this paper proposes an L3 cache embedded-GPU-high bandwidth memory (L3E-GPU-HBM) architecture for reduced latency and enhanced energy efficiency in large-scale, memory-intensive AI computing. Accessing HBM in the conventional GPU-HBM architecture incurs significant latency and high data-movement energy. To address this challenge, we propose L3E-GPU-HBM, in which an L3 cache is embedded in the interposer between the GPU and the HBM. To implement the proposed architecture, an embedded SRAM interconnect (ESI) chip is employed, consisting of a local silicon interconnect (LSI) die and an L3 cache die merged by hybrid bonding. Using the Chip-on-Wafer-on-Substrate with Local interconnect (CoWoS-L) method, the ESI chip is then placed inside the reconstituted interposer (RI). The ESI chip functions as both an interconnect and an L3 cache between the GPU's L2 cache and the HBM. To verify the proposed architecture, a circuit model of the driver and channel is used to obtain the wire latency and energy. The results show that the proposed L3E-GPU-HBM architecture reduces wire latency and energy by 17% and 33%, respectively, compared to the conventional GPU-HBM architecture.
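The abstract mentions that a circuit model of the driver and channel is used to extract wire latency and energy. As a rough illustration of how such estimates are commonly formed, the sketch below uses a lumped-RC Elmore-delay approximation for latency and an activity-weighted CV² term for switching energy. All resistance, capacitance, supply-voltage, and wire-length values are hypothetical placeholders, not the paper's parameters, and the printed comparison is not intended to reproduce the paper's 17%/33% results.

```python
# Minimal sketch of a driver/channel wire model (assumed parameters, not the paper's).

def elmore_delay(r_drv, c_drv, r_wire, c_wire, c_load):
    """50% delay of a driver plus distributed RC wire (Elmore approximation)."""
    return 0.69 * (r_drv * (c_drv + c_wire + c_load)
                   + r_wire * (c_wire / 2 + c_load))

def switching_energy(c_wire, c_load, vdd, activity=0.5):
    """Average dynamic energy per transferred bit: activity * C * Vdd^2."""
    return activity * (c_wire + c_load) * vdd ** 2

# Hypothetical interposer-wire parameters (illustrative assumptions only).
R_PER_MM = 500.0       # wire resistance, ohm/mm
C_PER_MM = 200e-15     # wire capacitance, F/mm
VDD = 0.75             # supply voltage, V
R_DRV, C_DRV, C_LOAD = 100.0, 10e-15, 20e-15  # driver R/C and receiver load

# Compare an assumed longer conventional GPU->HBM route against an
# assumed shorter route through an embedded L3 die in the interposer.
for name, length_mm in [("conventional GPU->HBM path", 4.0),
                        ("path via embedded L3 (ESI) chip", 2.5)]:
    r_w, c_w = R_PER_MM * length_mm, C_PER_MM * length_mm
    t = elmore_delay(R_DRV, C_DRV, r_w, c_w, C_LOAD)
    e = switching_energy(c_w, C_LOAD, VDD)
    print(f"{name}: delay = {t * 1e12:.1f} ps, energy/bit = {e * 1e15:.1f} fJ")
```

Under these assumptions the shorter channel sees both lower RC delay and lower switching capacitance, which is the qualitative mechanism behind placing the L3 cache in the interposer between the L2 cache and the HBM.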
Keywords
Embedded SRAM Interconnect, GPU, High Bandwidth Memory, L3 Cache, Energy, Wire Latency