TT-CIM: Tensor Train Decomposition for Neural Network in RRAM-Based Compute-in-Memory Systems
IEEE Transactions on Circuits and Systems I: Regular Papers (2024)
Abstract
Compute-in-Memory (CIM) implemented with Resistive-Random-Access-Memory (RRAM) crossbars is a promising approach for accelerating Convolutional Neural Network (CNN) computations. The growing number of parameters in state-of-the-art CNN models, however, creates challenges for on-chip weight storage in CIM implementations, making CNN compression a crucial topic of exploration. Tensor Train (TT) decomposition can be used to decompose a tensor into smaller tensors with fewer parameters, at the cost of an increased number of computations. In this work we propose a technique to minimize intermediate operations across the full convolution operation and improve hardware utilization when implementing TT-CNNs in CIM systems. We first use an iterative decompose-and-fine-tune method to prepare TT-CNNs. We then propose an inter-convolutional-step reuse scheme to reduce the required operation count and post-mapping RRAM count for TT-CNN implementation in a tiled-CIM architecture. We demonstrate that, through proper mapping, pipelining, and reuse, effective compression ratios of 12 and 20 can be achieved with 0.8% and 1.4% accuracy drops, respectively, for WRN, and effective compression ratios of 6 and 11 with 0.9% and 1.2% accuracy drops for VGG8. We also show that around 30% higher hardware utilization than the original CNN format can be achieved using the proposed TT-CIM approaches.
Keywords
Compute-in-memory, deep neural network, convolutional neural network, neural network compression, tensor train decomposition
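
As context for the TT compression described in the abstract, the sketch below illustrates how a convolution weight tensor can be decomposed into TT cores via sequential truncated SVDs and how the resulting parameter-count reduction can be measured. This is a minimal NumPy illustration, not the paper's implementation: the function name `tt_svd`, the example tensor shape, and the fixed maximum rank are assumptions made for the example, and the paper's iterative decompose-and-fine-tune flow, inter-convolutional-step reuse, and CIM mapping are not shown.

```python
# Minimal TT-SVD sketch (illustrative only, not the paper's method):
# decompose a 4-D convolution weight tensor into TT cores G_k of shape
# (r_{k-1}, n_k, r_k) by repeated truncated SVD of matrix unfoldings.
import numpy as np

def tt_svd(tensor, max_rank):
    """Return a list of TT cores obtained by sequential truncated SVDs."""
    dims = tensor.shape
    d = len(dims)
    cores = []
    rank_prev = 1
    # Unfold the tensor so the first mode forms the rows.
    unfolding = tensor.reshape(rank_prev * dims[0], -1)
    for k in range(d - 1):
        u, s, vt = np.linalg.svd(unfolding, full_matrices=False)
        rank_next = min(max_rank, len(s))           # truncate to the chosen TT rank
        cores.append(u[:, :rank_next].reshape(rank_prev, dims[k], rank_next))
        # Carry the remaining factor forward and fold in the next mode.
        unfolding = (np.diag(s[:rank_next]) @ vt[:rank_next]).reshape(
            rank_next * dims[k + 1], -1)
        rank_prev = rank_next
    cores.append(unfolding.reshape(rank_prev, dims[-1], 1))
    return cores

# Hypothetical example: a 3x3 conv layer with 64 input and 128 output
# channels, viewed as a (64, 128, 3, 3) tensor and truncated to TT rank 8.
W = np.random.randn(64, 128, 3, 3).astype(np.float32)
cores = tt_svd(W, max_rank=8)
original = W.size
compressed = sum(c.size for c in cores)
print("TT core shapes:", [c.shape for c in cores])
print(f"compression ratio ~ {original / compressed:.1f}x")
```

The ratio of original to TT-core parameter counts is the kind of compression ratio the abstract reports; as the abstract notes, this storage saving comes at the cost of additional intermediate computations, which the proposed reuse scheme is designed to reduce.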