25.4 A 20nm 6GB Function-In-Memory DRAM, Based on HBM2 with a 1.2TFLOPS Programmable Computing Unit Using Bank-Level Parallelism, for Machine Learning Applications
- Author
- Jae-Hoon Lee, Soo-Young Kim, O Seongil, Kyomin Sohn, Myeong Jun Song, Yu-Hwan Ro, Sukhan Lee, Hyoung-Min Kim, David T. Wang, Jongyoon Choi, Je Min Ryu, Eun-Bong Kim, SooYoung Kim, Nam Sung Kim, Jae-Youn Youn, Daeho Kim, Sang-Hyuk Kwon, Jin Kim, Jin Guk Kim, Jong-Pil Son, Bengseng Phuah, Hyun-Sung Shin, Hae-Suk Lee, Shin-haeng Kang, Young-Cheon Kwon, Seung-Woo Seo, Young-min Cho, Hak-soo Yu, Joon-Ho Song, and Ahn Choi
- Subjects
- Applied physics, Computer science, Process (computing), Computer hardware & architecture, Recurrent neural network, Memory bank, Memory management, Parallel processing (DSP implementation), Embedded system, Electrical engineering, electronic engineering, information engineering, Bandwidth (computing), Conventional memory, DRAM
- Abstract
In recent years, artificial intelligence (AI) technology has proliferated rapidly and widely into application areas such as speech recognition, health care, and autonomous driving. To increase the capabilities of AI, more powerful systems are needed to process a larger amount of data. This requirement has made domain-specific accelerators, such as GPUs and TPUs, popular, as they can provide orders-of-magnitude higher performance than state-of-the-art CPUs. However, these accelerators can only operate at their peak performance when they receive the necessary data from memory as quickly as it is processed, requiring off-chip memory with a high bandwidth and a large capacity [1]. HBM has thus far met the bandwidth and capacity requirements [2]–[6], but recent AI technologies such as recurrent neural networks require an even higher bandwidth than HBM provides [7]–[8]. While a further increase in off-chip bandwidth can be accomplished by various techniques, it is often limited by power constraints at the chip or system level [9]. Hence, it is essential to decrease the demand for off-chip bandwidth with unconventional architectures such as processing-in-memory. In this paper, we first present function-in-memory DRAM (FIMDRAM), which integrates a 16-wide single-instruction multiple-data engine within the memory banks and exploits bank-level parallelism to provide $4\times$ higher processing bandwidth than an off-chip memory solution. Second, we show techniques that do not require any modification to conventional memory controllers and their command protocols, which makes FIMDRAM more practical for rapid industry adoption. Finally, we conclude this paper with circuit- and system-level evaluations of our fabricated FIMDRAM.
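As a rough illustration of the arithmetic behind the headline figures, the sketch below recomputes them from assumed device parameters. Only the $4\times$ processing-bandwidth gain, the 16-wide SIMD engine, and the 1.2 TFLOPS figure come from the title and abstract; the per-bank bandwidth, bank count, PCU count, and clock frequency used here are illustrative assumptions, not values quoted from the paper.

```python
# Back-of-envelope sketch of FIMDRAM's headline figures.
# Only the 4x gain, 16-wide SIMD, and ~1.2 TFLOPS target come from the
# title/abstract; every parameter below is an illustrative assumption.

# --- Processing bandwidth from bank-level parallelism ---
OFF_CHIP_BW_GBPS = 64.0    # assumed off-chip bandwidth of one pseudo-channel (GB/s)
BANKS_IN_PARALLEL = 4      # assumed banks streaming to in-bank SIMD engines at once
PER_BANK_BW_GBPS = 64.0    # assumed internal column-access bandwidth per bank (GB/s)

internal_bw = BANKS_IN_PARALLEL * PER_BANK_BW_GBPS
print(f"In-DRAM processing bandwidth: {internal_bw:.0f} GB/s "
      f"({internal_bw / OFF_CHIP_BW_GBPS:.0f}x the off-chip interface)")

# --- Peak FP16 throughput of the programmable computing units (PCUs) ---
NUM_PCUS = 128             # assumed number of in-bank SIMD engines in the stack
SIMD_WIDTH = 16            # 16-wide SIMD engine (from the abstract)
CLOCK_HZ = 300e6           # assumed PCU clock frequency
FLOPS_PER_LANE = 2         # multiply-accumulate counted as two FLOPs

peak_tflops = NUM_PCUS * SIMD_WIDTH * CLOCK_HZ * FLOPS_PER_LANE / 1e12
print(f"Peak throughput: {peak_tflops:.2f} TFLOPS")  # ~1.2 TFLOPS under these assumptions
```

The point the abstract emphasizes is that this extra processing bandwidth is generated inside the DRAM banks, so it does not add to the off-chip pin or power budget that otherwise limits further bandwidth scaling.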
- Published
- 2021