25.4 A 20nm 6GB Function-In-Memory DRAM, Based on HBM2 with a 1.2TFLOPS Programmable Computing Unit Using Bank-Level Parallelism, for Machine Learning Applications

Authors :
Jae-Hoon Lee
Soo-Young Kim
O Seongil
Kyomin Sohn
Myeong Jun Song
Yu-Hwan Ro
Sukhan Lee
Hyoung-Min Kim
Wang David T
Jongyoon Choi
Je Min Ryu
Eun-Bong Kim
SooYoung Kim
Nam Sung Kim
Jae-Youn Youn
Daeho Kim
Sang-Hyuk Kwon
Jin Kim
Jin Guk Kim
Jong-Pil Son
Bengseng Phuah
Hyun-Sung Shin
Hae-Suk Lee
Shin-haeng Kang
Young-Cheon Kwon
Seung-Woo Seo
Young-min Cho
Hak-soo Yu
Joon-Ho Song
Ahn Choi
Source :
ISSCC
Publication Year :
2021
Publisher :
IEEE, 2021.

Abstract

In recent years, artificial intelligence (AI) technology has proliferated rapidly and widely into application areas such as speech recognition, health care, and autonomous driving. To increase the capabilities of AI, more powerful systems are needed to process larger amounts of data. This requirement has made domain-specific accelerators, such as GPUs and TPUs, popular, as they can provide orders of magnitude higher performance than state-of-the-art CPUs. However, these accelerators can only operate at their peak performance when they get the necessary data from memory as quickly as it is processed, which requires off-chip memory with a high bandwidth and a large capacity [1]. HBM has thus far met the bandwidth and capacity requirements [2]–[6], but recent AI technologies such as recurrent neural networks require an even higher bandwidth than HBM provides [7]–[8]. While a further increase in off-chip bandwidth can be accomplished by various techniques, it is often limited by power constraints at the chip or system level [9]. Hence, it is essential to decrease demand for off-chip bandwidth with unconventional architectures, such as processing-in-memory. In this paper, we first present function-in-memory DRAM (FIMDRAM), which integrates a 16-wide single-instruction multiple-data engine within the memory banks and exploits bank-level parallelism to provide $4\times$ higher processing bandwidth than an off-chip memory solution. Second, we show techniques that do not require any modification to conventional memory controllers and their command protocols, which makes FIMDRAM more practical for quick industry adoption. Finally, we conclude this paper with circuit- and system-level evaluations of our fabricated FIMDRAM.

Details

Database :
OpenAIRE
Journal :
2021 IEEE International Solid-State Circuits Conference (ISSCC)
Accession number :
edsair.doi...........807b77e7255e186b2571cf42a2acc79f
Full Text :
https://doi.org/10.1109/isscc42613.2021.9365862