151. A 7.3 M Output Non-Zeros/J, 11.7 M Output Non-Zeros/GB Reconfigurable Sparse Matrix–Matrix Multiplication Accelerator
- Author
- Chaitali Chakrabarti, Trevor Mudge, Dong-Hyeon Park, Subhankar Pal, Ronald G. Dreslinski, David Blaauw, Aporva Amarnath, Jonathan Beaumont, Chun Zhao, Austin Rovinski, Jielun Tan, Siying Feng, Hun-Seok Kim, Kuan-Yu Chen, Timothy Wesley, Michael Taylor, Paul Gao, and Shaolin Xie
- Subjects
Discrete mathematics, Physics, Hardware_MEMORYSTRUCTURES, Memory hierarchy, 020208 electrical & electronic engineering, 02 engineering and technology, Spectral efficiency, Chip, Matrix multiplication, 0202 electrical engineering, electronic engineering, information engineering, Graph (abstract data type), Multiplication, Cache, Electrical and Electronic Engineering, Sparse matrix
- Abstract
A sparse matrix–matrix multiplication (SpMM) accelerator with 48 heterogeneous cores and a reconfigurable memory hierarchy is fabricated in 40-nm CMOS. The compute fabric consists of dedicated floating-point multiplication units and general-purpose Arm Cortex-M0 and Cortex-M4 cores. The on-chip memory reconfigures as scratchpad or cache, depending on the phase of the algorithm. The memory and compute units are interconnected with synthesizable coalescing crossbars for efficient memory access. The 2.0-mm $\times$ 2.6-mm chip exhibits 12.6$\times$ (8.4$\times$) energy efficiency gain, 11.7$\times$ (77.6$\times$) off-chip bandwidth efficiency gain, and 17.1$\times$ (36.9$\times$) compute density gain against a high-end CPU (GPU) across a diverse set of synthetic and real-world power-law graph-based sparse matrices.
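To make the title's "output non-zeros per joule" metric concrete, here is a minimal software sketch of SpMM that counts the non-zeros of the product and divides by an energy figure. It is not the accelerator's algorithm or dataflow. The dict-of-dicts storage, the function names `spmm`, `output_nonzeros`, and `nonzeros_per_joule`, and the energy value in the usage lines are all illustrative assumptions.

```python
from collections import defaultdict


def spmm(a, b):
    """Multiply two sparse matrices stored as {row: {col: value}} dicts.

    Only stored (non-zero) entries are visited, so the work scales with the
    number of partial products rather than with the dense dimensions.
    """
    c = defaultdict(dict)
    for i, a_row in a.items():
        acc = c[i]
        for k, a_ik in a_row.items():
            # Row k of B contributes to row i of C, scaled by A[i][k].
            for j, b_kj in b.get(k, {}).items():
                acc[j] = acc.get(j, 0.0) + a_ik * b_kj
    # Drop rows that ended up with no entries.
    return {i: row for i, row in c.items() if row}


def output_nonzeros(c):
    """Count stored entries of C that are actually non-zero."""
    return sum(1 for row in c.values() for v in row.values() if v != 0.0)


def nonzeros_per_joule(nnz, energy_joules):
    """Hypothetical helper: efficiency as output non-zeros per joule."""
    return nnz / energy_joules


# Tiny example; the energy figure is made up purely for illustration.
a = {0: {0: 1.0, 2: 2.0}, 1: {1: 3.0}}
b = {0: {1: 4.0}, 1: {0: 5.0}, 2: {1: 6.0}}
c = spmm(a, b)
nnz = output_nonzeros(c)
efficiency = nonzeros_per_joule(nnz, 0.5)
```

Counting output non-zeros rather than floating-point operations ties the metric to the useful result size, which is why it suits matrices whose density varies widely.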
- Published
- 2020