
Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture

Authors :
Yakun Sophia Shao, Jason Clemons, Rangharajan Venkatesan, Brian Zimmer, Matthew Fojtik, Nan Jiang, Ben Keller, Alicia Klinefelter, Nathaniel Pinckney, Priyanka Raina, Stephen G. Tell, Yanqing Zhang, William J. Dally, Joel Emer, C. Thomas Gray, Brucek Khailany, Stephen W. Keckler
Source :
MICRO
Publication Year :
2019
Publisher :
ACM, 2019.

Abstract

Package-level integration using multi-chip modules (MCMs) is a promising approach to building large-scale systems. Compared to a large monolithic die, an MCM combines many smaller chiplets into a larger system, substantially reducing fabrication and design costs. Current MCMs typically contain only a handful of coarse-grained, large chiplets due to the high area, performance, and energy overheads associated with inter-chiplet communication. This work investigates and quantifies the costs and benefits of using MCMs with fine-grained chiplets for deep-learning inference, an application area with large compute and on-chip storage requirements. To evaluate the approach, we architected, implemented, fabricated, and tested Simba, a 36-chiplet prototype MCM system for deep-learning inference. Each chiplet achieves 4 TOPS of peak performance, and the 36-chiplet MCM package achieves up to 128 TOPS and up to 6.1 TOPS/W. The MCM is configurable to support flexible mapping of DNN layers to the distributed compute and storage units. To mitigate inter-chiplet communication overheads, we introduce three tiling optimizations that improve data locality. These optimizations achieve up to a 16% speedup over the baseline layer mapping. Our evaluation shows that Simba can process 1988 images/s running ResNet-50 with a batch size of one, delivering an inference latency of 0.50 ms.
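As a rough companion to the numbers above, the Python sketch below (not from the paper) does two things: it sanity-checks the reported batch-1 latency against the reported throughput, and it illustrates, with a deliberately simplified uniform partitioning, what splitting a layer's output channels across a 6x6 chiplet grid could look like. The tile_output_channels helper is hypothetical; the abstract notes that Simba's mappings are flexible and that its tiling optimizations improve on uniform baseline mappings.

# Sanity check of headline figures from the abstract, plus a toy
# (hypothetical) uniform tiling of a layer across the chiplet grid.

CHIPLETS = 36                  # 6x6 MCM package (abstract)
PEAK_TOPS_PER_CHIPLET = 4.0    # per-chiplet peak (abstract)
PEAK_TOPS_PACKAGE = 128.0      # package peak, up to (abstract)
TOPS_PER_WATT = 6.1            # peak energy efficiency, up to (abstract)
RESNET50_IMAGES_PER_S = 1988   # ResNet-50 batch-1 throughput (abstract)

# At batch size one there is no batching overlap, so latency is the
# reciprocal of throughput.
latency_ms = 1000.0 / RESNET50_IMAGES_PER_S
print(f"ResNet-50 batch-1 latency: {latency_ms:.2f} ms")  # ~0.50 ms, matching the abstract

def tile_output_channels(num_channels: int, num_chiplets: int = CHIPLETS):
    """Hypothetical uniform tiling: split a layer's output channels as
    evenly as possible across chiplets, returning one (start, end)
    range per chiplet. Simba's real mappings are more flexible, and
    the paper's locality optimizations depart from uniform schemes."""
    base, extra = divmod(num_channels, num_chiplets)
    ranges, start = [], 0
    for i in range(num_chiplets):
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges

# Example: a 512-channel layer assigns 14-15 channels to each of the
# 36 chiplets; the first three assignments are shown.
print(tile_output_channels(512)[:3])

With batch size fixed at one, the latency and throughput figures are two views of the same measurement, which is why the computed 0.50 ms agrees with the abstract.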

Details

Database :
OpenAIRE
Journal :
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture
Accession number :
edsair.doi...........cdc1e8023a11fe57945a346614568f80
Full Text :
https://doi.org/10.1145/3352460.3358302