
Improving Inference Latency and Energy of DNNs through Wireless Enabled Multi-Chip-Module-based Architectures and Model Parameters Compression

Authors :
Vincenzo Catania
Andrea Mineo
Maurizio Palesi
Giuseppe Ascia
Salvatore Monteleone
Davide Patti
Source :
NOCS
Publication Year :
2020
Publisher :
IEEE, 2020.

Abstract

Performance and energy figures of Deep Neural Network (DNN) accelerators are profoundly affected by the communication and memory sub-systems. In this paper, we make the case for a state-of-the-art multi-chip-module-based architecture for DNN inference acceleration. We propose a hybrid wired/wireless network-in-package interconnection fabric and a compression technique that drastically improve communication efficiency and reduce memory and communication traffic, with a consequent improvement in performance and energy metrics. We assess the inference performance and energy improvements vs. accuracy degradation for different CNNs, showing that reductions of up to 77% in inference latency and 68% in inference energy can be obtained while keeping the accuracy degradation below 5% with respect to the original uncompressed CNN.
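The abstract does not detail the paper's specific model-parameter compression scheme, so the following is only a minimal, illustrative sketch of one common approach, uniform post-training weight quantization, which shrinks the weight traffic that must cross the memory and interconnect sub-systems. The bit-width, tensor shapes, and helper functions are assumptions made for illustration, not the authors' method.

```python
# Minimal sketch: uniform weight quantization as a stand-in for
# model-parameter compression. All names and parameters here are
# hypothetical; the paper's actual technique is not specified in the abstract.
import numpy as np


def quantize_weights(weights: np.ndarray, n_bits: int = 8):
    """Uniformly quantize a float32 weight tensor to n_bits integer codes.

    Returns the codes plus (scale, w_min) needed to dequantize, so the
    compressed representation can be shipped over the interconnect and
    reconstructed at the accelerator side.
    """
    w_min, w_max = float(weights.min()), float(weights.max())
    levels = 2 ** n_bits - 1
    scale = (w_max - w_min) / levels if w_max > w_min else 1.0
    dtype = np.uint8 if n_bits <= 8 else np.uint16
    codes = np.round((weights - w_min) / scale).astype(dtype)
    return codes, scale, w_min


def dequantize_weights(codes: np.ndarray, scale: float, w_min: float):
    """Reconstruct approximate float32 weights from the quantized codes."""
    return codes.astype(np.float32) * scale + w_min


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical conv-layer weight tensor (64 filters, 3x3x3 each).
    w = rng.normal(scale=0.1, size=(64, 3, 3, 3)).astype(np.float32)
    codes, scale, w_min = quantize_weights(w, n_bits=8)
    w_hat = dequantize_weights(codes, scale, w_min)
    print(f"compression ratio: {w.nbytes / codes.nbytes:.1f}x")
    print(f"max reconstruction error: {np.abs(w - w_hat).max():.5f}")
```

In this sketch, 8-bit quantization cuts weight storage and traffic by roughly 4x relative to float32, at the cost of a bounded reconstruction error; the paper evaluates this kind of compression/accuracy trade-off end to end, reporting the latency and energy gains quoted above.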

Details

Database :
OpenAIRE
Journal :
2020 14th IEEE/ACM International Symposium on Networks-on-Chip (NOCS)
Accession number :
edsair.doi.dedup.....8130eeecc56bdf7a2590c0f786eb9176
Full Text :
https://doi.org/10.1109/nocs50636.2020.9241714