Improving Inference Latency and Energy of DNNs through Wireless Enabled Multi-Chip-Module-based Architectures and Model Parameters Compression
- Source :
- NOCS
- Publication Year :
- 2020
- Publisher :
- IEEE, 2020.
Abstract
- Performance and energy figures of Deep Neural Network (DNN) accelerators are profoundly affected by the communication and memory sub-system. In this paper, we make the case for a state-of-the-art multi-chip-module-based architecture for DNN inference acceleration. We propose a hybrid wired/wireless network-in-package interconnection fabric and a compression technique that drastically improve communication efficiency and reduce memory and communication traffic, with a consequent improvement in performance and energy metrics. We assess the inference performance and energy improvement versus accuracy degradation for different CNNs, showing that up to 77% inference latency reduction and 68% inference energy reduction can be obtained while keeping the accuracy degradation below 5% with respect to the original uncompressed CNN.
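- Note: the abstract does not specify which model-parameter compression scheme the paper uses. As a minimal, hypothetical sketch of the general idea, the Python snippet below uses simple uniform 8-bit weight quantization to show how compressing parameters shrinks the data volume that must be moved over the interconnect and memory sub-system, at the cost of a small reconstruction error.

```python
import numpy as np

# Hypothetical illustration only: uniform int8 quantization of a layer's
# weights, as one generic way to reduce the memory/communication traffic
# a multi-chip-module DNN accelerator must move between chiplets.

def quantize_int8(weights: np.ndarray):
    """Map float32 weights to int8 plus a per-tensor scale factor."""
    scale = np.abs(weights).max() / 127.0 if weights.size else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Reconstruct approximate float32 weights from the int8 codes."""
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(0.0, 0.05, size=(512, 512)).astype(np.float32)  # toy layer
    q, s = quantize_int8(w)
    w_hat = dequantize(q, s)
    ratio = w.nbytes / q.nbytes          # 4x less data to transfer off-chip
    err = np.abs(w - w_hat).mean()       # accuracy cost of the compression
    print(f"compression ratio: {ratio:.1f}x, mean abs error: {err:.5f}")
```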
- Subjects :
- Wireless NoC
Network-on-Chip
Network-in-Package
Multi-Chip-Module
DNN Accelerators
DNN Compression
Interconnection
Artificial neural network
Inference
Latency
Wireless
Computer science
Computer engineering
Details
- Database :
- OpenAIRE
- Journal :
- 2020 14th IEEE/ACM International Symposium on Networks-on-Chip (NOCS)
- Accession number :
- edsair.doi.dedup.....8130eeecc56bdf7a2590c0f786eb9176
- Full Text :
- https://doi.org/10.1109/nocs50636.2020.9241714