1. Energy-efficient object detection: impact of weight clustering for different arithmetic representations
- Author
- Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors; Barcelona Supercomputing Center; Caro Roca, Martí; Abella Ferrer, Jaume
- Abstract
Object detection in video streams is often realized with Deep Neural Networks (DNNs), which require fetching large volumes of weights to the computing unit where they are executed in order to process each image of the video. In many applications (e.g., object detection in autonomous cars), those weights largely exceed the capacity of on-chip cache memories and hence impose high memory bandwidth requirements and high energy consumption to fetch them for each image. Weight clustering, which reduces the bits required to encode each weight, and lower-precision arithmetic, which reduces the energy consumption of the accelerator running the DNN, are well-known solutions. However, they are often evaluated in small-scale setups that are not fully representative of the target applications, which limits the value of the conclusions obtained. This paper covers this gap by evaluating the combined use of weight clustering and reduced-precision arithmetic, for both floating-point (FP) and integer arithmetic, at scale with the You Only Look Once v3 (YOLOv3) camera-based object detector, used in a wide variety of industrial applications, including autonomous cars and trains. Our assessment shows that accuracy degrades slowly as fewer bits are used for the (clustered) weights, whereas bandwidth requirements and their associated power consumption decrease rapidly. When combining weight clustering with reduced-precision arithmetic in the accelerator performing inference, both bandwidth requirements and energy consumption are reduced to 20-30% of those of the original YOLOv3 FP 32-bit implementation.
Overall, our analysis shows that techniques such as weight clustering and reduced precision arithmetic provide large energy and bandwidth gains for real-life AI-based applications used for camera-based object detection, and provides guidance on the effectiveness of each technique.
This work is part of the project REBECCA, funded by MICIU/AEI/10.13039/501100011033 and the European Union NextGenerationEU/PRTR under grant PCI2022-134984-2, and the European Union’s Horizon Europe Programme Chips Joint Undertaking (JU) under grant agreement No 101097224. This work has also been partially supported by the Spanish Ministry of Science and Innovation under grant PID2019-107255GBC21/AEI/10.13039/501100011033.
Peer Reviewed
Postprint (author's final draft)
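The weight clustering mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain 1-D k-means with evenly spaced initial centroids, a random 64x64 tensor standing in for one layer's weights, and hypothetical helper names (`cluster_weights`, `storage_ratio`). Each weight is replaced by a small index into a table of shared centroid values, so only the index has to be fetched per weight.

```python
import numpy as np

def cluster_weights(weights, bits, iters=20):
    """Cluster a weight tensor into 2**bits centroids with plain 1-D k-means.

    Illustrative only: indices are stored as uint8, so bits must be <= 8.
    """
    flat = weights.ravel().astype(np.float32)
    k = 2 ** bits
    # Assumption: initialise centroids evenly across the weight range.
    centroids = np.linspace(flat.min(), flat.max(), k, dtype=np.float32)
    for _ in range(iters):
        # Assign each weight to its nearest centroid.
        idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        # Move each non-empty centroid to the mean of its members.
        for c in range(k):
            members = flat[idx == c]
            if members.size:
                centroids[c] = members.mean()
    # Final assignment against the updated centroids.
    idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
    return idx.reshape(weights.shape).astype(np.uint8), centroids

def storage_ratio(n_weights, bits, full_bits=32):
    """Fraction of the original storage: per-weight indices plus the centroid table."""
    clustered = n_weights * bits + (2 ** bits) * full_bits
    return clustered / (n_weights * full_bits)

# Hypothetical stand-in for one layer's FP32 weights.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=(64, 64)).astype(np.float32)

idx, codebook = cluster_weights(w, bits=4)
w_hat = codebook[idx]  # weights reconstructed from indices at inference time

print("max reconstruction error:", np.abs(w - w_hat).max())
print("storage ratio vs. FP32:", storage_ratio(w.size, bits=4))
```

For this toy tensor, 4-bit indices plus a 16-entry FP32 table need about 13% of the original storage. The figure only illustrates the mechanism and is unrelated to the YOLOv3 results reported above.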
- Published
- 2024