1. Dustin: A 16-Cores Parallel Ultra-Low-Power Cluster With 2b-to-32b Fully Flexible Bit-Precision and Vector Lockstep Execution Mode
- Author
Angelo Garofalo, Francesco Conti, Davide Rossi, Giuseppe Tagliavini, Gianmarco Ottavi, Luca Benini, and Alfio Di Mauro
- Subjects
FOS: Computer and information sciences, Hardware and Architecture, Hardware Architecture (cs.AR), Electrical and Electronic Engineering, Computer Science - Hardware Architecture, QNN inference, mixed-precision, SIMD, MIMD, RISC-V
- Abstract
Computationally intensive algorithms such as Deep Neural Networks (DNNs) are becoming killer applications for edge devices. Porting heavily data-parallel algorithms on resource-constrained and battery-powered devices poses several challenges related to memory footprint, computational throughput, and energy efficiency. Low-bitwidth and mixed-precision arithmetic have been proven to be valid strategies for tackling these problems. We present Dustin, a fully programmable compute cluster integrating 16 RISC-V cores capable of 2- to 32-bit arithmetic and all possible mixed-precision permutations. In addition to a conventional Multiple-Instruction Multiple-Data (MIMD) processing paradigm, Dustin introduces a Vector Lockstep Execution Mode (VLEM) to minimize power consumption in highly data-parallel kernels. In VLEM, a single leader core fetches instructions and broadcasts them to the 15 follower cores. Clock gating Instruction Fetch (IF) stages and private caches of the follower cores leads to 38% power reduction with minimal performance overhead. 13 pages, 17 figures, 2 tables, Journal
- Published
2023