1. DARKSIDE: A Heterogeneous RISC-V Compute Cluster for Extreme-Edge On-Chip DNN Inference and Training
- Author
Angelo Garofalo, Yvan Tortorella, Matteo Perotti, Luca Valente, Alessandro Nadalini, Luca Benini, Davide Rossi, and Francesco Conti
- Subjects
Heterogeneous cluster, tensor product engine (TPE), ultralow-power AI, Electric apparatus and materials. Electric circuits. Electric networks, TK452-454.4
- Abstract
On-chip deep neural network (DNN) inference and training at the Extreme-Edge (TinyML) impose strict latency, throughput, accuracy, and flexibility requirements. Heterogeneous clusters are promising solutions to meet the challenge, combining the flexibility of DSP-enhanced cores with the performance and energy boost of dedicated accelerators. We present DARKSIDE, a System-on-Chip with a heterogeneous cluster of eight RISC-V cores enhanced with 2-b to 32-b mixed-precision integer arithmetic. To boost the performance and efficiency on key compute-intensive DNN kernels, the cluster is enriched with three digital accelerators: 1) a specialized engine for low-data-reuse depthwise convolution kernels (up to 30 MAC/cycle); 2) a minimal overhead datamover to marshal 1–32-b data on-the-fly; and 3) a 16-b floating-point tensor product engine (TPE) for tiled matrix-multiplication acceleration. DARKSIDE is implemented in 65-nm CMOS technology. The cluster achieves a peak integer performance of 65 GOPS and a peak efficiency of 835 GOPS/W when working on 2-b integer DNN kernels. When targeting floating-point tensor operations, the TPE provides up to 18.2 GFLOPS of performance or 300 GFLOPS/W of efficiency—enough to enable on-chip floating-point training at competitive speed coupled with ultralow power quantized inference.
- Published
2022