
An energy-efficient GeMM-based convolution accelerator with on-the-fly im2col

Authors :
Universitat Politècnica de Catalunya. Doctorat en Enginyeria Electrònica
Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors
Universitat Politècnica de Catalunya. Departament d'Enginyeria Electrònica
Universitat Politècnica de Catalunya. EFRICS - Efficient and Robust Integrated Circuits and Systems
Fornt Mas, Jordi
Fontova Muste, Pau
Caro Roca, Martí
Abella Ferrer, Jaume
Moll Echeto, Francisco de Borja
Altet Sanahujes, Josep
Studer, Christoph
Publication Year :
2023

Abstract

Systolic array architectures have recently emerged as successful accelerators for deep convolutional neural network (CNN) inference. Such architectures can be used to efficiently execute general matrix–matrix multiplications (GeMMs), but computing convolutions with this primitive involves transforming the 3-D input tensor into an equivalent matrix, which can lead to an inflation of the input data, increasing the off-chip memory traffic, which is critical for energy efficiency. In this work, we propose a GeMM-based systolic array accelerator that uses a novel data feeder architecture to perform on-chip, on-the-fly convolution lowering (also known as im2col), supporting arbitrary tensor and kernel sizes as well as strided and dilated (or atrous) convolutions. By using our data feeder, we reduce memory transactions and required bandwidth on state-of-the-art CNNs by a factor of two, while only adding an area and power overhead of 4% and 7%, respectively. Application-specific integrated circuit (ASIC) implementation of our accelerator in 22-nm technology fits in less than 1.1 mm² and reaches an energy efficiency of 1.10 TFLOP/s/W with 16-bit floating-point arithmetic.
This work was supported in part by the MCIN/AEI/10.13039/501100011033 under Project PCI2020-134984-2, in part by the European Union NextGenerationEU/PRTR, in part by the European Union’s Horizon Europe Program under Project Key Digital Technologies (KDT) Joint Undertaking (JU) under Grant 101097224, and in part by the Spanish Ministry of Science and Innovation through MCIN/AEI/10.13039/501100011033 under Grant PID2019-107255GB-C21.
Peer Reviewed
Postprint (author's final draft)
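
Illustrative note (not part of the original record): the convolution lowering (im2col) named in the abstract can be sketched in software as below. This is a minimal pure-Python sketch, not the authors' hardware data feeder. The function names `im2col`, `conv_via_gemm`, and `inflation`, the channel-first nested-list layout, and the absence of padding are all assumptions made for illustration. It shows how a lowered matrix repeats input values, which is the data inflation the abstract refers to.

```python
def im2col(x, kh, kw, stride=1, dilation=1):
    """Lower a C x H x W tensor (nested lists) into a 2-D matrix.

    Each row holds the C*kh*kw input values seen by one output pixel.
    No padding is applied (an assumption of this sketch).
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    # Effective kernel extent once dilation spreads the taps apart.
    eff_h = dilation * (kh - 1) + 1
    eff_w = dilation * (kw - 1) + 1
    out_h = (H - eff_h) // stride + 1
    out_w = (W - eff_w) // stride + 1
    rows = []
    for oy in range(out_h):
        for ox in range(out_w):
            rows.append([
                x[c][oy * stride + ky * dilation][ox * stride + kx * dilation]
                for c in range(C)
                for ky in range(kh)
                for kx in range(kw)
            ])
    return rows


def conv_via_gemm(x, filters, kh, kw, stride=1, dilation=1):
    """Convolution as a matrix product: filters (K x C*kh*kw) times lowered input.

    Returns a K x (out_h*out_w) matrix, one row per filter.
    """
    cols = im2col(x, kh, kw, stride, dilation)
    return [[sum(w * v for w, v in zip(f, row)) for row in cols] for f in filters]


def inflation(x, kh, kw, stride=1, dilation=1):
    """Ratio of lowered-matrix elements to original tensor elements."""
    cols = im2col(x, kh, kw, stride, dilation)
    n_in = len(x) * len(x[0]) * len(x[0][0])
    return len(cols) * len(cols[0]) / n_in


# Hypothetical example: one channel, 4 x 4 input holding the values 0..15.
x = [[[r * 4 + c for c in range(4)] for r in range(4)]]
lowered = im2col(x, 3, 3)
out = conv_via_gemm(x, [[1] * 9], 3, 3)
ratio = inflation(x, 3, 3)
dilated = im2col(x, 2, 2, dilation=2)
```

For this toy 4 x 4 input and a 3 x 3 kernel, the lowered matrix has 36 elements against 16 in the original tensor, an inflation of 2.25. The factor-of-two traffic reduction quoted in the abstract is the authors' measured result on state-of-the-art CNNs and is not derived from this sketch.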

Details

Database :
OAIster
Notes :
5 p., application/pdf, English
Publication Type :
Electronic Resource
Accession number :
edsoai.on1397548196
Document Type :
Electronic Resource