A high-speed reusable quantized hardware accelerator design for CNN on constrained edge device
- Author
Yanamala, Rama Muni Reddy and Pullakandam, Muralidhar
- Abstract
The convolutional neural network (CNN) is a deep learning technique used in many recent applications. Recent years have seen rising demand for real-time CNN implementations on embedded devices with restricted resources. Implementing CNN models on field-programmable gate arrays provides flexible programmability and speeds up the development process. However, CNN acceleration is hampered by complex computations, limited bandwidth, and scarce on-chip memory. In this paper, a reusable quantized hardware architecture is proposed to accelerate deep CNN models while addressing these issues. Twenty-five processing elements are employed to compute the convolutions in the CNN model. Pipelining, loop unrolling, and array partitioning are applied to increase the speed of computation in both the convolution layers and the fully connected layers. The design is tested with MNIST handwritten digit image classification on a low-cost, low-memory Xilinx PYNQ-Z2 system-on-chip edge device. The inference speed of the proposed hardware design is 92.7% higher than that of an INTEL core3 CPU, 90.7% higher than a Haswell core2 CPU, 87.7% higher than an NVIDIA Tesla K80 GPU, and 84.9% better than a conventional hardware accelerator with one processing element. The proposed quantized architecture achieves 4.4 GOP/s without compromising accuracy, twice the throughput of the conventional architecture.
- Published
2023