A Mixed-Pruning Based Framework for Embedded Convolutional Neural Network Acceleration.
- Author
- Chang, Xuepeng, Pan, Huihui, Lin, Weiyang, and Gao, Huijun
- Subjects
- CONVOLUTIONAL neural networks, FIELD programmable gate arrays, PHYSIOLOGICAL effects of acceleration, ARTIFICIAL intelligence, PROBLEM solving, SPACE-TIME codes, DATA warehousing
- Abstract
Convolutional neural networks (CNN) have proven to be an effective method in the field of artificial intelligence (AI), and deploying CNNs to embedded devices at scale will no doubt greatly promote the development and application of AI in practical industry. However, mainly due to the space-time complexity of CNNs, computing power, memory bandwidth and flexibility are performance bottlenecks. In this paper, a framework combining model compression and hardware acceleration is proposed to solve these problems. The framework consists of a mixed pruning method, data storage optimization for efficient memory utilization, and an accelerator for mapping CNNs onto a field programmable gate array (FPGA). The mixed pruning method is used to compress the model, and the data bit-width is reduced to 8 bits by data quantization. The FPGA-based accelerator makes the CNN implementation flexible, configurable and efficient. The model compression is evaluated on an NVIDIA RTX2080Ti, and the results show that VGG16 is compressed by 30× and the fully convolutional network (FCN) by 11× within 1% accuracy loss. The compressed model is deployed and accelerated on a ZCU102, which is up to 1.7× and 24.5× better in energy efficiency compared with the RTX2080Ti and an Intel i7 7700. [ABSTRACT FROM AUTHOR]
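The compression pipeline summarized in the abstract (pruning followed by 8-bit quantization) can be sketched roughly as below. This is an illustrative approximation only, not the authors' implementation: the magnitude-based pruning criterion and the symmetric linear quantization scheme are assumptions, and the paper's "mixed" pruning actually combines multiple pruning granularities.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude weights (hypothetical criterion;
    the paper's mixed pruning is more elaborate than this)."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

def quantize_int8(weights):
    """Symmetric linear quantization of float weights to 8-bit integers."""
    max_abs = np.abs(weights).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

# Example: prune then quantize a random conv-layer weight tensor
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 3, 3, 3)).astype(np.float32)
w_pruned = magnitude_prune(w, sparsity=0.5)
q, scale = quantize_int8(w_pruned)
print(np.mean(w_pruned == 0))  # fraction of zeroed weights, ~0.5
```

Dequantizing with `q * scale` recovers the pruned weights to within half a quantization step, which is what bounds the accuracy loss of the 8-bit representation.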
- Published
- 2021