OPU: An FPGA-Based Overlay Processor for Convolutional Neural Networks
- Source :
- IEEE Transactions on Very Large Scale Integration (VLSI) Systems. 28:35-47
- Publication Year :
- 2020
- Publisher :
- Institute of Electrical and Electronics Engineers (IEEE), 2020.
Abstract
- Field-programmable gate arrays (FPGAs) provide rich parallel computing resources with high energy efficiency, making them ideal for accelerating deep convolutional neural networks (CNNs). In recent years, automatic compilers have been developed to generate network-specific FPGA accelerators. However, as more cascaded deep CNN algorithms are adopted for various complicated tasks, runtime reconfiguration of the FPGA device becomes unavoidable when network-specific accelerators are employed. Such reconfiguration can be difficult on edge devices. Moreover, a network-specific accelerator requires regenerating RTL code and redoing physical implementation whenever the network is updated, which is not easy for CNN end users. In this article, we propose a domain-specific FPGA overlay processor, named OPU, to accelerate CNN networks. It offers software-like programmability to CNN end users: CNN algorithms are automatically compiled into executable code that OPU loads and executes without FPGA reconfiguration when networks are switched or updated. OPU instructions implement complicated functions with variable runtimes but a uniform length. The instruction granularity is optimized to provide good performance and sufficient flexibility while reducing the complexity of developing the microarchitecture and compiler. Experiments show that OPU achieves an average runtime multiply-and-accumulate (MAC) unit efficiency (RME) of 91% across nine different networks. Moreover, for VGG and YOLO networks, OPU outperforms automatically compiled network-specific accelerators reported in the literature. In addition, OPU shows 5.35× better power efficiency than a Titan Xp GPU. For a real-time cascaded CNN scenario, OPU is 2.9× faster than the edge-computing GPU Jetson TX2, which has a similar amount of computing resources.
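
To make the compile-then-execute idea in the abstract concrete, below is a minimal illustrative sketch of lowering one convolution layer into uniform-length, coarse-grained instructions that an overlay could execute without FPGA reconfiguration. The opcodes, instruction fields, and tiling scheme are hypothetical and are not the actual OPU ISA or compiler described in the paper; the sketch only demonstrates the general technique.

```python
# Hypothetical sketch: NOT the OPU ISA. Shows the idea of compiling a CNN layer
# into uniform-length, coarse-grained instructions for an overlay processor,
# so that updating the network changes only the instruction stream, not the bitstream.
from dataclasses import dataclass
from typing import List

INSTR_WORDS = 4  # assumed uniform instruction length, in 32-bit words


@dataclass
class Instr:
    opcode: int  # e.g., 0 = LOAD_TILE, 1 = CONV_BLOCK, 2 = STORE_TILE (made up)
    arg0: int
    arg1: int
    arg2: int

    def encode(self) -> List[int]:
        """Every instruction encodes to the same number of words,
        even though its runtime on the overlay may vary."""
        return [self.opcode, self.arg0, self.arg1, self.arg2]


@dataclass
class ConvLayer:
    in_ch: int
    out_ch: int
    h: int
    w: int
    k: int


def compile_layer(layer: ConvLayer, tile: int = 32) -> List[Instr]:
    """Lower one convolution layer into a block-level instruction stream."""
    prog: List[Instr] = []
    for oc in range(0, layer.out_ch, tile):
        width = min(tile, layer.out_ch - oc)
        prog.append(Instr(0, oc, width, 0))                              # load weight/feature tile
        prog.append(Instr(1, layer.k, layer.in_ch, layer.h * layer.w))   # compute one output block
        prog.append(Instr(2, oc, width, 0))                              # store output tile
    return prog


if __name__ == "__main__":
    program = compile_layer(ConvLayer(in_ch=64, out_ch=128, h=56, w=56, k=3))
    words = [w for ins in program for w in ins.encode()]
    print(f"{len(program)} instructions, {len(words)} words "
          f"({INSTR_WORDS} words per instruction)")
```

In this style of design, switching from one CNN to another only requires generating a new instruction stream with the compiler and loading it onto the overlay, which is the property the abstract emphasizes.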
- Subjects :
- Edge device
Computer science
Control reconfiguration
Parallel computing
Convolutional neural network
Microarchitecture
Hardware and Architecture
Gate array
Compiler
Executable
Electrical and Electronic Engineering
Field-programmable gate array
Software
Edge computing
Details
- ISSN :
- 1557-9999 and 1063-8210
- Volume :
- 28
- Database :
- OpenAIRE
- Journal :
- IEEE Transactions on Very Large Scale Integration (VLSI) Systems
- Accession number :
- edsair.doi...........fcf42f904197ad2ed13fce41fdb01ed1
- Full Text :
- https://doi.org/10.1109/tvlsi.2019.2939726