Optimizing FPGA-based CNN accelerator for energy efficiency with an extended Roofline model

Authors :: Ab Al-Hadi Ab Rahman
Sayed Omid Ayat
Mohamed Khalil-Hani
Source :: Turkish Journal of Electrical Engineering and Computer Sciences, Volume: 26, Issue: 2, pp. 919-935
Publication Year :: 2018
Publisher :: The Scientific and Technological Research Council of Turkey (TUBITAK-ULAKBIM) - DIGITAL COMMONS JOURNALS, 2018.
Abstract: In recent years, the convolutional neural network (CNN) has found wide acceptance in solving practical computer vision and image recognition problems. More recently, owing to its flexibility, faster development time, and energy efficiency, the field-programmable gate array (FPGA) has become an attractive platform for exploiting the inherent parallelism in the feedforward process of the CNN. However, to meet the high accuracy demanded by today's practical recognition applications, which typically involve massive datasets, CNNs have to become larger and deeper. Enlarging the CNN aggravates the off-chip memory bottleneck on the FPGA platform, since there is not enough space to store large datasets on-chip. In this work, we propose a memory system architecture that best matches the off-chip memory traffic with the optimum throughput of the computation engine while operating at the maximum allowable frequency. With the help of an extended version of the Roofline model proposed in this work, we can estimate the memory bandwidth utilization of the system at different operating frequencies, since the proposed model considers operating frequency in addition to bandwidth utilization and throughput. To find the solution with the best energy efficiency, we make a trade-off between energy efficiency and computational throughput. This solution reduces energy consumption by 18% at the cost of less than a 2% reduction in throughput. We also propose a race-to-halt strategy to further improve the energy efficiency of the designed CNN accelerator. Experimental results show that our CNN accelerator can achieve a peak performance of 52.11 GFLOPS and an energy efficiency of 10.02 GFLOPS/W on a ZYNQ ZC706 FPGA board running at 250 MHz, which outperforms most previous approaches.
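
As a rough illustration of the reasoning in the abstract, the sketch below implements the classic Roofline bound, in which attainable throughput is the smaller of the compute roof and the product of memory bandwidth and operational intensity. The abstract does not give the equations of the authors' extended model, so the frequency-dependent compute roof here is only an assumption, and the function names, the MAC count, and the bandwidth figure are hypothetical values chosen for illustration. Only the 52.11 GFLOPS, 10.02 GFLOPS/W, and 250 MHz figures come from the abstract.

```python
def attainable_gflops(intensity_flop_per_byte, peak_gflops, bandwidth_gb_per_s):
    """Classic Roofline bound: min(compute roof, bandwidth * operational intensity)."""
    return min(peak_gflops, bandwidth_gb_per_s * intensity_flop_per_byte)


def compute_roof_gflops(num_macs, freq_mhz):
    """Assumed compute roof: each multiply-accumulate unit does 2 FLOPs per cycle.

    This frequency scaling is an illustrative assumption, not the paper's model.
    """
    return 2 * num_macs * freq_mhz / 1000.0


def implied_power_watts(gflops, gflops_per_watt):
    """Power implied by a throughput figure and an energy-efficiency figure."""
    return gflops / gflops_per_watt


if __name__ == "__main__":
    # Hypothetical design point: 100 MAC units and 4 GB/s of off-chip bandwidth.
    roof = compute_roof_gflops(num_macs=100, freq_mhz=250)
    for intensity in (5, 10, 20):
        bound = attainable_gflops(intensity, roof, bandwidth_gb_per_s=4)
        regime = "memory-bound" if bound < roof else "compute-bound"
        print(f"{intensity} FLOP/byte: {bound} GFLOPS ({regime})")

    # Figures reported in the abstract: 52.11 GFLOPS at 10.02 GFLOPS/W.
    print(f"implied power: {implied_power_watts(52.11, 10.02):.2f} W")
```

Under these assumed numbers the design is memory-bound below 12.5 FLOP/byte and compute-bound above it. Dividing the reported peak performance by the reported energy efficiency implies a power draw of roughly 5.2 W.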

Subjects :: General Computer Science
Computer science
Feed forward
Memory bandwidth
02 engineering and technology
Bottleneck
Convolutional neural network, field-programmable gate array, energy efficiency, Roofline model, race-to-halt strategy
Computer engineering
Gate array
020204 information systems
0202 electrical engineering, electronic engineering, information engineering
020201 artificial intelligence & image processing
Electrical and Electronic Engineering
Field-programmable gate array
Throughput (business)
Energy (signal processing)
Efficient energy use

Details

ISSN :: 1300-0632 and 1303-6203
Volume :: 26
Database :: OpenAIRE
Journal :: TURKISH JOURNAL OF ELECTRICAL ENGINEERING & COMPUTER SCIENCES
Accession number :: edsair.doi.dedup.....052633132868528aa72c3a2d48ae1bfe
