
Phoenix: A Low-Precision Floating-Point Quantization Oriented Architecture for Convolutional Neural Networks

Authors:
Wu, Chen
Wang, Mingyu
Li, Xiayu
Lu, Jicheng
Wang, Kun
He, Lei
Publication Year:
2020
Publisher:
arXiv, 2020.

Abstract

Convolutional neural networks (CNNs) achieve state-of-the-art performance at the cost of becoming deeper and larger. Although quantization (both fixed-point and floating-point) has proven effective for reducing storage and memory access, two challenges prevent processors from fully leveraging its benefits: 1) accuracy loss caused by quantization without calibration, fine-tuning, or re-training for deep CNNs, and 2) hardware inefficiency caused by floating-point quantization. In this paper, we propose a low-precision floating-point quantization oriented processor, named Phoenix, to address these challenges. We make three key observations: 1) 8-bit floating-point quantization incurs less error than 8-bit fixed-point quantization; 2) without any calibration, fine-tuning, or re-training, normalization before quantization further reduces accuracy degradation; 3) an 8-bit floating-point multiplier achieves higher hardware efficiency than an 8-bit fixed-point multiplier if the full-precision product is applied. Based on these observations, we propose a normalization-oriented 8-bit floating-point quantization method that reduces storage and memory access with negligible accuracy loss (within 0.5%/0.3% for top-1/top-5 accuracy, respectively). We further design a hardware processor to address the hardware inefficiency caused by the floating-point multiplier. Compared with a state-of-the-art accelerator, Phoenix achieves 3.32x and 7.45x better performance with the same core area for AlexNet and VGG16, respectively.
Comment: 14 pages, 18 figures, submitted to TVLSI
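
To illustrate the two steps named in the abstract (normalize a tensor, then quantize it to an 8-bit floating-point format), the sketch below quantizes an FP32 weight tensor after per-tensor normalization. This is a minimal NumPy sketch, not the paper's implementation: the 1-4-3 sign/exponent/mantissa layout and the max-based normalization are assumptions made only for illustration, since the abstract does not specify Phoenix's exact format or normalization scheme, and a random tensor will not reproduce the reported accuracy numbers.

import numpy as np

EXP_BITS, MAN_BITS = 4, 3                     # assumed layout: 1 sign + 4 exponent + 3 mantissa bits
EXP_BIAS = 2 ** (EXP_BITS - 1) - 1            # 7
MIN_EXP = 1 - EXP_BIAS                        # smallest normal exponent
MAX_EXP = (2 ** EXP_BITS - 2) - EXP_BIAS      # largest finite exponent
MAX_VAL = (2.0 - 2.0 ** -MAN_BITS) * 2.0 ** MAX_EXP


def quantize_fp8(x: np.ndarray) -> np.ndarray:
    """Round each value to the nearest number representable in the assumed 8-bit float format."""
    sign, mag = np.sign(x), np.abs(x)
    with np.errstate(divide="ignore"):        # log2(0) -> -inf, clipped just below
        exp = np.clip(np.floor(np.log2(mag)), MIN_EXP, MAX_EXP)
    step = 2.0 ** (exp - MAN_BITS)            # grid spacing at each value's exponent
    return sign * np.minimum(np.round(mag / step) * step, MAX_VAL)


# Normalization before quantization: divide by the per-tensor max magnitude,
# quantize the normalized values, and keep the single FP32 scale so the
# original range can be restored at inference time.
w = np.random.randn(64, 3, 3, 3).astype(np.float32)
scale = float(np.max(np.abs(w)))
w_q = quantize_fp8(w / scale)
print("max abs reconstruction error:", np.max(np.abs(w_q * scale - w)))

One common design choice (not necessarily the one used by Phoenix) is to fold the retained FP32 scale into a later stage of the computation, so normalization adds no per-element cost at inference.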

Details

Database:
OpenAIRE
Accession number:
edsair.doi.dedup.....775b06c921ac33574a0a6a27a1ba151e
Full Text:
https://doi.org/10.48550/arxiv.2003.02628