
Attention Round for post-training quantization.

Authors :
Diao, Huabin
Li, Gongyan
Xu, Shaoyun
Kong, Chao
Wang, Wei
Source :
Neurocomputing. Jan 2024, Vol. 565.
Publication Year :
2024

Abstract

Quantization methods for convolutional neural network models can be broadly categorized into post-training quantization (PTQ) and quantization-aware training (QAT). While PTQ offers the advantage of requiring only a small portion of the data for quantization, the resulting quantized model may not be as effective as one produced by QAT. To address this limitation, this paper proposes a novel quantization function named Attention Round. Unlike traditional quantization functions, which map a 32-bit floating-point value w only to nearby quantization levels, Attention Round allows w to be mapped to any level in the entire quantization space, expanding the quantization optimization space. The probability of mapping w to a given quantization level decreases with the distance between w and that level, regulated by a Gaussian decay function. Furthermore, to tackle the challenge of mixed-precision quantization, this paper introduces a lossy coding length measure to assign quantization precision to different layers of the model, eliminating the need to solve a combinatorial optimization problem. Experimental evaluations on various models demonstrate the effectiveness of the proposed method. Notably, for ResNet18 and MobileNetV2, the PTQ approach achieves quantization performance comparable to QAT while using only 1,024 training samples and 10 minutes for the quantization process.
• Attention Round quantization function expands the quantization optimization space.
• Mixed precision allocation method improves mixed precision quantization efficiency.
• Enriched lightweight CNNs contribute to applications in resource-limited scenarios.
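
The sketch below illustrates the core idea described in the abstract: a weight w can be stochastically mapped to any level of the quantization grid, with probabilities that decay with distance under a Gaussian kernel. The level grid, the sigma parameter, and the sampling scheme are illustrative assumptions for this sketch, not the authors' exact formulation.

import numpy as np

def attention_round(w, levels, sigma=0.1, rng=None):
    """Stochastically round scalar w to one of the given quantization levels.

    The probability of selecting level q_i is proportional to
    exp(-(w - q_i)^2 / (2 * sigma^2)), so nearby levels are most likely,
    but every level in the grid remains reachable.
    """
    rng = rng or np.random.default_rng()
    levels = np.asarray(levels, dtype=np.float64)
    dist2 = (w - levels) ** 2
    weights = np.exp(-dist2 / (2.0 * sigma ** 2))  # Gaussian decay with distance
    probs = weights / weights.sum()
    return rng.choice(levels, p=probs)

# Example: map a weight onto a hypothetical 4-bit uniform grid on [-1, 1].
grid = np.linspace(-1.0, 1.0, 16)
print(attention_round(0.37, grid, sigma=0.1))

In this reading, a small sigma recovers behavior close to nearest-level rounding, while a larger sigma spreads probability mass over more levels, which is what enlarges the optimization space relative to deterministic rounding.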

Details

Language :
English
ISSN :
0925-2312
Volume :
565
Database :
Academic Search Index
Journal :
Neurocomputing
Publication Type :
Academic Journal
Accession number :
173807898
Full Text :
https://doi.org/10.1016/j.neucom.2023.127012