4.6-Bit Quantization for Fast and Accurate Neural Network Inference on CPUs.
- Source :
- Mathematics (2227-7390). Mar 2024, Vol. 12 Issue 5, p651. 22p.
- Publication Year :
- 2024
Abstract
- Quantization is a widespread method for reducing the inference time of neural networks on mobile Central Processing Units (CPUs). Eight-bit quantized networks achieve quality as high as full-precision models and fit the hardware architecture perfectly, with one-byte coefficients and thirty-two-bit dot product accumulators. Lower-precision quantizations usually suffer from noticeable quality loss and require specific computational algorithms to outperform eight-bit quantization. In this paper, we propose a novel 4.6-bit quantization scheme that allows for more efficient use of CPU resources. This scheme has more quantization bins than four-bit quantization and is more accurate while preserving the computational efficiency of the latter (it runs only 4% slower). Our multiplication uses a combination of 16- and 32-bit accumulators and avoids the multiplication depth limitation of the previous 4-bit multiplication algorithm. Experiments with different convolutional neural networks on the CIFAR-10 and ImageNet datasets show that 4.6-bit quantized networks are 1.5–1.6 times faster than eight-bit networks on an ARMv8 CPU. In terms of quality, the results of the 4.6-bit quantized network are close to the mean of the four-bit and eight-bit networks of the same architecture. Therefore, 4.6-bit quantization may serve as an intermediate solution between fast but inaccurate low-bit network quantizations and accurate but relatively slow eight-bit ones. [ABSTRACT FROM AUTHOR]
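The abstract does not spell out the algorithm, so the following is only an illustrative sketch, not the authors' scheme. It rests on two assumptions: that a fractional bit width such as "4.6-bit" means log2 of the number of quantization bins (for example, 25 signed levels in [-12, 12] give log2(25) ≈ 4.64), and that the "multiplication depth limitation" refers to how many products a 16-bit accumulator can absorb before it may overflow. The sketch compares worst-case rounding error for different bin counts and simulates a dot product that accumulates in 16 bits and flushes into a 32-bit total. All function names (`effective_bits`, `max_rounding_error`, `safe_int16_depth`, `dot_int16_then_int32`) are hypothetical.

```python
import math

# Signed 16-bit range; Python ints are unbounded, so these bounds are checked explicitly.
INT16_MIN, INT16_MAX = -(2**15), 2**15 - 1


def effective_bits(levels: int) -> float:
    """Assumed meaning of a fractional bit width: log2 of the bin count."""
    return math.log2(levels)


def max_rounding_error(lo: float, hi: float, levels: int) -> float:
    """Worst-case error of uniform round-to-nearest quantization over [lo, hi]."""
    step = (hi - lo) / (levels - 1)
    return step / 2


def safe_int16_depth(max_code_a: int, max_code_b: int) -> int:
    """Number of worst-case products an int16 accumulator can hold without overflow."""
    return INT16_MAX // (max_code_a * max_code_b)


def dot_int16_then_int32(a, b, max_code_a: int, max_code_b: int) -> int:
    """Dot product of integer codes using a simulated int16 accumulator that is
    flushed into a wider total (standing in for an int32 register) before it
    can overflow, so the vector length is not limited by the int16 range."""
    depth = safe_int16_depth(max_code_a, max_code_b)
    if depth == 0:
        raise ValueError("a single product does not fit in int16")
    total = 0  # stands in for the 32-bit accumulator
    acc16 = 0  # stands in for the 16-bit accumulator
    for i, (x, y) in enumerate(zip(a, b), start=1):
        acc16 += x * y
        assert INT16_MIN <= acc16 <= INT16_MAX, "int16 overflow"
        if i % depth == 0:  # flush before the next product could overflow
            total += acc16
            acc16 = 0
    return total + acc16


if __name__ == "__main__":
    print(round(effective_bits(25), 2))  # 4.64
    print(round(max_rounding_error(-1.0, 1.0, 25), 4))  # 0.0417
    # Max code magnitudes: 8 for 4-bit, 12 for 25 levels, 128 for 8-bit.
    print(safe_int16_depth(8, 8), safe_int16_depth(12, 12), safe_int16_depth(128, 128))  # 511 227 1
```

Under these assumptions, a 16-bit accumulator can absorb 511 worst-case products of 4-bit codes, 227 for 25-level codes, and only 1 for 8-bit codes. This is one way to see why eight-bit schemes typically accumulate directly in 32 bits, and why periodically flushing a 16-bit accumulator into a 32-bit one removes any fixed limit on the dot product length.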
- Subjects :
- *CONVOLUTIONAL neural networks
- *CENTRAL processing units
- *MULTIPLICATION
Details
- Language :
- English
- ISSN :
- 22277390
- Volume :
- 12
- Issue :
- 5
- Database :
- Academic Search Index
- Journal :
- Mathematics (2227-7390)
- Publication Type :
- Academic Journal
- Accession number :
- 175987325
- Full Text :
- https://doi.org/10.3390/math12050651