
Characterizing Parameter Scaling with Quantization for Deployment of CNNs on Real-Time Systems.

Authors :
Gealy, Calvin B.
George, Alan D.
Source :
ACM Transactions on Embedded Computing Systems; May 2024, Vol. 23 Issue 3, p1-35, 35p
Publication Year :
2024

Abstract

Modern deep-learning models tend to include billions of parameters, which hampers real-time performance. Embedded systems are compute-constrained, yet they are frequently used to deploy these models in real-time systems because of size, weight, and power requirements. Tools such as parameter-scaling methods help shrink models to ease deployment. This research compares two scaling methods for convolutional neural networks, uniform scaling and NeuralScale, and analyzes their impact on inference latency, memory utilization, and power. Uniform scaling scales the number of filters evenly across a network. NeuralScale adaptively scales the model to theoretically achieve the highest accuracy for a target parameter count. In this study, VGG-11, MobileNetV2, and ResNet-50 models were scaled to four ratios: 0.25×, 0.50×, 0.75×, and 1.00×. These models were benchmarked on an ARM Cortex-A72 CPU, an NVIDIA Jetson AGX Xavier GPU, and a Xilinx ZCU104 FPGA. Additionally, quantization was applied to meet real-time objectives. The CIFAR-10 and tinyImageNet datasets were studied. On CIFAR-10, NeuralScale creates more computationally intensive models than uniform scaling for the same parameter count, with relative speeds of 41% on the CPU, 72% on the GPU, and 96% on the FPGA. This additional computational complexity is a tradeoff for accuracy improvements in VGG-11 and MobileNetV2 NeuralScale models, but it reduces accuracy for ResNet-50. Furthermore, quantization alone achieves similar or better performance on the CPU and GPU devices than models scaled to 0.50×, despite slight reductions in accuracy. On the GPU, quantization reduces latency by 2.7× and memory consumption by 4.3×; uniform scaling reduces latency by 1.8× and memory by 2.8×; NeuralScale reduces latency by 1.3× and memory by 1.1×. We find quantization to be a practical first tool for improving performance. Uniform scaling can easily be applied for additional improvements. NeuralScale may improve accuracy but tends to degrade performance, so more care must be taken with it. [ABSTRACT FROM AUTHOR]
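To make the uniform-scaling idea concrete, the sketch below (not the authors' code; the helper names and the use of the standard VGG-11 channel widths are illustrative assumptions) multiplies every convolutional layer's filter count by a single ratio and estimates the resulting parameter count, showing why a 0.50× width ratio yields roughly a 0.25× parameter count:

```python
# Illustrative sketch of uniform scaling, assuming the standard VGG-11
# convolutional channel configuration; helper names are hypothetical.

VGG11_WIDTHS = [64, 128, 256, 256, 512, 512, 512, 512]

def uniform_scale(widths, ratio):
    """Scale every layer's filter count by the same ratio (at least 1 filter)."""
    return [max(1, round(w * ratio)) for w in widths]

def conv_param_count(widths, in_channels=3, kernel=3):
    """Approximate conv parameters: k*k*C_in*C_out weights + C_out biases per layer."""
    total, c_in = 0, in_channels
    for c_out in widths:
        total += kernel * kernel * c_in * c_out + c_out
        c_in = c_out
    return total

full = conv_param_count(VGG11_WIDTHS)
half = conv_param_count(uniform_scale(VGG11_WIDTHS, 0.50))
# Parameters depend on C_in * C_out, so halving every width cuts the
# parameter count to roughly one quarter of the original.
```

Because each layer's weight tensor scales with the product of its input and output widths, the parameter count falls roughly quadratically with the ratio, which is why the paper's 0.50× models compare naturally against an unscaled quantized baseline.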

Details

Language :
English
ISSN :
15399087
Volume :
23
Issue :
3
Database :
Complementary Index
Journal :
ACM Transactions on Embedded Computing Systems
Publication Type :
Academic Journal
Accession number :
177184159
Full Text :
https://doi.org/10.1145/3654799