Regularization-Free Structural Pruning for GPU Inference Acceleration

Authors :: Yanbing Yang
Cheng Zhuo
Chuliang Guo
Xunzhao Yin
He Li
Li Zhang
Keyu Long
Shaodi Wang
Source :: ISQED
Publication Year :: 2021
Publisher :: IEEE, 2021.
Abstract: Pruning has recently become prevalent in deep neural network compression as a way to reduce memory footprint and accelerate network inference. Unstructured pruning, i.e., fine-grained pruning, helps preserve model accuracy, while structural pruning, i.e., coarse-grained pruning, is preferred for general-purpose platforms such as GPUs. This paper proposes a regularization-free structural pruning scheme that takes advantage of both unstructured and structural pruning by heuristically mixing vector-wise fine-grained and block-wise coarse-grained pruning masks with an AND operation. Experimental results demonstrate that the proposed scheme achieves higher model accuracy and a higher sparsity ratio for VGG-16 on CIFAR-10 and CIFAR-100 than the commonly applied block and balanced sparsity schemes.
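
The mask-mixing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not describe their heuristics, so the magnitude-based scoring, the function names (`block_mask`, `vector_mask`, `mixed_mask`), and all parameters below are assumptions chosen only to show how a block-wise mask and a vector-wise mask combine under an element-wise AND.

```python
def block_mask(weights, block, keep_ratio):
    """Coarse-grained mask (assumed criterion): keep the tiles with the largest L1 norm."""
    rows, cols = len(weights), len(weights[0])
    # Score every (block x block) tile by the sum of absolute weights.
    scores = {}
    for br in range(0, rows, block):
        for bc in range(0, cols, block):
            scores[(br, bc)] = sum(
                abs(weights[r][c])
                for r in range(br, min(br + block, rows))
                for c in range(bc, min(bc + block, cols))
            )
    n_keep = max(1, round(keep_ratio * len(scores)))
    kept = set(sorted(scores, key=scores.get, reverse=True)[:n_keep])
    # Expand the tile decisions back to a per-weight 0/1 mask.
    return [
        [1 if ((r // block) * block, (c // block) * block) in kept else 0
         for c in range(cols)]
        for r in range(rows)
    ]


def vector_mask(weights, vec_len, keep_per_vec):
    """Fine-grained mask (assumed criterion): in each row segment of length
    vec_len, keep the keep_per_vec entries with the largest magnitude."""
    mask = []
    for row in weights:
        row_mask = [0] * len(row)
        for start in range(0, len(row), vec_len):
            idx = range(start, min(start + vec_len, len(row)))
            top = sorted(idx, key=lambda c: abs(row[c]), reverse=True)[:keep_per_vec]
            for c in top:
                row_mask[c] = 1
        mask.append(row_mask)
    return mask


def mixed_mask(weights, block, block_keep_ratio, vec_len, keep_per_vec):
    """Combine both masks with an element-wise AND: a weight survives only
    if it lies in a kept block and is also kept within its vector."""
    bm = block_mask(weights, block, block_keep_ratio)
    vm = vector_mask(weights, vec_len, keep_per_vec)
    return [[b & v for b, v in zip(b_row, v_row)] for b_row, v_row in zip(bm, vm)]


if __name__ == "__main__":
    # Toy 4x4 weight matrix; values are made up for illustration.
    W = [
        [0.90, -0.80, 0.10, 0.05],
        [0.70, 0.60, -0.02, 0.03],
        [0.01, 0.02, 0.50, -0.40],
        [0.03, -0.01, 0.30, 0.20],
    ]
    for row in mixed_mask(W, block=2, block_keep_ratio=0.5, vec_len=2, keep_per_vec=1):
        print(row)
```

Because the AND can only clear entries, the mixed mask is at least as sparse as either input mask, while the surviving weights remain confined to the kept blocks. How the two masks are actually tuned against each other in the paper is not stated in this record.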

Subjects :: Scheme (programming language)
Acceleration
Artificial neural network
Computer science
Memory footprint
Inference
Pruning (decision trees)
Regularization (mathematics)
Algorithm
Computer Science::Databases
Block (data storage)

Database :: OpenAIRE
Journal :: 2021 22nd International Symposium on Quality Electronic Design (ISQED)
Accession number :: edsair.doi...........f12629186d390826bf6f0499fc3a6a91
Full Text :: https://doi.org/10.1109/isqed51717.2021.9424299