Regularization-Free Structural Pruning for GPU Inference Acceleration

Authors :
Yanbing Yang
Cheng Zhuo
Chuliang Guo
Xunzhao Yin
He Li
Li Zhang
Keyu Long
Shaodi Wang
Source :
ISQED
Publication Year :
2021
Publisher :
IEEE, 2021.

Abstract

Pruning has recently become prevalent in deep neural network compression because it reduces memory footprint and accelerates network inference. Unstructured (fine-grained) pruning helps preserve model accuracy, while structural (coarse-grained) pruning is preferred for general-purpose platforms such as GPUs. This paper proposes a regularization-free structural pruning scheme that takes advantage of both approaches by heuristically mixing vector-wise fine-grained and block-wise coarse-grained pruning masks with an AND operation. Experimental results demonstrate that the proposal achieves higher model accuracy and a higher sparsity ratio for VGG-16 on CIFAR-10 and CIFAR-100 than the commonly applied block and balanced sparsity schemes.
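
The mask-mixing idea named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the selection criteria, so the sketch assumes magnitude-based scoring, takes "block-wise" to mean keeping whole square tiles with the largest summed absolute weight, and takes "vector-wise" to mean keeping the top-k entries inside each fixed-length row segment. All function names and parameters (block_mask, vector_mask, and_masks, block, vec_len, keep_blocks, keep_per_vec) are hypothetical.

```python
# Illustrative sketch only; the criteria below are assumptions, not the paper's.

def block_mask(w, block, keep_blocks):
    """Coarse mask: keep the `keep_blocks` tiles with the largest summed |w|."""
    rows, cols = len(w), len(w[0])
    scores = []
    for br in range(0, rows, block):
        for bc in range(0, cols, block):
            s = sum(abs(w[r][c])
                    for r in range(br, min(br + block, rows))
                    for c in range(bc, min(bc + block, cols)))
            scores.append((s, br, bc))
    scores.sort(key=lambda t: t[0], reverse=True)
    mask = [[0] * cols for _ in range(rows)]
    for _, br, bc in scores[:keep_blocks]:
        for r in range(br, min(br + block, rows)):
            for c in range(bc, min(bc + block, cols)):
                mask[r][c] = 1
    return mask


def vector_mask(w, vec_len, keep_per_vec):
    """Fine mask: in every row segment of length `vec_len`, keep the
    `keep_per_vec` entries with the largest |w| (assumed criterion)."""
    rows, cols = len(w), len(w[0])
    mask = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for start in range(0, cols, vec_len):
            idx = list(range(start, min(start + vec_len, cols)))
            idx.sort(key=lambda c: abs(w[r][c]), reverse=True)
            for c in idx[:keep_per_vec]:
                mask[r][c] = 1
    return mask


def and_masks(a, b):
    """Element-wise AND: a weight survives only if both masks keep it."""
    return [[x & y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sparsity(mask):
    """Fraction of weights that the mask removes."""
    total = sum(len(row) for row in mask)
    return 1.0 - sum(map(sum, mask)) / total


# Toy 4x4 weight matrix (made-up values for illustration).
weights = [
    [0.90, -0.10, 0.05, 0.02],
    [0.30, -0.80, 0.01, 0.03],
    [0.02, 0.04, 0.70, -0.20],
    [0.01, 0.03, -0.60, 0.50],
]

coarse = block_mask(weights, block=2, keep_blocks=2)
fine = vector_mask(weights, vec_len=2, keep_per_vec=1)
combined = and_masks(coarse, fine)
pruned = [[x * m for x, m in zip(rw, rm)] for rw, rm in zip(weights, combined)]

print(combined)
print(sparsity(combined))
```

In this toy case each mask alone removes half of the weights, while their AND removes three quarters, which shows why combining the two masks can raise the sparsity ratio; whether accuracy is preserved depends on the selection heuristics, which the abstract does not specify.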

Details

Database :
OpenAIRE
Journal :
2021 22nd International Symposium on Quality Electronic Design (ISQED)
Accession number :
edsair.doi...........f12629186d390826bf6f0499fc3a6a91
Full Text :
https://doi.org/10.1109/isqed51717.2021.9424299