Regularization-Free Structural Pruning for GPU Inference Acceleration
- Source :
- ISQED
- Publication Year :
- 2021
- Publisher :
- IEEE, 2021.
Abstract
- Pruning has recently become prevalent in deep neural network compression as a way to reduce memory footprint and accelerate network inference. Unstructured pruning, i.e., fine-grained pruning, helps preserve model accuracy, while structural pruning, i.e., coarse-grained pruning, is preferred for general-purpose platforms such as GPUs. This paper proposes a regularization-free structural pruning scheme that takes advantage of both unstructured and structural pruning by heuristically mixing vector-wise fine-grained and block-wise coarse-grained pruning masks with an AND operation. Experimental results demonstrate that the proposal achieves higher model accuracy and a higher sparsity ratio for VGG-16 on CIFAR-10 and CIFAR-100 than the commonly applied block and balanced sparsity schemes.
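The mask-mixing idea in the abstract can be illustrated with a minimal sketch. The abstract states only that a vector-wise fine-grained mask and a block-wise coarse-grained mask are combined with an AND operation; how each mask is generated is not given in this record. The magnitude-based selection rules below, the helper names (`block_mask`, `vector_mask`, `and_masks`, `apply_mask`), and all parameter values are therefore assumptions chosen for illustration, not the paper's method.

```python
def block_mask(w, block, keep_ratio):
    """Coarse-grained mask: keep the (block x block) tiles with the largest L1 norm.
    The L1-norm ranking is an assumed heuristic, not taken from the paper."""
    rows, cols = len(w), len(w[0])
    tiles = []
    for r in range(0, rows, block):
        for c in range(0, cols, block):
            norm = sum(abs(w[i][j])
                       for i in range(r, min(r + block, rows))
                       for j in range(c, min(c + block, cols)))
            tiles.append((norm, r, c))
    tiles.sort(key=lambda t: t[0], reverse=True)
    n_keep = max(1, round(keep_ratio * len(tiles)))
    mask = [[0] * cols for _ in range(rows)]
    for _, r, c in tiles[:n_keep]:
        for i in range(r, min(r + block, rows)):
            for j in range(c, min(c + block, cols)):
                mask[i][j] = 1
    return mask


def vector_mask(w, vec_len, keep_per_vec):
    """Fine-grained mask: in every row, split the columns into vectors of
    vec_len entries and keep the keep_per_vec largest magnitudes in each.
    This selection rule is likewise an assumption."""
    rows, cols = len(w), len(w[0])
    mask = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for start in range(0, cols, vec_len):
            idx = list(range(start, min(start + vec_len, cols)))
            idx.sort(key=lambda j: abs(w[i][j]), reverse=True)
            for j in idx[:keep_per_vec]:
                mask[i][j] = 1
    return mask


def and_masks(a, b):
    """Element-wise AND: a weight survives only if both masks keep it."""
    return [[x & y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def apply_mask(w, mask):
    """Zero out every weight whose mask entry is 0."""
    return [[x * m for x, m in zip(rw, rm)] for rw, rm in zip(w, mask)]


# Toy 4x4 weight matrix (made-up values for illustration only).
weights = [
    [0.9, -0.8,  0.1,  0.0],
    [0.7,  0.6, -0.1,  0.2],
    [0.1,  0.0,  0.5, -0.4],
    [0.2, -0.1,  0.3,  0.6],
]

coarse = block_mask(weights, block=2, keep_ratio=0.5)
fine = vector_mask(weights, vec_len=2, keep_per_vec=1)
combined = and_masks(coarse, fine)
pruned = apply_mask(weights, combined)

total = len(weights) * len(weights[0])
sparsity = 1 - sum(map(sum, combined)) / total  # 0.75 for this toy matrix

for row in combined:
    print(row)
print("sparsity:", sparsity)
```

In this sketch the combined mask is at least as sparse as either input mask, because the AND can only remove entries. Surviving weights stay confined to the kept blocks, which is the property that makes block-structured layouts attractive on GPUs, while the vector-wise mask adds a finer selection inside each kept block.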
Details
- Database :
- OpenAIRE
- Journal :
- 2021 22nd International Symposium on Quality Electronic Design (ISQED)
- Accession number :
- edsair.doi...........f12629186d390826bf6f0499fc3a6a91
- Full Text :
- https://doi.org/10.1109/isqed51717.2021.9424299