51. Towards Real-time CNN Inference from a Video Stream on a Mobile GPU (WiP Paper)
- Author
Do-Hee Kim, Gunju Park, Sumin Kim, Youngmin Yi, and Chanyoung Oh
- Subjects
010302 applied physics; Speedup; Computer science; business.industry; Deep learning; Inference; 02 engineering and technology; 01 natural sciences; 020202 computer hardware & architecture; Computer engineering; Kernel (statistics); 0103 physical sciences; 0202 electrical engineering, electronic engineering, information engineering; Overhead (computing); Artificial intelligence; Quantization (image processing); Face detection; business; Execution model
- Abstract
While there are several frameworks for CNN inference on mobile GPUs, they do not achieve real-time processing for most CNNs that aim at reasonable accuracy, since they all employ a kernel-by-kernel execution model and do not yet effectively support INT8 quantization. In this paper, we reveal that mobile GPUs, unlike server GPUs, suffer from large kernel launch overhead, and then propose an on-device deep learning inference framework that can achieve real-time inference of CNNs on mobile GPUs by removing kernel launch overhead and by effectively exploiting INT8 quantization. We have evaluated the proposed framework with a state-of-the-art CNN-based face detector (RetinaFace) and observed a speedup of up to 2.01X over the ARM Compute Library (ACL) on a commodity smartphone.
- Published
- 2020