Back to Search
Start Over
Accelerating End-to-End Deep Learning Workflow With Codesign of Data Preprocessing and Scheduling.
- Source :
- IEEE Transactions on Parallel & Distributed Systems; Jul2021, Vol. 32 Issue 7, p1802-1814, 13p
- Publication Year :
- 2021
-
Abstract
- In this article, we investigate the performance bottleneck of existing deep learning (DL) systems and propose DLBooster to improve the running efficiency of deploying DL applications on GPU clusters. At its core, DLBooster leverages two-level optimizations to boost the end-to-end DL workflow. On the one hand, DLBooster selectively offloads some key decoding workloads to FPGAs to provide high-performance online data preprocessing services to the computing engine. On the other hand, DLBooster reorganizes the computational workloads of training neural networks with the backpropagation algorithm and schedules them according to their dependencies to improve the utilization of GPUs at runtime. Based on our experiments, we demonstrate that compared with baselines, DLBooster can improve the image processing throughput by 1.4× – 2.5× and reduce the processing latency by 1/3 in several real-world DL applications and datasets. Moreover, DLBooster consumes less than 1 CPU core to manage FPGA devices at runtime, which is at least 90 percent less than the baselines in some cases. DLBooster shows its potential to accelerate DL workflows in the cloud. [ABSTRACT FROM AUTHOR]
Details
- Language :
- English
- ISSN :
- 10459219
- Volume :
- 32
- Issue :
- 7
- Database :
- Complementary Index
- Journal :
- IEEE Transactions on Parallel & Distributed Systems
- Publication Type :
- Academic Journal
- Accession number :
- 148970919
- Full Text :
- https://doi.org/10.1109/TPDS.2020.3047966