MixTran: an efficient and fair scheduler for mixed deep learning workloads in heterogeneous GPU environments.
- Author
Zhang, Xiao
- Subjects
DEEP learning, ARTIFICIAL intelligence, GLOBAL optimization, JOB performance, PRICES
- Abstract
Deep learning (LeCun et al. in Nature 521(7553):436–444, 2015) workloads are common in today's production clusters due to the proliferation of deep-learning-driven AI (Russell and Norvig, Artificial intelligence: a modern approach, Pearson Education, Upper Saddle River, 2003) services. As both deep learning workloads and GPUs grow more heterogeneous, efficient and fair resource scheduling is the key to maximizing the performance of a deep learning cluster. However, existing cluster schedulers are largely not tailored to deep learning jobs and typically specify a fixed amount of resources for each job, which limits both resource efficiency and job performance. In this paper we reconsider this problem and propose MixTran, a scheduler that works fairly and efficiently for mixed deep learning workloads on heterogeneous GPU clusters. First, MixTran abstracts heterogeneous GPU resources and distributes them fairly to users in the form of virtual tickets. Then, MixTran translates the uniform resource requests into a global optimization model, which makes efficient use of GPUs while satisfying the quantified resource requests, heterogeneous node constraints, and user fairness constraints. Finally, MixTran conducts greedy resource trading at the second trader's price, which can benefit both trading parties. In the evaluation, we show that MixTran decreases the total execution time of deep learning workloads by up to 30%–50% compared with traditional schedulers while maintaining fairness across multiple users. [ABSTRACT FROM AUTHOR]
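The abstract gives only a high-level description of the second-price trading step. Below is a minimal sketch of how such a trade between users could look, assuming a ticket-denominated bid model; all names (`Bid`, `second_price_trade`) are hypothetical illustrations, not the paper's actual implementation.

```python
# Hypothetical sketch of second-price resource trading, loosely following the
# abstract's description of MixTran's trading step. Names and structure are
# assumptions, not taken from the paper.
from dataclasses import dataclass


@dataclass
class Bid:
    user: str       # bidding user
    tickets: float  # virtual tickets offered for the GPU slot


def second_price_trade(bids: list[Bid]) -> tuple[str, float] | None:
    """Pick the highest bidder but charge the second-highest bid.

    Charging the second trader's price means the winner pays less than
    they were willing to, while the seller still receives the market
    price, which is how a trade can benefit both sides.
    """
    if len(bids) < 2:
        return None  # a second price only exists with competition
    ranked = sorted(bids, key=lambda b: b.tickets, reverse=True)
    winner, runner_up = ranked[0], ranked[1]
    return winner.user, runner_up.tickets  # winner pays the second price


# Example: alice wins the GPU slot but pays bob's bid of 3.0 tickets.
print(second_price_trade([Bid("alice", 5.0), Bid("bob", 3.0), Bid("carol", 1.0)]))
```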
- Published
- 2024