1. Reinforcement Learning based Fragment-Aware Scheduling for High Utilization HPC Platforms
- Author
- Yen-Ling Chang, Lung-Pin Chen, and I-Chen Wu
- Subjects
Distributed computing; Artificial neural network; Computer science; High capacity; Usability; Computer and information sciences; Engineering and technology; Supercomputer; Natural sciences; Scheduling (computing); Idle; Computation theory & mathematics; Electrical engineering, electronic engineering, information engineering; Reinforcement learning; Granularity
- Abstract
Due to its high capacity and complex scheduling activities, an HPC platform often creates resource fragments with low usability. This paper develops a novel fragment-aware scheduling approach that improves system utilization by dynamically fitting elastic lightweight tasks to these resource fragments. The new approach employs a threshold to determine the balancing factor between the length of the tasks and the degree of granularity of the resource fragments. We employ the Proximal Policy Optimization (PPO) reinforcement learning algorithm to train a neural network that can compute the threshold precisely. With a threshold that adapts to the changing system state, the PPO-based scheduler is able to utilize idle resources and maximize the execution success rate of the tasks.
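The abstract does not define the threshold formally, so the following is only a minimal sketch of one possible reading: a task is admitted to a fragment when its length is at most `threshold` times the fragment's idle duration. The names `Fragment`, `Task`, `fits`, and `schedule` are hypothetical, and the greedy first-fit placement is an illustrative stand-in rather than the authors' scheduler. In the paper the threshold is produced by a PPO-trained neural network; here it is passed in as a plain number.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Fragment:
    """Hypothetical idle slot: `nodes` free nodes for `duration` time units."""
    nodes: int
    duration: float


@dataclass
class Task:
    """Hypothetical lightweight task: needs `nodes` nodes for `length` time units."""
    name: str
    nodes: int
    length: float


def fits(task: Task, frag: Fragment, threshold: float) -> bool:
    """Assumed admission rule: enough nodes, and length within threshold * duration."""
    return task.nodes <= frag.nodes and task.length <= threshold * frag.duration


def schedule(tasks: List[Task], fragments: List[Fragment], threshold: float) -> Dict[str, int]:
    """Greedy first-fit, longest task first; returns task name -> fragment index."""
    # Work on copies so the caller's fragment list is not modified.
    free = [Fragment(f.nodes, f.duration) for f in fragments]
    placement: Dict[str, int] = {}
    for task in sorted(tasks, key=lambda t: t.length, reverse=True):
        for i, frag in enumerate(free):
            if fits(task, frag, threshold):
                placement[task.name] = i
                frag.nodes -= task.nodes  # nodes are consumed; the time window is shared
                break
    return placement


fragments = [Fragment(nodes=4, duration=10.0), Fragment(nodes=2, duration=3.0)]
tasks = [Task("a", 2, 8.0), Task("b", 2, 2.0), Task("c", 4, 6.0)]

permissive = schedule(tasks, fragments, threshold=1.0)
conservative = schedule(tasks, fragments, threshold=0.5)
```

Under this assumed rule, a lower threshold admits fewer tasks but leaves more slack before a fragment is reclaimed, while a higher threshold fills more idle capacity at a greater risk of tasks not finishing in time. That trade-off is the kind of balance an adaptive threshold would have to strike.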
- Published
- 2019