1. Towards A Robust Meta-Reinforcement Learning-Based Scheduling Framework for Time Critical Tasks in Cloud Environments
- Author
Liu, H., Chen, P., Zhao, Z., Ardagna, C.A., Chang, C., Damiani, E., Ranjan, R., Wang, Z., Ward, R., Zhang, J., Zhang, W., and Multiscale Networked Systems (IvI, FNWI)
- Subjects
reinforcement learning, task scheduling, Meta learning (computer science), Computer science, business.industry, meta learning, Distributed computing, Cloud computing, robustness, Dynamic priority scheduling, Scheduling (computing), Task (computing), Robustness (computer science), Container (abstract data type), Reinforcement learning, resource management, business
- Abstract
Container clusters play an increasingly important role in cloud computing for processing dynamic computing tasks. The resource manager (i.e., orchestrator) of the cluster automates the scheduling of dynamic requests and manages resource utilization across distributed infrastructure. For many applications, requests to the cluster have strict deadlines. Scheduling container clusters is often difficult, especially when the cluster is large and the request load changes dynamically. Machine learning-based approaches such as reinforcement learning have attracted considerable research attention in recent years; however, those approaches suffer from low robustness when the requests in an operational environment change and differ from the training data sets. This paper investigates this problem by quantifying the robustness and proposing meta-gradient reinforcement learning to improve the robustness of classical reinforcement learning-based approaches. The proposed approach can lead to better deadline guarantees and faster adaptation for time-critical task scheduling under dynamic environments. We then empirically test the benefits of our method using both real-world and synthetic data sets. The evaluation results show that the proposed method outperforms the compared RL methods in scheduling performance and robustness.
- Published
- 2021