1. Modeling Batch Tasks Using Recurrent Neural Networks in Co-Located Alibaba Workloads
- Author
- Khalid, Hifza; Ramaswamy, Arunselvan; Ferlin, Simone; Couch, Alva
- Abstract
Accurate predictive models for cloud workloads can help improve task scheduling, capacity planning, and preemptive resource conflict resolution, especially in the setting of co-located jobs. Alibaba, one of the leading cloud providers, co-locates transient batch tasks and high-priority, latency-sensitive online jobs on the same cluster. In this paper, we consider the problem of using a dataset publicly released by Alibaba to model the batch tasks, which are often overlooked compared to online services. The dataset contains the arrivals and resource requirements (CPU, memory, etc.) for both batch and online tasks. Our trained model predicts, with high accuracy, the number of batch tasks that arrive in any 30-minute window, their associated CPU and memory requirements, and their lifetimes. It captures over 94% of arrivals in each 30-minute window within a 95% prediction interval. The F1 scores for the most frequent CPU classes exceed 75%, and our memory and lifetime predictions incur less than 1% loss on test data. The prediction accuracy for the lifetime of a batch task drops when the model uses both CPU and memory information, as opposed to memory information alone.
- Published
- 2024