Failure prediction of tasks in the cloud at an earlier stage: a solution based on domain information mining.
- Source :
- Computing. Sep 2020, Vol. 102 Issue 9, p2001-2023. 23p.
- Publication Year :
- 2020
Abstract
- In a large-scale data center, it is vital to precisely recognize the termination statuses of applications at an early stage, because doing so helps optimize the scheduling policy and improve the efficiency of resource utilization. In recent years, many machine learning techniques have been applied to this issue. However, if the application's dynamic information is insufficient at the early stage, the generalization performance of the machine learning model degrades, and the prediction accuracy can be low. To overcome this problem, this paper proposes a novel failure prediction method, based on the association relationships between similar jobs, that jointly predicts tasks' termination statuses at an earlier stage. Among similar jobs whose tasks show similar patterns of change in consumed resources, an inherent structural correlation may exist, and this correlation information is significant for improving the prediction model's generalization performance. First, a job clustering algorithm is proposed for identifying the jobs with higher similarity from jobs that have various numbers of tasks. Second, based on the job clustering results, a robust multi-task learning algorithm is introduced to effectively utilize the domain information among jobs (i.e., the interactional relationships among jobs with respect to the termination statuses of their tasks). Experiments are conducted on a Google cluster workload traces dataset. The results show that the proposed method achieves higher prediction accuracy, a lower misjudgment rate, and higher predictive stability than several state-of-the-art methods at 1/3 of the running time of the tasks. [ABSTRACT FROM AUTHOR]
- Subjects :
- *FORECASTING
*MACHINE performance
*ALGORITHMS
*TASKS
*MACHINE learning
Details
- Language :
- English
- ISSN :
- 0010485X
- Volume :
- 102
- Issue :
- 9
- Database :
- Academic Search Index
- Journal :
- Computing
- Publication Type :
- Academic Journal
- Accession number :
- 145347447
- Full Text :
- https://doi.org/10.1007/s00607-020-00800-1