Failure prediction of tasks in the cloud at an earlier stage: a solution based on domain information mining.

Authors :
Liu, Chunhong
Dai, Liping
Lai, Yi
Lai, Guibing
Mao, Wentao
Source :
Computing. Sep2020, Vol. 102 Issue 9, p2001-2023. 23p.
Publication Year :
2020

Abstract

In a large-scale data center, it is vital to precisely recognize the termination statuses of applications at an early stage, since doing so helps optimize the scheduling policy and improve the efficiency of resource utilization. In recent years, many machine learning techniques have been applied to this problem. However, if the application's dynamic information is insufficient at the early stage, the generalization performance of the machine learning model degrades, and the prediction accuracy could be low. To overcome this problem, this paper proposes a novel failure prediction method, based on the association relationships between similar jobs, that jointly predicts tasks' termination statuses at an earlier stage. Among similar jobs whose tasks show similar patterns of change in consumed resources, an inherent structural correlation may exist, and this correlation information is significant for improving the prediction model's generalization performance. First, a job clustering algorithm is proposed for identifying the jobs with higher similarity from jobs that have various numbers of tasks. Second, based on the job clustering results, a robust multi-task learning algorithm is introduced to effectively utilize the domain information among jobs (i.e., the interaction among jobs with respect to the termination statuses of their tasks). Experiments are conducted on a Google cluster workload traces dataset. The results show that, at 1/3 of the tasks' running time, the proposed method achieves higher prediction accuracy, a lower misjudgment rate, and higher predictive stability than several state-of-the-art methods. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
0010-485X
Volume :
102
Issue :
9
Database :
Academic Search Index
Journal :
Computing
Publication Type :
Academic Journal
Accession number :
145347447
Full Text :
https://doi.org/10.1007/s00607-020-00800-1