1. A novel proactive Health Aware Fault Tolerant (HAFT) scheduler for computational grid based on resource failure data analytics.
- Author
Ebenezer, A. Shamila, Rajsingh, Elijah Blessing, and Kaliaperumal, Baskaran
- Subjects
GRID computing, DISTRIBUTED computing, PRODUCTION scheduling, DATA analysis, FAULT tolerance (Engineering)
- Abstract
In a heterogeneous distributed computing environment, developing a fault tolerance mechanism is a key research issue. Most existing fault tolerance approaches for distributed computing environments are post-active: they predominantly rely on the heartbeat strategy for fault detection and the checkpointing mechanism for fault recovery. This work develops a proactive Health Aware Fault Tolerant (HAFT) scheduler using the Cox Proportional Hazard survival probability model. The survival probability of a resource is estimated using resource failure data analytics and is termed the health coefficient of the resource. For the job distribution classes jclass1, jclass2, and jclass3, the average makespan improvement of the HAFT algorithm over the compared algorithms is 44%, 59.6%, and 26.4%, respectively. In a heterogeneous environment, the job failure rate of the HAFT scheduler ranges between 15% and 20% and is stable across all three jclasses. In a homogeneous environment, the job failure rate of the HAFT algorithm is lower than that of the IRP, REP, and MJSP algorithms by 58.6%, 26.4%, and 11.6%, respectively. For failure probabilities higher than 0.4, the resource efficiency of the HAFT algorithm is on average 26% higher than that of MJSP and 53% higher than that of REP. [ABSTRACT FROM AUTHOR]
- Published
2019
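
The health coefficient described in the abstract can be illustrated with the standard Cox proportional hazards identity S(t | x) = S0(t)^exp(β·x), where S0(t) is the baseline survival probability at time t, x is a vector of resource covariates, and β is the vector of fitted coefficients. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the covariates, the coefficient values, and the "pick the highest coefficient" selection rule are all hypothetical and do not come from the paper.

```python
import math


def health_coefficient(baseline_survival, beta, covariates):
    """Cox PH survival probability S(t | x) = S0(t) ** exp(beta . x).

    baseline_survival: S0(t), assumed already estimated from failure data.
    beta, covariates: equal-length sequences (hypothetical values here).
    """
    if not 0.0 <= baseline_survival <= 1.0:
        raise ValueError("baseline_survival must lie in [0, 1]")
    if len(beta) != len(covariates):
        raise ValueError("beta and covariates must have the same length")
    # Linear risk score; a larger score means a higher hazard.
    risk_score = sum(b * x for b, x in zip(beta, covariates))
    return baseline_survival ** math.exp(risk_score)


def pick_healthiest(resources, baseline_survival, beta):
    """Return the name of the resource with the highest health coefficient.

    resources: dict mapping a resource name to its covariate vector.
    This selection rule is an illustrative assumption, not the HAFT policy.
    """
    return max(
        resources,
        key=lambda name: health_coefficient(
            baseline_survival, beta, resources[name]
        ),
    )


if __name__ == "__main__":
    # Hypothetical single covariate, e.g. a normalized past-failure count.
    grid = {"node-a": [2.0], "node-b": [0.5]}
    print(pick_healthiest(grid, baseline_survival=0.9, beta=[1.0]))
```

With a positive coefficient, the resource with the smaller covariate value has the lower hazard and therefore the higher survival probability, so it is the one selected.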