SPM: Modeling Spark Task Execution Time from the Sub-stage Perspective

Authors :
Li Wei
Hu Shengjie
Di Wang
Yunchun Li
Chen Tianba
Source :
Algorithms and Architectures for Parallel Processing ISBN: 9783030389604, ICA3PP (2)
Publication Year :
2020
Publisher :
Springer International Publishing, 2020.

Abstract

Tasks are the basic unit of Spark application scheduling, and their execution is affected by the various configurations of a Spark cluster, which makes predicting task execution time challenging. In this paper, we analyze the features of the task execution procedure in different stages and propose a method for predicting the execution time of each sub-stage. Moreover, the associated time overheads of GC and shuffle spill are analyzed in detail. Based on this analysis, we propose SPM, a task-level execution time prediction model. SPM can predict the task execution time of each stage from the input data size and the configured parallelism. We further apply SPM to the Spark network emulation tool SNemu, so that the start time of each shuffle procedure can be determined effectively for emulation. Experimental results show that the prediction method achieves high accuracy on a variety of Spark benchmarks in HiBench.

Details

ISBN :
978-3-030-38960-4
ISBNs :
9783030389604
Database :
OpenAIRE
Journal :
Algorithms and Architectures for Parallel Processing ISBN: 9783030389604, ICA3PP (2)
Accession number :
edsair.doi...........4f2999a72b25d610b4f0dc298fdbb8a7