SPM: Modeling Spark Task Execution Time from the Sub-stage Perspective
- Source :
- Algorithms and Architectures for Parallel Processing ISBN: 9783030389604, ICA3PP (2)
- Publication Year :
- 2020
- Publisher :
- Springer International Publishing, 2020.
Abstract
- Tasks are the basic unit of Spark application scheduling, and their execution is affected by many Spark cluster configuration settings, which makes predicting task execution time challenging. In this paper, we analyze the features of the task execution procedure in different stages and propose a method for predicting the execution time of each sub-stage. We also analyze in detail the associated time overheads of garbage collection (GC) and shuffle spill. Building on this analysis, we propose SPM, a task-level execution time prediction model. SPM predicts the task execution time of each stage from the input data size and the configured parallelism. We further apply SPM to SNemu, a Spark network emulation tool, where it effectively determines the start time of each shuffle procedure for emulation. Experimental results show that the prediction method achieves high accuracy on a variety of Spark benchmarks in HiBench.
- Subjects :
- Emulation
Computer science
05 social sciences
Perspective (graphical)
050801 communication & media studies
Parallel computing
Network emulation
Cluster (spacecraft)
Task (computing)
0508 media and communications
0502 economics and business
Spark (mathematics)
Parallelism (grammar)
050211 marketing
Stage (hydrology)
Details
- ISBN :
- 978-3-030-38960-4
- ISBNs :
- 9783030389604
- Database :
- OpenAIRE
- Journal :
- Algorithms and Architectures for Parallel Processing ISBN: 9783030389604, ICA3PP (2)
- Accession number :
- edsair.doi...........4f2999a72b25d610b4f0dc298fdbb8a7