1. Evaluating the impact of task aggregation in workflows with shared resources environments
- Author
-
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors; Barcelona Supercomputing Center; Acosta Cobos, Mario César; Utrera Iglesias, Gladys Miriam; Castrillo, Miguel; Giménez de Castro Marciani, Manuel
- Abstract
We study the relative impact of task aggregation, or wrapping, a technique for computational workflows that bundles jobs into a single submission to be sent to remote schedulers. Experiments in the Earth Science community can be lengthy and comprise several steps with many dependencies. The community has traditionally focused on increasing the performance of the models, but the overall execution of the workflow, including the queue time, has received little attention. Aiming to reduce the time spent in queue, the developers of Autosubmit, a workflow manager developed for climate, weather forecast, and air quality simulations, came up with task aggregation, or wrapping. Our objective is to assess whether this technique does indeed reduce the total queue time of the workflow. The complex interplay between the dynamic nature of the usage of the machine and the scheduler policy plays a central role in our analysis and poses the main challenge of this work. Hence, we carry out a detailed study of the scheduling policy of the popular Slurm Workload Manager and a statistical characterization of the usage of both simulated machines: LUMI and cea-curie. With that, we perform a twofold experimentation: a simulation using dynamic workloads (where job arrival time plays a role) with a workflow composed of multiple jobs, and a static workload (where all jobs in the workload are submitted at the same time) varying the job and user factors that play a role in the scheduling. Results show that aggregation is beneficial in the majority of cases for workflows that are vertically organized (that is, a chain of submissions where each job depends on the previous one), whilst for horizontally arranged workflows (where jobs do not have dependencies) it might worsen the queue time depending on the user's past usage and the machine's current state.
- Published
- 2023