
Dynamic spawning of MPI processes applied to malleability

Authors :
Iker Martín-Álvarez
José I Aliaga
Maribel Castillo
Sergio Iserte
Rafael Mayo
Barcelona Supercomputing Center
Source :
The International Journal of High Performance Computing Applications. :109434202311765
Publication Year :
2023
Publisher :
SAGE Publications, 2023.

Abstract

Malleability allows computing facilities to adapt their workloads through resource management systems to maximize the throughput of the facility and the efficiency of the executed jobs. This technique is based on reconfiguring a job to a different amount of resources during execution and then resuming it. One stage of malleability is the dynamic spawning of processes at run time; decisions made in this stage affect how the next stage, data redistribution, is performed, and that stage is the most time-consuming. This paper describes different methods and strategies, defines eight alternatives for spawning processes dynamically, and indicates which one should be used depending on whether the application is strong scaling or weak scaling. In addition, for both types of applications, it describes which strategies most benefit application performance or system productivity. The results show that reducing the number of spawned processes by reusing the existing ones can reduce reconfiguration time compared to the classical method by up to 2.6 times for expanding and up to 36 times for shrinking. Furthermore, the asynchronous strategy requires analysing the impact of oversubscription on application performance.

This work has been funded by the following projects: project PID2020-113656RB-C21, supported by MCIN/AEI/10.13039/501100011033, and project UJI-B2019-36, supported by Universitat Jaume I. Researcher S. Iserte was supported by the postdoctoral fellowship APOSTD/2020/026, and researcher I. Martín-Álvarez was supported by the predoctoral fellowship ACIF/2021/260, both from Valencian Region Government and European Social Funds.

Details

ISSN :
17412846 and 10943420
Database :
OpenAIRE
Journal :
The International Journal of High Performance Computing Applications
Accession number :
edsair.doi.dedup.....2f9f1a93a02cb9c3c53658467ae84ef8
Full Text :
https://doi.org/10.1177/10943420231176527