Jorge Ejarque, Rosa M. Badia, Loïc Albertin, Giovanni Aloisio, Enrico Baglione, Yolanda Becerra, Stefan Boschert, Julian R. Berlin, Alessandro D’Anca, Donatello Elia, François Exertier, Sandro Fiore, José Flich, Arnau Folch, Steven J. Gibbons, Nikolay Koldunov, Francesc Lordan, Stefano Lorito, Finn Løvholt, Jorge Macías, Fabrizio Marozzo, Alberto Michelini, Marisol Monterrubio-Velasco, Marta Pienkowska, Josep de la Puente, Anna Queralt, Enrique S. Quintana-Ortí, Juan E. Rodríguez, Fabrizio Romano, Riccardo Rossi, Jedrzej Rybicki, Miroslaw Kupczyk, Jacopo Selva, Domenico Talia, Roberto Tonini, Paolo Trunfio, Manuela Volpe
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors; Universitat Politècnica de Catalunya. Departament d'Enginyeria de Serveis i Sistemes d'Informació; Universitat Politècnica de Catalunya. Departament d'Enginyeria Civil i Ambiental; Barcelona Supercomputing Center; Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions; Universitat Politècnica de Catalunya. inSSIDE - integrated Software, Service, Information and Data Engineering; Universitat Politècnica de Catalunya. RMEE - Grup de Resistència de Materials i Estructures en l'Enginyeria
European Commission; Ministerio de Ciencia, Innovación y Universidades (España); Agencia Estatal de Investigación (España); Federal Ministry of Education and Research (Germany); Ministero dello Sviluppo Economico; Norwegian Research Council; Generalitat de Catalunya; Ministerio de Ciencia e Innovación (España)
The evolution of High-Performance Computing (HPC) platforms enables the design and execution of progressively larger and more complex workflow applications on these systems. The complexity comes not only from the number of elements that compose the workflows but also from the type of computations they perform. While traditional HPC workflows target simulation and modelling of physical phenomena, current needs additionally require data analytics (DA) and artificial intelligence (AI) tasks. However, the development of these workflows is hampered by the lack of programming models and environments that support the integration of HPC, DA, and AI, as well as by the lack of tools to easily deploy and execute the workflows on HPC systems. To progress in this direction, this paper presents use cases where complex workflows are required and investigates the main issues to be addressed for the HPC/DA/AI convergence. Based on this study, the paper identifies the challenges that a new workflow platform for managing complex workflows must address. Finally, it proposes a development approach for such a workflow platform that addresses these challenges in two directions: first, by defining a software stack that provides the functionalities to manage these complex workflows; and second, by proposing the HPC Workflow as a Service (HPCWaaS) paradigm, which leverages the software stack to facilitate the reusability of complex workflows in federated HPC infrastructures. The proposals presented in this work are subject to study and development as part of the EuroHPC eFlows4HPC project.

This work has received funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 955558. The JU receives support from the European Union’s Horizon 2020 research and innovation programme and Spain, Germany, France, Italy, Poland, Switzerland and Norway.
In Spain, it has received complementary funding from MCIN/AEI/10.13039/501100011033, Spain and the European Union NextGenerationEU/PRTR (contracts PCI2021-121957, PCI2021-121931, PCI2021-121944, and PCI2021-121927). In Germany, it has received complementary funding from the German Federal Ministry of Education and Research (contracts 16HPC016K, 6GPC016K, 16HPC017 and 16HPC018). In France, it has received financial support from Caisse des dépôts et consignations (CDC) under the action PIA ADEIP (project Calculateurs). In Italy, it has been preliminarily approved for complementary funding by Ministero dello Sviluppo Economico (MiSE) (ref. project prop. 2659). In Norway, it has received complementary funding from the Norwegian Research Council, Norway under project number 323825. In Switzerland, it has been preliminarily approved for complementary funding by the State Secretariat for Education, Research, and Innovation (SERI), Switzerland. In Poland, it is partially supported by the National Centre for Research and Development under decision DWM/EuroHPCJU/4/2021. The authors also acknowledge financial support by MCIN/AEI/10.13039/501100011033, Spain through the “Severo Ochoa Programme for Centres of Excellence in R&D” under Grant CEX2018-000797-S, the Spanish Government, Spain (contract PID2019-107255 GB) and by Generalitat de Catalunya, Spain (contract 2017-SGR-01414). Anna Queralt is a Serra Húnter Fellow.

With funding from the Spanish government through the ‘Severo Ochoa Centre of Excellence’ accreditation (CEX2018-000797-S).