Melissa: coordinating large-scale ensemble runs for deep learning and sensitivity analyses
- Author
- Schouler, Marc; Caulk, Robert Alexander; Meyer, Lucas; Terraz, Théophile; Conrads, Christoph; Friedemann, Sebastian; Agarwal, Achal; Baldonado, Juan Manuel; Pogodziński, Bartłomiej; Sekuła, Anna; Ribes, Alejandro; Raffin, Bruno (DATAMOVE team, Inria Grenoble - Rhône-Alpes, Laboratoire d'Informatique de Grenoble (LIG), Université Grenoble Alpes (UGA), CNRS, Grenoble INP; EDF R&D; Saclay Industrial Lab for Artificial Intelligence Research (SINCLAIR AI Lab); Institute of Bioorganic Chemistry, Polish Academy of Sciences (PAN); European projects 956560 REGALE and 824158 EoCoE-II, H2020-EU.1.4. Excellent Science - Research Infrastructures)
- Subjects
- distributed systems, [PHYS] Physics [physics], [STAT] Statistics [stat], Supercomputing, Surrogate Modeling, Deep learning, Orchestration Framework, [INFO] Computer Science [cs], Sensitivity analysis
- Abstract
- Large-scale ensemble runs typically consist of executing thousands of physical simulation instances over a range of input parameters. Such runs enable sensitivity analyses, deep surrogate training, reinforcement learning, and data assimilation, but they produce volumes of data that are too large to store; a recent data assimilation ensemble study, for example, generated 1.3 PB of data [@yashiro2020]. These enormous data volumes hinder scientific analyses in two ways. First, I/O is the slowest component of a supercomputer: the mismatch between slow read/write speeds and the rapid rate at which data are generated degrades performance and eventually causes it to plateau. Second, supercomputer file systems are not designed to allocate such large volumes of storage to a single study. To work around this I/O limitation, scientists shrink their studies by running lower-resolution simulations or by down-sampling output data in space and time. The I/O problem only becomes more pronounced, however, as the speed and size of supercomputers advance faster than the I/O speeds of storage disks. (A minimal sketch of the online-reduction alternative this motivates appears after this record.)
- Published
- 2023
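The abstract motivates processing ensemble outputs online instead of writing them to the parallel file system. The sketch below is not Melissa's API; it is a minimal, hypothetical illustration (class name `OnlineFieldStats` is invented) of the in-transit statistics idea, using Welford's incremental algorithm so that per-grid-point mean and variance can be accumulated as ensemble members stream in, without ever storing the raw member outputs.

```python
import numpy as np


class OnlineFieldStats:
    """Accumulate per-grid-point mean and variance over ensemble members
    with Welford's algorithm, so raw member outputs never hit the disk.
    (Hypothetical helper, not part of the Melissa framework itself.)"""

    def __init__(self, field_shape):
        self.count = 0
        self.mean = np.zeros(field_shape)
        self.m2 = np.zeros(field_shape)  # running sum of squared deviations

    def update(self, field):
        # 'field' is one simulation member's output for a given time step.
        self.count += 1
        delta = field - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (field - self.mean)

    def variance(self):
        # Unbiased sample variance per grid point.
        if self.count < 2:
            return np.zeros_like(self.m2)
        return self.m2 / (self.count - 1)


if __name__ == "__main__":
    # Stand-in driver: synthetic fields play the role of simulation instances
    # streaming their results to the accumulator instead of writing files.
    rng = np.random.default_rng(0)
    stats = OnlineFieldStats(field_shape=(64, 64))
    for member in range(1000):
        field = rng.normal(loc=member * 0.001, size=(64, 64))
        stats.update(field)
    print("mean of means:", stats.mean.mean())
    print("mean variance:", stats.variance().mean())
```

Because each member's field is folded into the running moments and then discarded, the storage footprint stays proportional to one field rather than to the whole ensemble, which is the essence of the online approach the paper's framework coordinates.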