Author: "Agullo, Emmanuel" - Searchworks@Jio Institute Digital Library Search Results

Your search for the author "Agullo, Emmanuel" returned 229 results.

1. Multifacets of lossy compression for scientific data in the Joint-Laboratory of Extreme Scale Computing

Author: Cappello, Franck, Acosta, Mario, Agullo, Emmanuel, Anzt, Hartwig, Calhoun, Jon, Di, Sheng, Giraud, Luc, Grützmacher, Thomas, Jin, Sian, Sano, Kentaro, Sato, Kento, Singh, Amarjit, Tao, Dingwen, Tian, Jiannan, Ueno, Tomohiro, Underwood, Robert, Vivien, Frédéric, Yepes, Xavier, Kazutomo, Yoshii, and Zhang, Boyuan
Published: 2025
Full Text: View/download PDF

2. Resiliency in Numerical Algorithm Design for Extreme Scale Simulations

Author: Agullo, Emmanuel, Altenbernd, Mirco, Anzt, Hartwig, Bautista-Gomez, Leonardo, Benacchio, Tommaso, Bonaventura, Luca, Bungartz, Hans-Joachim, Chatterjee, Sanjay, Ciorba, Florina M., DeBardeleben, Nathan, Drzisga, Daniel, Eibl, Sebastian, Engelmann, Christian, Gansterer, Wilfried N., Giraud, Luc, Goeddeke, Dominik, Heisig, Marco, Jezequel, Fabienne, Kohl, Nils, Li, Xiaoye Sherry, Lion, Romain, Mehl, Miriam, Mycek, Paul, Obersteiner, Michael, Quintana-Orti, Enrique S., Rizzi, Francesco, Ruede, Ulrich, Schulz, Martin, Fung, Fred, Speck, Robert, Stals, Linda, Teranishi, Keita, Thibault, Samuel, Thoennes, Dominik, Wagner, Andreas, and Wohlmuth, Barbara
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, D.4.5, G.4, G.1, D.4.4
Abstract: This work is based on the seminar titled "Resiliency in Numerical Algorithm Design for Extreme Scale Simulations", held March 1-6, 2020 at Schloss Dagstuhl and attended by all the authors. Naive versions of conventional resilience techniques will not scale to the exascale regime: with a main memory footprint of tens of petabytes, synchronously writing checkpoint data all the way to background storage at frequent intervals will create intolerable overheads in runtime and energy consumption. Forecasts show that the mean time between failures could be lower than the time to recover from such a checkpoint, so that large calculations at scale might not make any progress if robust alternatives are not investigated. More advanced resilience techniques must be devised. The key may lie in exploiting both advanced system features and specific application knowledge. Research will face two essential questions: (1) what are the reliability requirements for a particular computation, and (2) how do we best design the algorithms and software to meet these requirements? One avenue would be to refine and improve on system- or application-level checkpointing and rollback strategies in case an error is detected. Developers might use fault notification interfaces and flexible runtime systems to respond to node failures in an application-dependent fashion. Novel numerical algorithms or more stochastic computational approaches may be required to meet accuracy requirements in the face of undetectable soft errors. The goal of this Dagstuhl Seminar was to bring together a diverse group of scientists with expertise in exascale computing to discuss novel ways to make applications resilient against detected and undetected faults. In particular, participants explored the role that algorithms and applications play in the holistic approach needed to tackle this challenge.
Comment: 45 pages, 3 figures, submitted to The International Journal of High Performance Computing Applications
Published: 2020
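The abstract's argument that frequent checkpointing stops scaling can be quantified with a common first-order model that the abstract itself does not spell out (Young's approximation): with checkpoint cost C, mean time between failures M and checkpoint period T, the fraction of time wasted is roughly C/T + T/(2M), minimized at T = sqrt(2CM), where the waste equals sqrt(2C/M). A minimal Python sketch under that model, which ignores recovery time and failures during checkpoints and is only accurate when C is much smaller than M; the function names and example values are hypothetical.

```python
import math

def young_period(ckpt_cost, mtbf):
    """First-order optimal checkpoint period T* = sqrt(2 * C * M) (Young's approximation)."""
    return math.sqrt(2.0 * ckpt_cost * mtbf)

def waste_fraction(period, ckpt_cost, mtbf):
    """Approximate fraction of time lost: checkpoint overhead C/T plus the
    expected re-execution after a failure, T / (2M)."""
    return ckpt_cost / period + period / (2.0 * mtbf)

def optimal_waste(ckpt_cost, mtbf):
    """Waste at the optimal period; equals sqrt(2C/M) in this model.
    A value >= 1 means the run makes no forward progress."""
    return waste_fraction(young_period(ckpt_cost, mtbf), ckpt_cost, mtbf)

if __name__ == "__main__":
    C = 600.0  # hypothetical 10-minute checkpoint to background storage
    for M in (86400.0, 3600.0, 1200.0):  # MTBF of one day, one hour, 20 minutes
        print(f"MTBF={M:>8.0f}s  T*={young_period(C, M):8.0f}s  waste={optimal_waste(C, M):.2f}")
```

With C = 600 s, the predicted waste reaches 1 once M falls to 2C = 1200 s, which mirrors the concern that failures may arrive faster than checkpointed work can be saved and restored.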

3. Analyzing the effect of local rounding error propagation on the maximal attainable accuracy of the pipelined Conjugate Gradient method

Author: Cools, Siegfried, Yetkin, Emrullah Fatih, Agullo, Emmanuel, Giraud, Luc, and Vanroose, Wim
Subjects: Mathematics - Numerical Analysis, 65F10, 65F50, 65G50, 65Y05, 65Y20
Abstract: Pipelined Krylov subspace methods typically offer improved strong scaling on parallel HPC hardware compared to standard Krylov subspace methods for large and sparse linear systems. In pipelined methods, the traditional synchronization bottleneck is mitigated by overlapping time-consuming global communications with useful computations. However, to achieve this communication-hiding strategy, pipelined methods introduce additional recurrence relations for a number of auxiliary variables that are required to update the approximate solution. This paper studies the influence of the local rounding errors introduced by the additional recurrences in the pipelined Conjugate Gradient (CG) method. Specifically, we analyze the impact of local round-off effects on the attainable accuracy of the pipelined CG algorithm and compare it with the traditional CG method. Furthermore, we estimate the gap between the true residual and the recursively computed residual used in the algorithm. Based on this estimate, we suggest an automated residual replacement strategy to reduce the loss of attainable accuracy in the final iterative solution. The resulting pipelined CG method with residual replacement improves the maximal attainable accuracy of pipelined CG while maintaining the efficient parallel performance of the pipelined method. This conclusion is substantiated by numerical results for a variety of benchmark problems.
Comment: 26 pages, 6 figures, 2 tables, 4 algorithms
Published: 2016
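The gap between the true residual b - Ax and the recursively updated residual, which the abstract estimates, can already be observed in classical CG. A minimal pure-Python sketch, assuming a dense SPD matrix stored as a list of rows; `cg`, `laplacian_1d` and the `replace_every` parameter are hypothetical names. The method shown is the standard (not pipelined) CG, and its periodic replacement rule is a simplification, not the automated estimate-based criterion proposed in the paper.

```python
def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return dot(u, u) ** 0.5

def cg(A, b, tol=1e-10, max_iter=500, replace_every=None):
    """Classical CG for SPD A starting from x0 = 0 (b must be nonzero).

    Returns (x, iterations, gap), where gap = ||(b - A x) - r|| is the drift
    between the true residual and the recursively updated residual r.
    If replace_every is set, r is recomputed explicitly as b - A x every
    replace_every iterations (a simple periodic residual replacement)."""
    x = [0.0] * len(b)
    r = list(b)          # r0 = b - A x0 = b
    p = list(r)
    rr = dot(r, r)
    bnorm = norm(b)
    k = 0
    for k in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]   # recursive update
        if replace_every and k % replace_every == 0:
            r = [bi - axi for bi, axi in zip(b, matvec(A, x))]  # explicit residual
        rr_new = dot(r, r)
        if rr_new ** 0.5 <= tol * bnorm:
            break
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    true_r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
    gap = norm([t - ri for t, ri in zip(true_r, r)])
    return x, k, gap

def laplacian_1d(n):
    """Tridiagonal SPD test matrix with 2 on the diagonal and -1 off it."""
    return [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
            for i in range(n)]

if __name__ == "__main__":
    A, b = laplacian_1d(30), [1.0] * 30
    for every in (None, 10):
        x, its, gap = cg(A, b, replace_every=every)
        print(every, its, gap)
```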

4. Multifacets of lossy compression for scientific data in the Joint-Laboratory of Extreme Scale Computing

Author: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Cappello, Franck, Di, Sheng, Underwood, Robert, Tao, Dingwen, Calhoun, Jon, Kazutomo, Yoshii, Sato, Kento, Singh, Amarjit, Giraud, Luc, Agullo, Emmanuel, Yepes-Arbós, Xavier, and Acosta Cobos, Mario César
Abstract: The Joint Laboratory on Extreme-Scale Computing (JLESC) was initiated at the same time that lossy compression for scientific data became an important topic for the scientific communities. The teams involved in the JLESC have played, and are still playing, an important role in developing the research, techniques, methods, and technologies that make lossy compression for scientific data a key tool for scientists and engineers. In this paper, we present the evolution of lossy compression for scientific data from 2015, describing the situation before the JLESC started, the evolution of this discipline over the past 8 years (until 2023) through the prism of the JLESC collaborations on this topic, and some of the remaining open research questions.
Funding: This research was supported by the Exascale Computing Project (ECP), Project Number: 17-SC-20-SC, a collaborative effort of two DOE organizations (the Office of Science and the National Nuclear Security Administration) responsible for the planning and preparation of a capable exascale ecosystem, including software, applications, hardware, advanced system engineering, and early testbed platforms, to support the nation’s exascale computing imperative. The material was supported by the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research (ASCR), under contract DE-AC02-06CH11357, and by the National Science Foundation under Grants OAC-2003709/2303064, OAC-2104023/2247080, OAC-2311875/2311876/2311877, OAC-2312673, and OAC-2034169. We acknowledge the computing resources provided on Bebop (operated by the Laboratory Computing Resource Center at Argonne). Some of the experiments presented in this paper were carried out using the PlaFRIM experimental testbed, supported by Inria, CNRS (LABRI and IMB), Université de Bordeaux, Bordeaux INP and Conseil Régional d’Aquitaine (see https://www.plafrim.fr). TEZip: this work has been supported by the COE research grant in computational science from Hyogo Prefecture and Kobe City through the Foundation for Computational Science. XIOS-SZ: Mario Acosta and Xavier Yepes-Arbós have received co-funding from the State Research Agency through OEMES (PID2020-116324RA-I00).
Notes: Peer reviewed. Article signed by 20 authors: Franck Cappello (a), Sheng Di (a), Robert Underwood (a), Dingwen Tao (b), Jon Calhoun (c), Yoshii Kazutomo (a), Kento Sato (d), Amarjit Singh (d), Luc Giraud (e), Emmanuel Agullo (e), Xavier Yepes (f), Mario Acosta (f), Sian Jin (b), Jiannan Tian (b), Frédéric Vivien (e), Boyuan Zhang (b), Kentaro Sano (d), Tomohiro Ueno (d), Thomas Grützmacher (g), Hartwig Anzt (g) / (a) Argonne National Laboratory, United States of America; (b) Indiana University, Bloomington, United States of America; (c) Clemson University, United States of America; (d) RIKEN Center for Computational Science, Japan; (e) National Research Institute for Computing and Automation, France; (f) Barcelona Supercomputing Center, Spain; (g) Karlsruhe Institute of Technology, Germany. Postprint (author's final draft).
Published: 2024

5. Pipelining the Fast Multipole Method over a Runtime System

Author: Agullo, Emmanuel, Bramas, Béranger, Coulaud, Olivier, Darve, Eric, Messner, Matthias, and Toru, Takahashi
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Fast Multipole Methods (FMM) are a fundamental operation for the simulation of many physical problems. The high-performance design of such methods usually requires carefully tuning the algorithm for both the targeted physics and the hardware. In this paper, we propose a new approach that achieves high performance across architectures. Our method consists of expressing the FMM algorithm as a task flow and employing a state-of-the-art runtime system, StarPU, to process the tasks on the different processing units. We carefully design the task flow, the mathematical operators, their Central Processing Unit (CPU) and Graphics Processing Unit (GPU) implementations, and the scheduling schemes. We compute potentials and forces of 200 million particles in 48.7 seconds on a homogeneous 160-core SGI Altix UV 100 and of 38 million particles in 13.34 seconds on a heterogeneous 12-core Intel Nehalem processor enhanced with 3 Nvidia M2090 Fermi GPUs.
Comment: No. RR-7981 (2012)
Published: 2012
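Expressing an algorithm as a task flow relies on the runtime inferring dependencies from the order in which tasks are submitted and from the data each task reads or writes, which is how StarPU's data access modes are used. A minimal Python sketch of that inference; `TaskFlow`, `submit`, `levels` and the FMM-flavored data handles (`M0`, `F0`, and so on) are hypothetical, do not reproduce StarPU's API, and the sketch only builds the dependency graph rather than executing kernels on CPUs or GPUs.

```python
from collections import defaultdict

class TaskFlow:
    """Hypothetical sequential-task-flow recorder (not StarPU's API).

    Tasks are submitted in program order with the data handles they access
    in mode "R", "W" or "RW". Dependencies are inferred from those modes:
    read-after-write, write-after-read and write-after-write.
    Each handle should appear at most once per task."""

    def __init__(self):
        self.names = []
        self.deps = []                     # deps[t] = ids of tasks t must wait for
        self.last_writer = {}              # handle -> id of the last writing task
        self.readers = defaultdict(list)   # handle -> readers since the last write

    def submit(self, name, accesses):
        tid = len(self.names)
        deps = set()
        for handle, mode in accesses:
            if handle in self.last_writer:
                deps.add(self.last_writer[handle])     # RAW (or WAW for writers)
            if "W" in mode:
                deps.update(self.readers[handle])      # WAR
                self.last_writer[handle] = tid
                self.readers[handle] = []
            else:
                self.readers[handle].append(tid)
        self.names.append(name)
        self.deps.append(deps)
        return tid

    def levels(self):
        """Earliest wavefront of each task; tasks on the same level are
        independent and could run concurrently."""
        level = []
        for deps in self.deps:             # submission order is a topological order
            level.append(1 + max((level[d] for d in deps), default=-1))
        return level

# Hypothetical FMM-like submission sequence on a two-leaf tree.
tf = TaskFlow()
p2m0 = tf.submit("P2M leaf0", [("particles0", "R"), ("M0", "W")])
p2m1 = tf.submit("P2M leaf1", [("particles1", "R"), ("M1", "W")])
m2m = tf.submit("M2M root", [("M0", "R"), ("M1", "R"), ("Mroot", "W")])
p2p = tf.submit("P2P leaf0", [("particles0", "R"), ("particles1", "R"), ("F0", "RW")])
l2p = tf.submit("L2P leaf0", [("L0", "R"), ("particles0", "R"), ("F0", "RW")])
move = tf.submit("update leaf0", [("F0", "R"), ("particles0", "RW")])
```

Because the programmer writes only the sequential submission order, the same task flow can be mapped onto different hardware by changing the scheduler, which is the portability argument the abstract makes.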

6. Fully Empirical Autotuned QR Factorization For Multicore Architectures

Author: Agullo, Emmanuel, Dongarra, Jack, Nath, Rajib, and Tomov, Stanimire
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Tuning numerical libraries has become more difficult over time as systems have grown more sophisticated. In particular, modern multicore machines make the behaviour of algorithms hard to forecast and model. In this paper, we tackle the issue of tuning a dense QR factorization on multicore architectures. We show that it is hard to rely on a model, which motivates us to design a fully empirical approach. We exhibit a few strong empirical properties that enable us to efficiently prune the search space. Our method is automatic, fast and reliable. The tuning process is indeed fully performed at install time in less than one and ten minutes on five out of seven platforms. We achieve an average performance varying from 97% to 100% of the optimum performance depending on the platform. This work is a basis for autotuning the PLASMA library and enabling easy performance portability across hardware systems.
Published: 2011

7. Towards an Efficient Tile Matrix Inversion of Symmetric Positive Definite Matrices on Multicore Architectures

Author: Agullo, Emmanuel, Bouwmeester, Henricus, Dongarra, Jack, Kurzak, Jakub, Langou, Julien, and Rosenberg, Lee
Subjects: Computer Science - Mathematical Software, Computer Science - Numerical Analysis
Abstract: The algorithms in the current sequential numerical linear algebra libraries (e.g., LAPACK) do not parallelize well on multicore architectures. A new family of algorithms, the tile algorithms, has recently been introduced. Previous research has shown that it is possible to write efficient and scalable tile algorithms for performing a Cholesky factorization, a (pseudo) LU factorization, and a QR factorization. In this extended abstract, we attack the problem of computing the inverse of a symmetric positive definite matrix. We observe that, using a dynamic task scheduler, it is relatively painless to translate existing LAPACK code to obtain a ready-to-be-executed tile algorithm. However, we demonstrate that nontrivial compiler techniques (array renaming, loop reversal, and pipelining) must then be applied to further increase the parallelism of our application. We present preliminary experimental results.
Comment: 8 pages, extended abstract submitted to VecPar10 on 12/11/09, notification of acceptance received on 02/05/10. See: http://vecpar.fe.up.pt/2010/
Published: 2010
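In LAPACK, inverting an SPD matrix proceeds in three stages: a Cholesky factorization A = L L^T (POTRF), inversion of the triangular factor (TRTRI), and the product (L^{-1})^T L^{-1} (LAUUM), which together form the POTRI path. A minimal, non-tiled pure-Python sketch of those stages; the function names mirror the LAPACK routines for readability but are hypothetical and are not bindings to them. In a tile algorithm, each stage would instead be expressed as tasks on square tiles and scheduled dynamically.

```python
import math

def potrf(A):
    """Cholesky factorization A = L L^T with lower-triangular L (POTRF stage)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if s <= 0.0:
            raise ValueError("matrix is not symmetric positive definite")
        L[j][j] = math.sqrt(s)
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L

def trtri_lower(L):
    """Inverse of a lower-triangular matrix by forward substitution (TRTRI stage)."""
    n = len(L)
    X = [[0.0] * n for _ in range(n)]
    for j in range(n):                     # solve L x = e_j, column by column
        X[j][j] = 1.0 / L[j][j]
        for i in range(j + 1, n):
            X[i][j] = -sum(L[i][k] * X[k][j] for k in range(j, i)) / L[i][i]
    return X

def lauum_lower(X):
    """Product X^T X for a lower-triangular X (LAUUM stage)."""
    n = len(X)
    return [[sum(X[k][i] * X[k][j] for k in range(max(i, j), n)) for j in range(n)]
            for i in range(n)]

def spd_inverse(A):
    """A^{-1} = (L^{-1})^T L^{-1}, chaining the three stages."""
    return lauum_lower(trtri_lower(potrf(A)))
```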

8. QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment

Author: Agullo, Emmanuel, Coti, Camille, Dongarra, Jack, Herault, Thomas, and Langou, Julien
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Numerical Analysis
Abstract: Previous studies have reported that common dense linear algebra operations do not achieve speedup by using multiple geographical sites of a computational grid. Because such operations are the building blocks of most scientific applications, conventional supercomputers are still strongly predominant in high-performance computing, and the use of grids for speeding up large-scale scientific problems is limited to applications exhibiting parallelism at a higher level. We have identified two performance bottlenecks in the distributed-memory algorithms implemented in ScaLAPACK, a state-of-the-art dense linear algebra library. First, because ScaLAPACK assumes a homogeneous communication network, the implementations of ScaLAPACK algorithms lack locality in their communication pattern. Second, the number of messages sent in the ScaLAPACK algorithms is significantly greater than in other algorithms that trade flops for communication. In this paper, we present a new approach for computing a QR factorization (one of the main dense linear algebra kernels) of tall and skinny matrices in a grid computing environment that overcomes these two bottlenecks. Our contribution is to articulate a recently proposed algorithm (Communication Avoiding QR) with a topology-aware middleware (QCG-OMPI) in order to confine intensive communications (ScaLAPACK calls) within the different geographical sites. An experimental study conducted on the Grid'5000 platform shows that the resulting performance increases linearly with the number of geographical sites on large-scale problems (and is, in particular, consistently higher than ScaLAPACK's).
Comment: Accepted at IPDPS10 (IEEE International Parallel & Distributed Processing Symposium 2010 in Atlanta, GA, USA).
Published: 2009
Full Text: View/download PDF
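The communication-avoiding idea behind the tall-and-skinny QR (TSQR) kernel used by CAQR can be sketched as follows: each row block (standing in for one geographical site) is factored locally, and only the small n x n R factors are combined up a reduction tree. This works because stacking R factors preserves A^T A, so with a positive diagonal the final R equals the R of a direct factorization. A minimal pure-Python sketch: `mgs_r` and `tsqr_r` are hypothetical names, modified Gram-Schmidt replaces the Householder reflections a production code would use, only R is computed (not Q), and no middleware such as QCG-OMPI is modeled.

```python
import random

def mgs_r(A):
    """R factor of a thin QR of A (m x n, m >= n, full column rank) via
    modified Gram-Schmidt; R has a positive diagonal, which makes it unique."""
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        R[i][i] = sum(c * c for c in cols[i]) ** 0.5
        q = [c / R[i][i] for c in cols[i]]
        for j in range(i + 1, n):
            R[i][j] = sum(a * b for a, b in zip(q, cols[j]))
            cols[j] = [b - R[i][j] * a for a, b in zip(q, cols[j])]
    return R

def tsqr_r(A, n_blocks):
    """R factor via a binary reduction tree: factor each row block locally,
    then repeatedly stack pairs of R factors and factor them again.
    Only n x n R factors travel between blocks."""
    m, n = len(A), len(A[0])
    size = m // n_blocks
    assert m % n_blocks == 0 and size >= n, "blocks must be at least as tall as wide"
    Rs = [mgs_r(A[k * size:(k + 1) * size]) for k in range(n_blocks)]
    while len(Rs) > 1:
        Rs = [mgs_r(Rs[i] + Rs[i + 1]) if i + 1 < len(Rs) else Rs[i]
              for i in range(0, len(Rs), 2)]
    return Rs[0]
```

Each tree node exchanges only O(n^2) data regardless of how many rows each site holds, which is why this reduction suits slow inter-site links.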

9. On the Autotuning of Task-Based Numerical Libraries for Heterogeneous Architectures

Author: Agullo, Emmanuel, Cámara, Jesús, Cuenca, Javier, and Giménez, Domingo
Published: 2020
Full Text: View/download PDF

10. Exploiting a Parametrized Task Graph Model for the Parallelization of a Sparse Direct Multifrontal Solver

Author: Agullo, Emmanuel, Bosilca, George, Buttari, Alfredo, Guermouche, Abdou, Lopez, Florent, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Desprez, Frédéric, editor, Dutot, Pierre-François, editor, Kaklamanis, Christos, editor, Marchal, Loris, editor, Molitorisz, Korbinian, editor, Ricci, Laura, editor, Scarano, Vittorio, editor, Vega-Rodríguez, Miguel A., editor, Varbanescu, Ana Lucia, editor, Hunold, Sascha, editor, Scott, Stephen L., editor, Lankes, Stefan, editor, and Weidendorfer, Josef, editor
Published: 2017
Full Text: View/download PDF

11. Task-Based Sparse Hybrid Linear Solver for Distributed Memory Heterogeneous Architectures

Author: Agullo, Emmanuel, Giraud, Luc, Nakov, Stojce, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Desprez, Frédéric, editor, Dutot, Pierre-François, editor, Kaklamanis, Christos, editor, Marchal, Loris, editor, Molitorisz, Korbinian, editor, Ricci, Laura, editor, Scarano, Vittorio, editor, Vega-Rodríguez, Miguel A., editor, Varbanescu, Ana Lucia, editor, Hunold, Sascha, editor, Scott, Stephen L., editor, Lankes, Stefan, editor, and Weidendorfer, Josef, editor
Published: 2017
Full Text: View/download PDF

12. Task-based Parallel Programming for Scalable Matrix Product Algorithms

Author: Agullo, Emmanuel, Buttari, Alfredo, Guermouche, Abdou, Herrmann, Julien, and Jego, Antoine
Published: 2023
Full Text: View/download PDF

13. On the Arithmetic Intensity of Distributed-Memory Dense Matrix Multiplication Involving a Symmetric Input Matrix (SYMM)

Author: Agullo, Emmanuel, Buttari, Alfredo, Coulaud, Olivier, Eyraud-Dubois, Lionel, Faverge, Mathieu, Franc, Alain, Guermouche, Abdou, Jego, Antoine, Peressoni, Romain, and Pruvost, Florent
Published: 2023
Full Text: View/download PDF

14. On the Arithmetic Intensity of Distributed-Memory Dense Matrix Multiplication Involving a Symmetric Input Matrix (SYMM)

Author: Agullo, Emmanuel, Buttari, Alfredo, Coulaud, Olivier, Eyraud-Dubois, Lionel, Faverge, Mathieu, Franc, Alain, Guermouche, Abdou, Jego, Antoine, Peressoni, Romain, Pruvost, Florent, COmposabilité Numerique and parallèle pour le CAlcul haute performanCE (CONCACE), Centre Européen de Recherche et de Formation Avancée en Calcul Scientifique (CERFACS)-Airbus [France]-Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Algorithmes Parallèles et Optimisation (IRIT-APO), Institut de recherche en informatique de Toulouse (IRIT), Université Toulouse Capitole (UT Capitole), Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT)-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université de Toulouse (UT)-Toulouse Mind & Brain Institut (TMBI), Université Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT)-Université Toulouse Capitole (UT Capitole), Université de Toulouse (UT), Centre National de la Recherche Scientifique (CNRS), Laboratoire Bordelais de Recherche en Informatique (LaBRI), Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS), Outils et Optimisations pour le Calcul Haute Performance et l'Apprentissage (TOPAL), Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Institut Polytechnique de Bordeaux (Bordeaux INP), Biodiversité, Gènes & Communautés (BioGeCo), Université de Bordeaux (UB)-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), from patterns to models in computational biodiversity and biotechnology (PLEIADE), Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Biodiversité, Gènes & Communautés (BioGeCo), Université de Bordeaux (UB)-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE)-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Service Expérimentation et Développement [Bordeaux] (SED), Inria Bordeaux - Sud-Ouest, Région Nouvelle-Aquitaine (2018-1R50119 HPC scalable ecosystem), IEEE, and ANR-19-CE46-0009,SOLHARIS,Solveurs pour architectures hétérogènes utilisant des supports d'exécution, objectif scalabilité (2019)
Subjects: Matrix multiplication, Symmetric, GEMM, 2DBC, SYMM, task-based programming, TBC, [INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC], [INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM], 2.5D, SBC, 3D
Abstract: International audience; Dense matrix multiplication involving a symmetric input matrix (SYMM) is implemented in reference distributed-memory codes with the same data distribution as its general analogue (GEMM). We show that, when the symmetric matrix is dominant, such a 2D block-cyclic (2D BC) scheme gives SYMM an arithmetic intensity (AI) lower than that of GEMM by a factor of 2. We propose alternative data distributions that preserve the memory benefit of SYMM (storing only half of the matrix) while achieving up to the same AI as GEMM. We also show that, if we can afford the same memory footprint as GEMM, SYMM can achieve a higher AI. We propose a task-based design of SYMM that is independent of the data distribution. This design allows for a scalable A-stationary SYMM with which all the discussed data distributions, even very irregular ones, can be easily assessed. We have integrated the resulting code into a dimension reduction algorithm involving a randomized singular value decomposition dominated by SYMM. An experimental study shows a compelling impact on performance.
Published: 2023

15. Task-Based Randomized Singular Value Decomposition and Multidimensional Scaling (original title: Décomposition en valeurs singulières randomisée et positionnement multidimensionel à base de tâches)

Author: Agullo, Emmanuel, Coulaud, Olivier, Denis, Alexandre, Faverge, Mathieu, Franc, Alain, Frigerio, Jean-Marc, Furmento, Nathalie, Guilbaud, Adrien, Jeannot, Emmanuel, Peressoni, Romain, Pruvost, Florent, Thibault, Samuel, COmposabilité Numerique and parallèle pour le CAlcul haute performanCE (CONCACE), Centre Européen de Recherche et de Formation Avancée en Calcul Scientifique (CERFACS)-Airbus [France]-Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Topology-Aware System-Scale Data Management for High-Performance Computing (TADAAM), Laboratoire Bordelais de Recherche en Informatique (LaBRI), Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Inria Bordeaux - Sud-Ouest, High-End Parallel Algorithms for Challenging Numerical Simulations (HiePACS), Biodiversité, Gènes & Communautés (BioGeCo), Université de Bordeaux (UB)-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), from patterns to models in computational biodiversity and biotechnology (PLEIADE), Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Biodiversité, Gènes & Communautés (BioGeCo), Université de Bordeaux (UB)-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE)-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), STatic Optimizations, Runtime Methods (STORM), Service Expérimentation et Développement [Bordeaux] (SED), Inria Bordeaux - Sud-Ouest, Projet Région Nouvelle-Aquitaine 2018-1R50119 'HPC scalable ecosystem', Inria Bordeaux - Sud Ouest, Inrae - BioGeCo, and Agullo, Emmanuel
Subjects: random projection, distributed memory, runtime system, multidimensional scaling (MDS), heterogeneous machine, randomized singular value decomposition (RSVD), task-based programming, AMS 15A18, 65F15, 68W20, 65Y05, 65-04, [INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC]
Abstract: Multidimensional scaling (MDS) is an important and robust algorithm for representing the individual cases of a dataset from their respective dissimilarities. However, heuristics, possibly trading off robustness, are often preferred in practice because of the potentially prohibitive memory and computational costs of MDS. The recent introduction of random projection techniques within MDS has allowed it to become competitive on larger test cases. The goal of this manuscript is to propose a high-performance distributed-memory MDS based on random projection for processing datasets of even larger size (up to one million items). We propose a task-based design of the whole algorithm and implement it within an efficient software stack including state-of-the-art numerical solvers, runtime systems and communication layers. The outcome is the ability to efficiently apply robust MDS to large datasets on modern supercomputers. We assess the resulting algorithm and software stack on point-cloud visualization for analyzing distances between sequences in metabarcoding.
Published: 2022
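Classical MDS turns a dissimilarity matrix into coordinates by double-centering the squared distances and extracting the dominant spectral subspace; random projection makes that extraction cheap. A minimal dense pure-Python sketch in the style of a randomized range finder with power iterations; `randomized_mds` and its helpers are hypothetical names, the sketch is neither distributed nor task-based, and it substitutes a Cholesky factorization of the small projected matrix for the final SVD/eigendecomposition, so the coordinates are recovered only up to an orthogonal transformation (which leaves all pairwise distances unchanged).

```python
import math
import random

def double_center(D):
    """Classical MDS Gram matrix B = -1/2 * J D^(2) J, with J = I - (1/n) 1 1^T."""
    n = len(D)
    D2 = [[d * d for d in row] for row in D]
    mean = [sum(row) / n for row in D2]   # row means equal column means (D symmetric)
    grand = sum(mean) / n
    return [[-0.5 * (D2[i][j] - mean[i] - mean[j] + grand) for j in range(n)]
            for i in range(n)]

def matmul(A, B):
    """Dense product of list-of-rows matrices."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def orthonormalize(Y):
    """Modified Gram-Schmidt on the columns of Y (n x k); returns Q (n x k)."""
    cols = [list(c) for c in zip(*Y)]
    for j in range(len(cols)):
        for i in range(j):
            proj = sum(a * b for a, b in zip(cols[i], cols[j]))
            cols[j] = [b - proj * a for a, b in zip(cols[i], cols[j])]
        nrm = math.sqrt(sum(c * c for c in cols[j]))
        cols[j] = [c / nrm for c in cols[j]]
    return [list(r) for r in zip(*cols)]

def cholesky(T):
    """Lower-triangular L with T = L L^T for a small SPD matrix T."""
    k = len(T)
    L = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = T[i][j] - sum(L[i][p] * L[j][p] for p in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def randomized_mds(D, k, power_iters=2, seed=0):
    """k-dimensional coordinates X whose Gram matrix X X^T approximates B."""
    rng = random.Random(seed)
    B = double_center(D)
    n = len(B)
    omega = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n)]  # random projection
    Q = orthonormalize(matmul(B, omega))   # basis of an approximate range of B
    for _ in range(power_iters):           # power iterations sharpen that range
        Q = orthonormalize(matmul(B, Q))
    Qt = [list(c) for c in zip(*Q)]
    T = matmul(Qt, matmul(B, Q))           # small k x k compression Q^T B Q
    return matmul(Q, cholesky(T))          # X = Q L, so X X^T = Q T Q^T
```

The expensive products involve the full n x n matrix B, which is where a distributed, task-based implementation would spend its effort; everything after the projection works on k x k or n x k data.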

16. Multifrontal QR Factorization for Multicore Architectures over Runtime Systems

Author: Agullo, Emmanuel, Buttari, Alfredo, Guermouche, Abdou, Lopez, Florent, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Wolf, Felix, editor, Mohr, Bernd, editor, and an Mey, Dieter, editor
Published: 2013
Full Text: View/download PDF

17. A Fully Empirical Autotuned Dense QR Factorization for Multicore Architectures

Author: Agullo, Emmanuel, Dongarra, Jack, Nath, Rajib, Tomov, Stanimire, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Jeannot, Emmanuel, editor, Namyst, Raymond, editor, and Roman, Jean, editor
Published: 2011
Full Text: View/download PDF

18. Towards an Efficient Tile Matrix Inversion of Symmetric Positive Definite Matrices on Multicore Architectures

Author: Agullo, Emmanuel, Bouwmeester, Henricus, Dongarra, Jack, Kurzak, Jakub, Langou, Julien, Rosenberg, Lee, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Palma, José M. Laginha M., editor, Daydé, Michel, editor, Marques, Osni, editor, and Lopes, João Correia, editor
Published: 2011
Full Text: View/download PDF

19. Study of the Processor and Memory Power and Energy Consumption of Coupled Sparse/Dense Solvers

Author: Agullo, Emmanuel, Felsoci, Marek, Guermouche, Amina, Mathieu, Herve, Sylvand, Guillaume, and Tagliaro, Bastien
Published: 2022
Full Text: View/download PDF

20. On the I/O Volume in Out-of-Core Multifrontal Methods with a Flexible Allocation Scheme

Author: Agullo, Emmanuel, Guermouche, Abdou, L’Excellent, Jean-Yves, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Palma, José M. Laginha M., editor, Amestoy, Patrick R., editor, Daydé, Michel, editor, Mattoso, Marta, editor, and Lopes, João Correia, editor
Published: 2008
Full Text: View/download PDF

21. Task-Based Sparse Hybrid Linear Solver for Distributed Memory Heterogeneous Architectures

Author: Agullo, Emmanuel, Giraud, Luc, and Nakov, Stojce
Published: 2017
Full Text: View/download PDF

22. Exploiting a Parametrized Task Graph Model for the Parallelization of a Sparse Direct Multifrontal Solver

Author: Agullo, Emmanuel, Bosilca, George, Buttari, Alfredo, Guermouche, Abdou, and Lopez, Florent
Published: 2017
Full Text: View/download PDF

23. Reducing the I/O Volume in an Out-of-Core Sparse Multifrontal Solver

Author: Agullo, Emmanuel, Guermouche, Abdou, L’Excellent, Jean-Yves, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Aluru, Srinivas, editor, Parashar, Manish, editor, Badrinath, Ramamurthy, editor, and Prasanna, Viktor K., editor
Published: 2007
Full Text: View/download PDF

24. A Preliminary Out-of-Core Extension of a Parallel Multifrontal Solver

Author: Agullo, Emmanuel, Guermouche, Abdou, L’Excellent, Jean-Yves, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Nagel, Wolfgang E., editor, Walter, Wolfgang V., editor, and Lehner, Wolfgang, editor
Published: 2006
Full Text: View/download PDF

25. Decentralized in-order execution of a sequential task-based code for shared-memory architectures

Author: Castes, Charly, primary, Agullo, Emmanuel, additional, Aumage, Olivier, additional, and Saillard, Emmanuelle, additional
Published: 2022
Full Text: View/download PDF

26. Direct solution of larger coupled sparse/dense linear systems using low-rank compression on single-node multi-core machines in an industrial context

Author: Agullo, Emmanuel, primary, Felšöci, Marek, additional, and Sylvand, Guillaume, additional
Published: 2022
Full Text: View/download PDF

27. Task-based parallel programming for scalable algorithms: application to matrix multiplication

Author: Agullo, Emmanuel, Buttari, Alfredo, Guermouche, Abdou, Herrmann, Julien, Jego, Antoine, High-End Parallel Algorithms for Challenging Numerical Simulations (HiePACS), Laboratoire Bordelais de Recherche en Informatique (LaBRI), Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Centre National de la Recherche Scientifique (CNRS), Algorithmes Parallèles et Optimisation (IRIT-APO), Institut de recherche en informatique de Toulouse (IRIT), Université Toulouse Capitole (UT Capitole), Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT)-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université de Toulouse (UT)-Toulouse Mind & Brain Institut (TMBI), Université Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT)-Université Toulouse Capitole (UT Capitole), Université de Toulouse (UT), Institut National Polytechnique (Toulouse) (Toulouse INP), Inria Bordeaux - Sud-Ouest, and ANR-19-CE46-0009,SOLHARIS,Solveurs pour architectures hétérogènes utilisant des supports d'exécution, objectif scalabilité(2019)
Subjects: parallelism, mathematical software, matrix multiplication, communication patterns, task-based programming, scalability, [INFO]Computer Science [cs], [INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC]
Abstract: Task-based programming models have gained the interest of the high-performance mathematical software community because they relieve part of the burden of developing and implementing distributed-memory parallel algorithms in an efficient and portable way. In increasingly large and heterogeneous clusters of computers, these models appear as a way to maintain and enhance more complex algorithms. However, task-based programming models lack the flexibility and features needed to express, elegantly and compactly, scalable algorithms that rely on advanced communication patterns. We show that the Sequential Task Flow (STF) paradigm can be extended to write a compact yet efficient and scalable General Matrix Multiplication. This extension required few modifications to the StarPU runtime system. The final implementation is shown to be competitive with state-of-the-art libraries up to 32,768 cores and may outperform them in some specific problem configurations.
Published: 2022

28. Study of the processor and memory power consumption of coupled sparse/dense solvers

Author: Agullo, Emmanuel, Felšöci, Marek, Guermouche, Amina, Mathieu, Hervé, Sylvand, Guillaume, Tagliaro, Bastien, High-End Parallel Algorithms for Challenging Numerical Simulations (HiePACS), Laboratoire Bordelais de Recherche en Informatique (LaBRI), Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), STatic Optimizations, Runtime Methods (STORM), Service Expérimentation et Développement [Bordeaux] (SED), Inria Bordeaux - Sud-Ouest, Airbus [France], Projet Région Nouvelle-Aquitaine 2018-1R50119 'HPC scalable ecosystem', and Inria Bordeaux Sud-Ouest
Subjects: sparse and dense matrices, direct method, parallel solvers, power consumption, energy_scope, FEM/BEM coupling, [INFO.INFO-PF]Computer Science [cs]/Performance [cs.PF], [INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC]
Abstract: In the aeronautical industry, aeroacoustics is used to model the propagation of acoustic waves in air flows enveloping an aircraft in flight. This allows one, for instance, to simulate the noise produced at ground level by an aircraft during the takeoff and landing phases, in order to validate that regulatory environmental standards are met. Unlike most other complex physics simulations, the method requires solving coupled sparse/dense systems. In a previous study, we proposed two classes of compression-based algorithms for solving such large systems on a relatively small workstation (one or a few multicore nodes). The objective of this study is to assess whether the positive impact of the proposed algorithms on time to solution and memory usage also translates to energy consumption. The nature of the problem (coupled dense and sparse matrices) and of the underlying solution methods (dense, sparse direct and compression steps) yields an interesting processor and memory power profile, which we aim to analyze in detail.
Published: 2022

29. Parallel hierarchical hybrid linear solvers for emerging computing platforms

Author: Agullo, Emmanuel, Giraud, Luc, Guermouche, Abdou, and Roman, Jean
Published: 2011
Full Text: View/download PDF

30. QCG-OMPI: MPI applications on grids

Author: Agullo, Emmanuel, Coti, Camille, Herault, Thomas, Langou, Julien, Peyronnet, Sylvain, Rezmerita, Ala, Cappello, Franck, and Dongarra, Jack
Published: 2011
Full Text: View/download PDF

31. Resiliency in numerical algorithm design for extreme scale simulations

Author: Agullo, Emmanuel, primary, Altenbernd, Mirco, additional, Anzt, Hartwig, additional, Bautista-Gomez, Leonardo, additional, Benacchio, Tommaso, additional, Bonaventura, Luca, additional, Bungartz, Hans-Joachim, additional, Chatterjee, Sanjay, additional, Ciorba, Florina M, additional, DeBardeleben, Nathan, additional, Drzisga, Daniel, additional, Eibl, Sebastian, additional, Engelmann, Christian, additional, Gansterer, Wilfried N, additional, Giraud, Luc, additional, Göddeke, Dominik, additional, Heisig, Marco, additional, Jézéquel, Fabienne, additional, Kohl, Nils, additional, Li, Xiaoye Sherry, additional, Lion, Romain, additional, Mehl, Miriam, additional, Mycek, Paul, additional, Obersteiner, Michael, additional, Quintana-Ortí, Enrique S, additional, Rizzi, Francesco, additional, Rüde, Ulrich, additional, Schulz, Martin, additional, Fung, Fred, additional, Speck, Robert, additional, Stals, Linda, additional, Teranishi, Keita, additional, Thibault, Samuel, additional, Thönnes, Dominik, additional, Wagner, Andreas, additional, and Wohlmuth, Barbara, additional
Published: 2021
Full Text: View/download PDF

32. Comparison of coupled solvers for FEM/BEM linear systems arising from discretization of aeroacoustic problems

Author: Agullo, Emmanuel, Felšöci, Marek, Sylvand, Guillaume, High-End Parallel Algorithms for Challenging Numerical Simulations (HiePACS), Laboratoire Bordelais de Recherche en Informatique (LaBRI), Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Airbus [France], Projet Région Nouvelle-Aquitaine 2018-1R50119 'HPC scalable ecosystem', Inria Bordeaux Sud-Ouest, and Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Inria Bordeaux - Sud-Ouest
Subjects: Aeroacoustics, Finite elements, Boundary elements, Coupled FEM/BEM linear systems, Solver comparison, [INFO.INFO-NA]Computer Science [cs]/Numerical Analysis [cs.NA], [INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS], [INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC], Computer Science::Numerical Analysis
Abstract: When an aeroacoustic physical model is discretized using both the Finite Element Method (FEM) and the Boundary Element Method (BEM), the result is a coupled FEM/BEM linear system combining sparse and dense parts. In this work, we propose and compare a set of implementation schemes that couple the open-source sparse direct solver MUMPS with the proprietary direct solvers from Airbus Central R&T, namely the ScaLAPACK-like dense solver SPIDO and the hierarchical H-matrix compressed solver HMAT. In this preliminary study, we restrict ourselves to a single 24-core computational node.
Published: 2021

33. A comparison of selected solvers for coupled FEM/BEM linear systems arising from discretization of aeroacoustic problems

Author: Agullo, Emmanuel, Felšöci, Marek, Sylvand, Guillaume, High-End Parallel Algorithms for Challenging Numerical Simulations (HiePACS), Laboratoire Bordelais de Recherche en Informatique (LaBRI), Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Airbus [France], Inria Bordeaux Sud-Ouest, Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Inria Bordeaux - Sud-Ouest, and Projet Région Nouvelle-Aquitaine 2018-1R50119 'HPC scalable ecosystem'
Subjects: Modeling, Aeroacoustics, Finite elements, Boundary elements, Sparse and dense matrices, Coupled FEM/BEM linear systems, Solver comparison, [INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS], [INFO.INFO-NA]Computer Science [cs]/Numerical Analysis [cs.NA], [INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC]
Abstract: When an aeroacoustic physical model is discretized using both the Finite Element Method (FEM) and the Boundary Element Method (BEM), the result is a coupled FEM/BEM linear system combining sparse and dense parts. In this preliminary study, we compare a set of sparse and dense solvers applied to such linear systems, with the aim of identifying the best-performing configurations of existing solvers.
Published: 2021

34. A comparison of selected solvers for coupled FEM/BEM linear systems arising from discretization of aeroacoustic problems: literate and reproducible environment

Author: Agullo, Emmanuel, Felšöci, Marek, Sylvand, Guillaume, High-End Parallel Algorithms for Challenging Numerical Simulations (HiePACS), Laboratoire Bordelais de Recherche en Informatique (LaBRI), Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Airbus [France], Projet Région Nouvelle-Aquitaine 2018-1R50119 'HPC scalable ecosystem', Inria Bordeaux Sud-Ouest, and Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Inria Bordeaux - Sud-Ouest
Subjects: Guix, Org mode, Literate programming, Reproducible, [INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS], [INFO.INFO-SE]Computer Science [cs]/Software Engineering [cs.SE]
Abstract: This is an accompanying technical report for the Inria research report No. 9412, "A comparison of selected solvers for coupled FEM/BEM linear systems arising from discretization of aeroacoustic problems". Based on the principles of literate programming, this technical report provides detailed guidelines for reproducing the experiments of that research report. We use Org mode for literate programming and GNU Guix for software environment reproducibility. Note that part of the software involved is proprietary.
Published: 2021

35. Resiliency in numerical algorithm design for extreme scale simulations

Author: Barcelona Supercomputing Center, Agullo, Emmanuel, Altenbernd, Mirco, Anzt, Hartwig, Bautista Gomez, Leonardo, and Benacchio, Tommaso
Abstract: This work is based on the seminar titled ‘Resiliency in Numerical Algorithm Design for Extreme Scale Simulations’, held March 1–6, 2020, at Schloss Dagstuhl and attended by all the authors. Advanced supercomputing is characterized by very high computation speeds at the cost of an enormous amount of resources. A typical large-scale computation running for 48 h on a system consuming 20 MW, as predicted for exascale systems, would consume a million kWh, corresponding to about 100k Euro in energy cost for executing 10^23 floating-point operations. It is clearly unacceptable to lose the whole computation if any of the several million parallel processes fails during the execution. Moreover, if a single operation suffers from a bit-flip error, should the whole computation be declared invalid? What about the notion of reproducibility itself: should this core paradigm of science be revised and refined for results that are obtained by large-scale simulation? Naive versions of conventional resilience techniques will not scale to the exascale regime: with a main memory footprint of tens of petabytes, synchronously writing checkpoint data all the way to background storage at frequent intervals will create intolerable overheads in runtime and energy consumption. Forecasts show that the mean time between failures could be lower than the time to recover from such a checkpoint, so that large calculations at scale might not make any progress if robust alternatives are not investigated. More advanced resilience techniques must be devised. The key may lie in exploiting both advanced system features and specific application knowledge. Research will face two essential questions: (1) what are the reliability requirements for a particular computation and (2) how do we best design the algorithms and software to meet these requirements?
While the analysis of use cases can help understand the particular reliability requirements, the construction of remedies is c… Peer Reviewed. Article signed by 36 authors: Emmanuel Agullo, Mirco Altenbernd, Hartwig Anzt, Leonardo Bautista-Gomez, Tommaso Benacchio, Luca Bonaventura, Hans-Joachim Bungartz, Sanjay Chatterjee, Florina M. Ciorba, Nathan DeBardeleben, Daniel Drzisga, Sebastian Eibl, Christian Engelmann, Wilfried N. Gansterer, Luc Giraud, Dominik Göddeke, Marco Heisig, Fabienne Jézéquel, Nils Kohl, Xiaoye Sherry Li, Romain Lion, Miriam Mehl, Paul Mycek, Michael Obersteiner, Enrique S. Quintana-Ortí, Francesco Rizzi, Ulrich Rüde, Martin Schulz, Fred Fung, Robert Speck, Linda Stals, Keita Teranishi, Samuel Thibault, Dominik Thönnes, Andreas Wagner and Barbara Wohlmuth. Postprint (author's final draft).
Published: 2021

36. A parallel out-of-core multifrontal method: Storage of factors on disk and analysis of models for an out-of-core active memory

Author: Agullo, Emmanuel, Guermouche, Abdou, and L’Excellent, Jean-Yves
Published: 2008
Full Text: View/download PDF

37. A complementary note on soft errors in the Conjugate Gradient method: the persistent error case

Author: Agullo, Emmanuel, Cools, Siegfried, Fatih-Yetkin, Emrullah, Giraud, Luc, Schenkels, Nick, Vanroose, Wim, High-End Parallel Algorithms for Challenging Numerical Simulations (HiePACS), Laboratoire Bordelais de Recherche en Informatique (LaBRI), Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Universiteit Antwerpen [Antwerpen], Kadir Has University (KHAS), Inria Bordeaux Sud-Ouest, Plafrim - GENCI, Université Bordeaux Segalen - Bordeaux 2-Université Sciences et Technologies - Bordeaux 1-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Université Bordeaux Segalen - Bordeaux 2-Université Sciences et Technologies - Bordeaux 1-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Inria Bordeaux - Sud-Ouest, Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Inria Bordeaux - Sud-Ouest, and Universiteit Antwerpen = University of Antwerpen [Antwerpen]
Subjects: Conjugate Gradient method, Sensitivity, Robustness, Soft-error, Bit-flip, Numerical detection, Exascale, [INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC], [MATH.MATH-NA]Mathematics [math]/Numerical Analysis [math.NA]
Abstract: This note is a follow-up study to [1], where we studied the resilience of the preconditioned conjugate gradient method (PCG). We complement the original work by performing a similar series of numerical experiments, but using what we call persistent bit-flips instead of transient ones.
Published: 2020

38. Exploring variable accuracy storage through lossy compression techniques in numerical linear algebra: a first application to flexible GMRES

Author: Agullo, Emmanuel, Cappello, Franck, Di, Sheng, Giraud, Luc, Liang, Xin, Schenkels, Nick, High-End Parallel Algorithms for Challenging Numerical Simulations (HiePACS), Laboratoire Bordelais de Recherche en Informatique (LaBRI), Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Argonne National Laboratory [Lemont] (ANL), Inria Bordeaux Sud-Ouest, Plafrim, and Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Inria Bordeaux - Sud-Ouest
Subjects: Lossy compression, Mixed precision, Flexible GMRES, Inexact Krylov, [INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC], [MATH.MATH-NA]Mathematics [math]/Numerical Analysis [math.NA]
Abstract: Large-scale applications running on HPC systems often require a substantial amount of memory and can have a large computational overhead. Lossy data compression techniques can reduce the size of the data and the associated communication cost, but the effect of the loss of accuracy on the numerical algorithm can be hard to predict. In this paper we examine the FGMRES algorithm, which requires storing a basis for the Krylov subspace and for the search space spanned by the solutions of the preconditioning systems. By viewing the combination of FGMRES and compression in the context of inexact Krylov subspace methods, we show that the vectors spanning this search space can be compressed. This allows us to derive a bound on the normwise relative compression error in each iteration. We use this bound to formulate a number of practical compression strategies, and validate and compare them through numerical experiments.
Published: 2020

39. On soft errors in the Conjugate Gradient method: sensitivity and robust numerical detection (revised)

Author: Agullo, Emmanuel, Cools, Siegfried, Fatih-Yetkin, Emrullah, Giraud, Luc, Schenkels, Nick, Vanroose, Wim, High-End Parallel Algorithms for Challenging Numerical Simulations (HiePACS), Laboratoire Bordelais de Recherche en Informatique (LaBRI), Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Universiteit Antwerpen [Antwerpen], Kadir Has University (KHAS), This work has been funded by the EXA2CT European Project on Exascale Algorithms and Advanced Computational Techniques, which receives funding from the EU’s Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 610741. Experiments presented in this paper were carried out using the PlaFRIM experimental testbed, supported by Inria, CNRS (LaBRI and IMB), Université de Bordeaux, Bordeaux INP and Conseil Régional d’Aquitaine (see https://www.plafrim.fr/). Siegfried Cools acknowledges funding by the Research Foundation Flanders (FWO) under grant number 12H4617N., Inria Bordeaux Sud-Ouest, Plafrim - GENCI, European Project: 610741,EC:FP7:ICT,FP7-ICT-2013-10,EXA2CT(2013), Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Inria Bordeaux - Sud-Ouest, Universiteit Antwerpen = University of Antwerpen [Antwerpen], Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Université Sciences et Technologies - Bordeaux 1-Université Bordeaux Segalen - Bordeaux 2-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Université Sciences et Technologies - Bordeaux 1-Université Bordeaux Segalen - Bordeaux 2-Inria Bordeaux - Sud-Ouest, and Université Bordeaux Segalen - Bordeaux 2-Université Sciences et Technologies - Bordeaux 1-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Université Bordeaux Segalen - Bordeaux 2-Université Sciences et Technologies - Bordeaux 1-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Inria Bordeaux - Sud-Ouest
Subjects: Sensitivity, Conjugate Gradient method, Exascale, Robustness, Numerical detection, Soft-error, Bit-flip, [INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC], [MATH.MATH-NA]Mathematics [math]/Numerical Analysis [math.NA]
Abstract: The conjugate gradient (CG) method is the most widely used iterative scheme for the solution of large sparse systems of linear equations when the matrix is symmetric positive definite. Although more than sixty years old, it is still a serious candidate for extreme-scale computation on large computing platforms. On the technological side, the continuous shrinking of transistor geometry and the increasing complexity of these devices dramatically affect their sensitivity to natural radiation, and thus diminish their reliability. One of the most common effects produced by natural radiation is the single event upset, which consists of a bit-flip in a memory cell producing unexpected results at the application level. Consequently, future computing facilities at extreme scale might be more prone to errors of any kind, including bit-flips during calculation. These numerical and technological observations are the main motivations for this work, where we first investigate through extensive numerical experiments the sensitivity of CG to bit-flips in its main computationally intensive kernels, namely the matrix-vector product and the preconditioner application. We further propose numerical criteria to detect the occurrence of such soft errors and assess their robustness through extensive numerical experiments.
Published: 2020
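The sensitivity study and the detection idea can be illustrated with a small experiment. The sketch below is not the criteria proposed in the paper: it injects a single bit-flip into one entry of the matrix-vector product q = A p (simulating a single event upset) and applies one common numerical check, comparing the recursively updated residual with the true residual b - A x, which coincide in exact arithmetic. The helpers `flip_bit`, `cg_with_check` and `laplacian_1d`, the threshold `tau` and the test matrix are hypothetical choices. Computing the true residual costs an extra matrix-vector product per iteration, so a practical detector would check it less often or use a cheaper invariant.

```python
import math
import struct


def flip_bit(x, bit):
    """Flip one bit (0 = least significant) of the IEEE-754 binary64 encoding of x."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    (y,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << bit)))
    return y


def matvec(A, v):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def laplacian_1d(n):
    """Symmetric positive definite tridiagonal test matrix tridiag(-1, 2, -1)."""
    return [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
            for i in range(n)]


def cg_with_check(A, b, tol=1e-10, tau=1e-8, max_iter=100, fault=None):
    """Unpreconditioned CG with a residual-gap soft-error check.

    fault: optional (iteration, index, bit) that corrupts one entry of q = A p once.
    Returns (x, iterations, detected_at); detected_at is None if no alert was raised.
    """
    x = [0.0] * len(b)
    r = list(b)
    p = list(r)
    rr = dot(r, r)
    bnorm = math.sqrt(dot(b, b))
    for k in range(max_iter):
        q = matvec(A, p)
        if fault is not None and fault[0] == k:
            _, idx, bit = fault
            q[idx] = flip_bit(q[idx], bit)  # simulated single event upset
        alpha = rr / dot(p, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        # In exact arithmetic the recurrence residual r equals b - A x.
        true_r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
        gap = math.sqrt(sum((t - s) ** 2 for t, s in zip(true_r, r))) / bnorm
        if not gap <= tau:  # written this way so a NaN gap also raises an alert
            return x, k + 1, k
        rr_new = dot(r, r)
        if math.sqrt(rr_new) / bnorm < tol:
            return x, k + 1, None
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, max_iter, None
```

For example, with `A = laplacian_1d(10)` and `b = [1.0] * 10`, the fault-free run converges without an alert, while `fault=(0, 0, 56)` flips an exponent bit that turns q[0] = 1.0 into 2^-16 and the residual gap is flagged in the first iteration.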

40. Multifrontal QR Factorization for Multicore Architectures over Runtime Systems

Author: Agullo, Emmanuel, primary, Buttari, Alfredo, additional, Guermouche, Abdou, additional, and Lopez, Florent, additional
Published: 2013
Full Text: View/download PDF

41. A Hybridization Methodology for High-Performance Linear Algebra Software for GPUs

Author: Agullo, Emmanuel, primary, Augonnet, Cédric, additional, Dongarra, Jack, additional, Ltaief, Hatem, additional, Namyst, Raymond, additional, Thibault, Samuel, additional, and Tomov, Stanimire, additional
Published: 2012
Full Text: View/download PDF

42. A Fully Empirical Autotuned Dense QR Factorization for Multicore Architectures

Author: Agullo, Emmanuel, primary, Dongarra, Jack, additional, Nath, Rajib, additional, and Tomov, Stanimire, additional
Published: 2011
Full Text: View/download PDF

43. Resiliency in numerical algorithm design for extreme scale simulations.

Author: Agullo, Emmanuel, Altenbernd, Mirco, Anzt, Hartwig, Bautista-Gomez, Leonardo, Benacchio, Tommaso, Bonaventura, Luca, Bungartz, Hans-Joachim, Chatterjee, Sanjay, Ciorba, Florina M, DeBardeleben, Nathan, Drzisga, Daniel, Eibl, Sebastian, Engelmann, Christian, Gansterer, Wilfried N, Giraud, Luc, Göddeke, Dominik, Heisig, Marco, Jézéquel, Fabienne, Kohl, Nils, and Li, Xiaoye Sherry
Subjects: *MEAN time between failure, *SOFT errors, *PARALLEL computers
Abstract: This work is based on the seminar titled 'Resiliency in Numerical Algorithm Design for Extreme Scale Simulations' held March 1–6, 2020, at Schloss Dagstuhl, which was attended by all the authors. Advanced supercomputing is characterized by very high computation speeds at the cost of an enormous amount of resources. A typical large-scale computation running for 48 h on a system consuming 20 MW, as predicted for exascale systems, would consume a million kWh, corresponding to about 100k Euro in energy cost for executing 10^23 floating-point operations. It is clearly unacceptable to lose the whole computation if any of the several million parallel processes fails during the execution. Moreover, if a single operation suffers from a bit-flip error, should the whole computation be declared invalid? What about the notion of reproducibility itself: should this core paradigm of science be revised and refined for results that are obtained by large-scale simulation? Naive versions of conventional resilience techniques will not scale to the exascale regime: with a main memory footprint of tens of Petabytes, synchronously writing checkpoint data all the way to background storage at frequent intervals will create intolerable overheads in runtime and energy consumption. Forecasts show that the mean time between failures could be lower than the time to recover from such a checkpoint, so that large calculations at scale might not make any progress if robust alternatives are not investigated. More advanced resilience techniques must be devised. The key may lie in exploiting both advanced system features as well as specific application knowledge. Research will face two essential questions: (1) what are the reliability requirements for a particular computation and (2) how do we best design the algorithms and software to meet these requirements?
While the analysis of use cases can help understand the particular reliability requirements, the construction of remedies is currently wide open. One avenue would be to refine and improve on system- or application-level checkpointing and rollback strategies in case an error is detected. Developers might use fault notification interfaces and flexible runtime systems to respond to node failures in an application-dependent fashion. Novel numerical algorithms or more stochastic computational approaches may be required to meet accuracy requirements in the face of undetectable soft errors. These ideas constituted an essential topic of the seminar. The goal of this Dagstuhl Seminar was to bring together a diverse group of scientists with expertise in exascale computing to discuss novel ways to make applications resilient against detected and undetected faults. In particular, participants explored the role that algorithms and applications play in the holistic approach needed to tackle this challenge. This article gathers a broad range of perspectives on the role of algorithms, applications and systems in achieving resilience for extreme scale simulations. The ultimate goal is to spark novel ideas and encourage the development of concrete solutions for achieving such resilience holistically. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF
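The figures quoted in the abstract above can be checked with a few lines of arithmetic. The sketch below also adds Young's classical first-order approximation of the optimal checkpoint period, T = sqrt(2 C M) for checkpoint cost C and mean time between failures M, which the abstract does not mention. The sustained rate of 10^18 flop/s, the first-order waste model (valid only when C is much smaller than M) and the (C, M) pairs in the loop are assumptions chosen for illustration, not data from the article.

```python
import math

HOURS = 48
POWER_MW = 20.0
SUSTAINED_FLOPS = 1e18  # assumption: a sustained exaflop/s for the whole run

energy_kwh = POWER_MW * 1000.0 * HOURS        # MW -> kW, times hours
price_per_kwh = 100_000 / energy_kwh          # price implied by "about 100k Euro"
total_flops = SUSTAINED_FLOPS * HOURS * 3600  # seconds in 48 h


def young_interval(checkpoint_cost, mtbf):
    """Young's first-order optimal checkpoint period sqrt(2 C M), in the inputs' time unit."""
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def waste_fraction(checkpoint_cost, mtbf, period):
    """First-order fraction of time lost: checkpoint overhead C/T plus expected rework T/(2M)."""
    return checkpoint_cost / period + period / (2.0 * mtbf)


if __name__ == "__main__":
    print(f"energy: {energy_kwh:,.0f} kWh")               # 960,000 kWh, about a million kWh
    print(f"implied price: {price_per_kwh:.3f} EUR/kWh")  # 0.104
    print(f"operations: {total_flops:.2e}")               # 1.73e+23, i.e. order 10^23
    for c, m in [(0.1, 24.0), (1.0, 4.0), (2.0, 4.0)]:    # hours; hypothetical values
        t = young_interval(c, m)
        print(f"C={c} h, M={m} h: T={t:.2f} h, waste={waste_fraction(c, m, t):.0%}")
```

At the optimum the waste simplifies to sqrt(2C/M), so it reaches 100% once the checkpoint cost is half the mean time between failures: the run makes essentially no progress, which echoes the abstract's warning about recovery times approaching the MTBF.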

44. On the I/O Volume in Out-of-Core Multifrontal Methods with a Flexible Allocation Scheme

Author: Agullo, Emmanuel, primary, Guermouche, Abdou, additional, and L’Excellent, Jean-Yves, additional
Published: 2008
Full Text: View/download PDF

45. On Soft Errors in the Conjugate Gradient Method: Sensitivity and Robust Numerical Detection

Author: Agullo, Emmanuel, primary, Cools, Siegfried, additional, Yetkin, Emrullah Fatih, additional, Giraud, Luc, additional, Schenkels, Nick, additional, and Vanroose, Wim, additional
Published: 2020
Full Text: View/download PDF

46. A Preliminary Out-of-Core Extension of a Parallel Multifrontal Solver

Author: Agullo, Emmanuel, primary, Guermouche, Abdou, additional, and L’Excellent, Jean-Yves, additional
Published: 2006
Full Text: View/download PDF

47. Simulation of a Sparse Direct Solver on Heterogeneous Systems using StarPU and SimGrid

Author: Agullo, Emmanuel, Buttari, Alfredo, Guermouche, Abdou, Legrand, Arnaud, Masliah, Ian, and Stanisic, Luka
Affiliations: High-End Parallel Algorithms for Challenging Numerical Simulations (HiePACS), Inria Bordeaux - Sud-Ouest; Laboratoire Bordelais de Recherche en Informatique (LaBRI), Université de Bordeaux (UB), Centre National de la Recherche Scientifique (CNRS), École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB); Algorithmes Parallèles et Optimisation (IRIT-APO), Institut de recherche en informatique de Toulouse (IRIT), Université Toulouse 1 Capitole (UT1), Université Toulouse Capitole (UT Capitole), Université Toulouse - Jean Jaurès (UT2J), Université Toulouse III - Paul Sabatier (UT3), Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées, Université de Toulouse (UT), Toulouse Mind & Brain Institut (TMBI); Performance analysis and optimization of LARge Infrastructures and Systems (POLARIS), Inria Grenoble - Rhône-Alpes, Laboratoire d'Informatique de Grenoble (LIG), Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP), Université Grenoble Alpes [2016-2019] (UGA [2016-2019]); Institut National de Recherche en Informatique et en Automatique (Inria)
Subjects: [INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC], ComputingMilieux_MISCELLANEOUS, [MATH.MATH-NA]Mathematics [math]/Numerical Analysis [math.NA]
Abstract: International audience
Published: 2019

48. Numerical Analysis of the Maximal Attainable Accuracy in Communication-hiding Pipelined Conjugate Gradients

Author: Cools, Siegfried, Cornelis, Jeffrey, Agullo, Emmanuel, Fatih-Yetkin, Emrullah, Giraud, Luc, and Vanroose, Wim
Affiliations: Universiteit Antwerpen; High-End Parallel Algorithms for Challenging Numerical Simulations (HiePACS), Inria Bordeaux - Sud-Ouest; Laboratoire Bordelais de Recherche en Informatique (LaBRI), Université de Bordeaux (UB), Centre National de la Recherche Scientifique (CNRS), École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB); Institut National de Recherche en Informatique et en Automatique (Inria); Kadir Has University (KHAS)
Subjects: [INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC], [MATH.MATH-NA]Mathematics [math]/Numerical Analysis [math.NA], ComputingMilieux_MISCELLANEOUS
Abstract: International audience
Published: 2019

49. On Soft Errors in the Conjugate Gradient: Sensitivity and Robust Numerical Detection

Author: Agullo, Emmanuel, Cools, Siegfried, Giraud, Luc, Fatih-Yetkin, Emrullah, and Vanroose, Wim
Affiliations: High-End Parallel Algorithms for Challenging Numerical Simulations (HiePACS), Inria Bordeaux - Sud-Ouest; Laboratoire Bordelais de Recherche en Informatique (LaBRI), Université de Bordeaux (UB), Centre National de la Recherche Scientifique (CNRS), École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB); Institut National de Recherche en Informatique et en Automatique (Inria); Universiteit Antwerpen; Kadir Has University (KHAS)
Subjects: [INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC], [MATH.MATH-NA]Mathematics [math]/Numerical Analysis [math.NA], ComputingMilieux_MISCELLANEOUS
Abstract: International audience
Published: 2019

50. Energy analysis of a solver stack for frequency-domain electromagnetics

Author: Agullo, Emmanuel, Giraud, Luc, Lanteri, Stéphane, Marait, Gilles, Orgerie, Anne-Cécile, and Poirel, Louis
Affiliations: High-End Parallel Algorithms for Challenging Numerical Simulations (HiePACS), Inria Bordeaux - Sud-Ouest; Laboratoire Bordelais de Recherche en Informatique (LaBRI), Université de Bordeaux (UB), Centre National de la Recherche Scientifique (CNRS), École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB); Numerical modeling and high performance computing for evolution problems in complex domains and heterogeneous media (NACHOS), Inria Sophia Antipolis - Méditerranée (CRISAM), Laboratoire Jean Alexandre Dieudonné (LJAD), Université Côte d'Azur (UCA), Université Nice Sophia Antipolis (1965 - 2019) (UNS), COMUE Université Côte d'Azur (2015-2019) (COMUE UCA); Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Université de Bretagne Sud (UBS), Institut National des Sciences Appliquées - Rennes (INSA Rennes), Université de Rennes 1 (UR1), Université de Rennes (UNIV-RENNES), École normale supérieure - Rennes (ENS Rennes), CentraleSupélec, IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique), Institut Mines-Télécom [Paris] (IMT); Institut National de Recherche en Informatique et en Automatique (Inria)
Funding: European Project: 730913, H2020, H2020-EINFRA-2016-1, PRACE-5IP (2017)
Subjects: Energy consumption, HPC, [INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC], [MATH.MATH-NA]Mathematics [math]/Numerical Analysis [math.NA]
Abstract: High-performance computing (HPC) aims at developing models and simulations for applications in numerous scientific fields. Yet, the energy consumption of these HPC facilities currently limits their size and performance, and consequently the size of the tackled problems. The complexity of the HPC software stacks and their various optimizations makes it difficult to understand the energy consumption of scientific applications in detail. To highlight this difficulty on a concrete use case, we perform an energy and power analysis of a software stack for the simulation of frequency-domain electromagnetic wave propagation. This solver stack combines a high-order finite element discretization framework of the system of three-dimensional frequency-domain Maxwell equations with an algebraic hybrid iterative-direct sparse linear solver. This analysis is conducted on the KNL-based PRACE-PCP system. Our results illustrate the difficulty in predicting how to trade energy and runtime.
Published: 2018
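Energy and power analyses of this kind typically start from sampled power readings. A minimal sketch, assuming power is sampled at a fixed interval (the measurement setup on the PRACE-PCP system is not described in this record), integrates the samples with the trapezoidal rule and reports the energy-delay product (EDP), one common metric for weighing energy against runtime. The function names are hypothetical.

```python
def energy_joules(power_watts, dt_seconds):
    """Trapezoidal integration of power samples (W) taken every dt_seconds; returns joules."""
    if len(power_watts) < 2:
        return 0.0
    inner = sum(power_watts[1:-1])
    return dt_seconds * (0.5 * (power_watts[0] + power_watts[-1]) + inner)


def energy_delay_product(power_watts, dt_seconds):
    """EDP = energy x elapsed time (J*s); lower is better when both energy and runtime matter."""
    runtime = dt_seconds * (len(power_watts) - 1)
    return energy_joules(power_watts, dt_seconds) * runtime
```

Ranking configurations by energy alone can favour a slow, low-power run, whereas EDP also penalises the extra runtime; which metric fits depends on whether energy or time-to-solution is the binding constraint.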
