171 results for "Agullo, Emmanuel"
Search Results
152. Task-Based FMM for Multicore Architectures
- Author
-
Agullo, Emmanuel, primary, Bramas, Bérenger, additional, Coulaud, Olivier, additional, Darve, Eric, additional, Messner, Matthias, additional, and Takahashi, Toru, additional
- Published
- 2014
- Full Text
- View/download PDF
153. Bridging the Gap between Performance and Bounds of Cholesky Factorization on Heterogeneous Platforms.
- Author
-
Agullo, Emmanuel, Beaumont, Olivier, Eyraud-Dubois, Lionel, Herrmann, Julien, Kumar, Suraj, Marchal, Loris, and Thibault, Samuel
- Published
- 2015
- Full Text
- View/download PDF
154. Task-based FMM for heterogeneous architectures.
- Author
-
Agullo, Emmanuel, Bramas, Berenger, Coulaud, Olivier, Darve, Eric, Messner, Matthias, and Takahashi, Toru
- Subjects
FAST multipole method, TASK analysis, HETEROGENEOUS computing, COMPUTER architecture, GRAPHICS processing units, COMPUTER simulation
- Abstract
High performance fast multipole method is crucial for the numerical simulation of many physical problems. In a previous study, we have shown that task-based fast multipole method provides the flexibility required to process a wide spectrum of particle distributions efficiently on multicore architectures. In this paper, we now show how such an approach can be extended to fully exploit heterogeneous platforms. For that, we design highly tuned graphics processing unit (GPU) versions of the two dominant operators (P2P and M2L) as well as a scheduling strategy that dynamically decides which proportion of subsequent tasks is processed on regular CPU cores and on GPU accelerators. We assess our method with the StarPU runtime system for executing the resulting task flow on an Intel X5650 Nehalem multicore processor possibly enhanced with one, two, or three Nvidia Fermi M2070 or M2090 GPUs. A detailed experimental study on two 30 million particle distributions (a cube and an ellipsoid) shows that the resulting software consistently achieves high performance across architectures. Copyright © 2015 John Wiley & Sons, Ltd. [ABSTRACT FROM AUTHOR] (A minimal StarPU task-insertion sketch follows this entry.)
- Published
- 2016
- Full Text
- View/download PDF
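The heterogeneous, task-based FMM of entry 154 maps naturally onto StarPU's codelet model: each operator is declared with a CPU implementation and, optionally, a CUDA one, and the runtime decides at execution time where each ready task runs. The following is only a minimal sketch of that pattern for a single P2P-like task under assumed data layouts (flat arrays of positions and potentials); the kernel body, sizes, build line, and the p2p_cpu/p2p_cl names are illustrative, not the authors' code.

```c
/* Minimal StarPU sketch: one P2P-like task with a CPU implementation.
 * A .cuda_funcs entry (plus a CUDA kernel) would let the scheduler offload
 * the same task to a GPU, as in the heterogeneous setup of entry 154.
 * Build (assumption): gcc p2p_starpu.c $(pkg-config --cflags --libs starpu-1.3)
 */
#include <starpu.h>
#include <stdint.h>
#include <stdio.h>

#define N 1024  /* particles per leaf cell (placeholder) */

/* Hypothetical direct particle-particle interaction on CPU. */
static void p2p_cpu(void *buffers[], void *cl_arg)
{
    (void)cl_arg;
    double *pot = (double *)STARPU_VECTOR_GET_PTR(buffers[0]);      /* RW: potentials */
    const double *x = (double *)STARPU_VECTOR_GET_PTR(buffers[1]);  /* R: positions  */
    unsigned n = STARPU_VECTOR_GET_NX(buffers[0]);
    for (unsigned i = 0; i < n; i++)          /* toy O(n^2) interaction */
        for (unsigned j = 0; j < n; j++)
            if (i != j) pot[i] += 1.0 / (1.0 + (x[i] - x[j]) * (x[i] - x[j]));
}

static struct starpu_codelet p2p_cl = {
    .cpu_funcs = { p2p_cpu },
    /* .cuda_funcs = { p2p_cuda },   a GPU variant would go here */
    .nbuffers  = 2,
    .modes     = { STARPU_RW, STARPU_R },
};

int main(void)
{
    static double pot[N], x[N];
    for (int i = 0; i < N; i++) { pot[i] = 0.0; x[i] = (double)i; }

    if (starpu_init(NULL) != 0) return 1;

    starpu_data_handle_t h_pot, h_x;
    starpu_vector_data_register(&h_pot, STARPU_MAIN_RAM, (uintptr_t)pot, N, sizeof(double));
    starpu_vector_data_register(&h_x,   STARPU_MAIN_RAM, (uintptr_t)x,   N, sizeof(double));

    /* A real FMM inserts one task per pair of interacting cells; one task here. */
    starpu_task_insert(&p2p_cl, STARPU_RW, h_pot, STARPU_R, h_x, 0);

    starpu_task_wait_for_all();
    starpu_data_unregister(h_pot);
    starpu_data_unregister(h_x);
    starpu_shutdown();

    printf("pot[0] = %g\n", pot[0]);
    return 0;
}
```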
155. Abstract: Matrices Over Runtime Systems at Exascale
- Author
-
Agullo, Emmanuel, primary, Bosilca, George, additional, Bramas, Berenger, additional, Castagnede, Cedric, additional, Coulaud, Olivier, additional, Darve, Eric, additional, Dongarra, Jack, additional, Faverge, Mathieu, additional, Furmento, Nathalie, additional, Giraud, Luc, additional, Lacoste, Xavier, additional, Langou, Julien, additional, Ltaief, Hatem, additional, Messner, Matthias, additional, Namyst, Raymond, additional, Ramet, Pierre, additional, Takahashi, Toru, additional, Thibault, Samuel, additional, Tomov, Stanimire, additional, and Yamazaki, Ichitaro, additional
- Published
- 2012
- Full Text
- View/download PDF
156. Poster: Matrices over Runtime Systems at Exascale
- Author
-
Agullo, Emmanuel, primary, Bosilca, George, additional, Bramas, Berenger, additional, Castagnede, Cedric, additional, Coulaud, Olivier, additional, Darve, Eric, additional, Dongarra, Jack, additional, Faverge, Mathieu, additional, Furmento, Nathalie, additional, Giraud, Luc, additional, Lacoste, Xavier, additional, Langou, Julien, additional, Ltaief, Hatem, additional, Messner, Matthias, additional, Namyst, Raymond, additional, Ramet, Pierre, additional, Takahashi, Toru, additional, Thibault, Samuel, additional, Tomov, Stanimire, additional, and Yamazaki, Ichitaro, additional
- Published
- 2012
- Full Text
- View/download PDF
157. LU factorization for accelerator-based systems
- Author
-
Agullo, Emmanuel, primary, Augonnet, Cedric, additional, Dongarra, Jack, additional, Faverge, Mathieu, additional, Langou, Julien, additional, Ltaief, Hatem, additional, and Tomov, Stanimire, additional
- Published
- 2011
- Full Text
- View/download PDF
158. QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators
- Author
-
Agullo, Emmanuel, primary, Augonnet, Cedric, additional, Dongarra, Jack, additional, Faverge, Mathieu, additional, Ltaief, Hatem, additional, Thibault, Samuel, additional, and Tomov, Stanimire, additional
- Published
- 2011
- Full Text
- View/download PDF
159. QR factorization of tall and skinny matrices in a grid computing environment
- Author
-
Agullo, Emmanuel, primary, Coti, Camille, additional, Dongarra, Jack, additional, Herault, Thomas, additional, and Langou, Julien, additional
- Published
- 2010
- Full Text
- View/download PDF
160. Reducing the I/O Volume in Sparse Out-of-core Multifrontal Methods
- Author
-
Agullo, Emmanuel, primary, Guermouche, Abdou, additional, and L'Excellent, Jean-Yves, additional
- Published
- 2010
- Full Text
- View/download PDF
161. Tile QR factorization with parallel panel processing for multicore architectures
- Author
-
Hadri, Bilel, primary, Ltaief, Hatem, additional, Agullo, Emmanuel, additional, and Dongarra, Jack, additional
- Published
- 2010
- Full Text
- View/download PDF
162. Comparative study of one-sided factorizations with multiple software packages on multi-core hardware
- Author
-
Agullo, Emmanuel, primary, Hadri, Bilel, additional, Ltaief, Hatem, additional, and Dongarra, Jack, additional
- Published
- 2009
- Full Text
- View/download PDF
163. Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects
- Author
-
Agullo, Emmanuel, primary, Demmel, Jim, additional, Dongarra, Jack, additional, Hadri, Bilel, additional, Kurzak, Jakub, additional, Langou, Julien, additional, Ltaief, Hatem, additional, Luszczek, Piotr, additional, and Tomov, Stanimire, additional
- Published
- 2009
- Full Text
- View/download PDF
164. Comparative study of one-sided factorizations with multiple software packages on multi-core hardware.
- Author
-
Agullo, Emmanuel, Hadri, Bilel, Ltaief, Hatem, and Dongarra, Jack
- Published
- 2009
- Full Text
- View/download PDF
165. Reducing the I/O Volume in an Out-of-Core Sparse Multifrontal Solver.
- Author
-
Agullo, Emmanuel, Guermouche, Abdou, and L'Excellent, Jean-Yves
- Abstract
High performance sparse direct solvers are often a method of choice in various simulation problems. However, they require a large amount of memory compared to iterative methods. In this context, out-of-core solvers must be employed, where disks are used when the storage requirements are too large with respect to the physical memory available. In this paper, we study how to minimize the I/O requirements in the multifrontal method, a particular direct method to solve large-scale problems efficiently. Experiments on large real-life problems also show that the volume of I/O obtained when minimizing the storage requirement can be significantly reduced by applying algorithms designed to reduce the I/O volume. [ABSTRACT FROM AUTHOR]
- Published
- 2007
- Full Text
- View/download PDF
166. A Preliminary Out-of-Core Extension of a Parallel Multifrontal Solver.
- Author
-
Agullo, Emmanuel, Guermouche, Abdou, and L'Excellent, Jean-Yves
- Abstract
The memory usage of sparse direct solvers can be the bottleneck to solve large-scale problems. This paper describes a first implementation of an out-of-core extension to a parallel multifrontal solver (MUMPS). We show that larger problems can be solved on limited-memory machines with reasonable performance, and we illustrate the behaviour of our parallel out-of-core factorization. Then we use simulations to discuss how our algorithms can be modified to solve much larger problems. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
167. REDUCING THE I/O VOLUME IN SPARSE OUT-OF-CORE MULTIFRONTAL METHODS.
- Author
-
Agullo, Emmanuel, Guermouche, Abdou, and L'Excellent, Jean-Yves
- Subjects
NUMERICAL analysis, SPARSE matrices, LINEAR systems, NUMERICAL solutions to equations, ALGORITHMS
- Abstract
Sparse direct solvers, and in particular multifrontal methods, are methods of choice to solve the large sparse systems of linear equations arising in certain simulation problems. However, they require a large amount of memory (e.g., in comparison to iterative methods). In this context, out-of-core solvers may be employed: disks are used when the required storage exceeds the available physical memory. In this paper, we show how to process the task dependency graph of multifrontal methods in a way that minimizes the input/output (I/O) requirements. From a theoretical point of view, we show that minimizing the storage requirement can lead to a huge volume of I/O compared to directly minimizing the I/O volume. Then experiments on large real-world problems also show that applying standard algorithms to minimize the storage is not always efficient at reducing the volume of I/O and that significant gains can be obtained with the use of our algorithms to minimize I/O. We finally show that efficient memory management algorithms can be applied to all the variants proposed. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
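Entries 160 and 165-167 all revolve around the same question: in what order should the nodes of the multifrontal assembly tree be processed so that the active storage, and hence the traffic to disk, stays small. The sketch below implements only the classical sequential peak-storage model for a tree traversal (each child leaves a contribution block in memory until the parent front is assembled) together with Liu's child ordering by decreasing (peak - cb); it illustrates the storage metric these papers start from, not the I/O-minimizing algorithms they propose, and the data structure and sizes are invented for the example.

```c
/* Simplified sequential peak-storage model for a multifrontal assembly tree.
 *   peak(i) = max( front_i + sum_j cb_j ,
 *                  max_j ( peak(child_j) + sum_{l before j} cb_l ) )
 * Children are visited in decreasing (peak - cb) order (Liu's rule), which
 * minimizes this peak; the papers above go further and minimize the I/O
 * volume instead of the peak. All names and sizes here are illustrative.
 */
#include <stdio.h>
#include <stdlib.h>

typedef struct node {
    double front;            /* size of the frontal matrix                  */
    double cb;               /* contribution block passed up to the parent  */
    int nchildren;
    struct node **children;
    double peak;             /* filled in by compute_peak                   */
} node;

/* Order children by decreasing (peak - cb). */
static int by_decreasing_margin(const void *a, const void *b)
{
    const node *na = *(node *const *)a, *nb = *(node *const *)b;
    double ma = na->peak - na->cb, mb = nb->peak - nb->cb;
    return (ma < mb) - (ma > mb);
}

/* Bottom-up: order the children with Liu's rule, then evaluate the peak. */
static double compute_peak(node *n)
{
    double stacked = 0.0, peak = 0.0;
    for (int j = 0; j < n->nchildren; j++)
        compute_peak(n->children[j]);
    qsort(n->children, n->nchildren, sizeof(node *), by_decreasing_margin);
    for (int j = 0; j < n->nchildren; j++) {
        double m = stacked + n->children[j]->peak;  /* while child j is active */
        if (m > peak) peak = m;
        stacked += n->children[j]->cb;              /* cb stays until assembly */
    }
    if (stacked + n->front > peak) peak = stacked + n->front;
    n->peak = peak;
    return peak;
}

int main(void)
{
    /* Tiny hand-built tree: a root front with two leaf children. */
    node leaf1 = { .front = 40, .cb = 10, .nchildren = 0, .children = NULL };
    node leaf2 = { .front = 25, .cb = 20, .nchildren = 0, .children = NULL };
    node *kids[] = { &leaf1, &leaf2 };
    node root  = { .front = 50, .cb = 0, .nchildren = 2, .children = kids };

    printf("sequential peak storage = %.0f\n", compute_peak(&root));
    /* With physical memory M0, whatever exceeds M0 at a peak must go to disk;
       minimizing this peak is NOT the same as minimizing the total I/O volume,
       which is precisely the point made in entry 167. */
    return 0;
}
```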
168. High order HDG method and domain decomposition solvers for frequency‐domain electromagnetics.
- Author
-
Agullo, Emmanuel, Giraud, Luc, Gobé, Alexis, Kuhn, Matthieu, Lanteri, Stéphane, and Moya, Ludovic
- Subjects
DECOMPOSITION method, DOMAIN decomposition methods, MAXWELL equations, COMPUTATIONAL electromagnetics, HIGH performance computing
- Abstract
This work is concerned with the numerical treatment of the system of three‐dimensional frequency‐domain (or time‐harmonic) Maxwell equations using a high order hybridizable discontinuous Galerkin (HDG) approximation method combined with domain decomposition (DD) on the basis of hybrid iterative‐direct parallel solution strategies. The proposed HDG method preserves the advantages of classical DG methods previously introduced for the time‐domain Maxwell equations, in particular in terms of accuracy and flexibility with regard to the discretization of complex geometrical features, while keeping the computational efficiency at the level of the reference edge element‐based finite element formulation widely adopted for the considered PDE system. We study in detail the computational performance of the resulting DD solvers, in particular in terms of scalability metrics, by considering both a model test problem and more realistic large‐scale simulations performed on high performance computing systems consisting of networked multicore nodes. [ABSTRACT FROM AUTHOR] (The underlying Schur-complement splitting is recalled after this entry.)
- Published
- 2020
- Full Text
- View/download PDF
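Entry 168, like the two theses that close this list, relies on the standard hybrid iterative-direct domain decomposition formulation: unknowns interior to each subdomain are eliminated with a sparse direct solver, and only the interface problem is solved iteratively. As a generic reminder (not specific to the HDG discretization of the paper), splitting the unknowns into interior ($x_I$) and interface ($x_\Gamma$) parts gives

\[
\begin{pmatrix} A_{II} & A_{I\Gamma} \\ A_{\Gamma I} & A_{\Gamma\Gamma} \end{pmatrix}
\begin{pmatrix} x_I \\ x_\Gamma \end{pmatrix}
=
\begin{pmatrix} b_I \\ b_\Gamma \end{pmatrix},
\qquad
S\, x_\Gamma = b_\Gamma - A_{\Gamma I} A_{II}^{-1} b_I,
\quad
S = A_{\Gamma\Gamma} - A_{\Gamma I} A_{II}^{-1} A_{I\Gamma}.
\]

The Schur complement system in $x_\Gamma$ is solved with a preconditioned Krylov method, each application of $A_{II}^{-1}$ reusing the per-subdomain factorizations, and the interior solution is recovered afterwards as $x_I = A_{II}^{-1}(b_I - A_{I\Gamma} x_\Gamma)$.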
169. Bridging the gap between H-Matrices and sparse direct methods for the solution of large linear systems (Combler l'écart entre H-Matrices et méthodes directes creuses pour la résolution de systèmes linéaires de grandes tailles)
- Author
-
Falco, Aurélien, Giraud, Luc, Sylvand, Guillaume, Levadoux, David, Pont, Grégoire, Agullo, Emmanuel, Alouges, François, and Ng, Esmond G. (Laboratoire Bordelais de Recherche en Informatique (LaBRI), HiePACS, Inria Bordeaux - Sud-Ouest, Université de Bordeaux)
- Subjects
Computer Science [cs]/Other [cs.OH], Sparse matrices, H-Matrices, Low-Rank compression, Linear algebra, Finite elements, FEM/BEM coupling
- Abstract
Many physical phenomena may be studied through modeling and numerical simulations, commonplace in scientific applications. To be tractable on a computer, appropriate discretization techniques must be considered, which often lead to a set of linear equations whose features depend on the discretization techniques. Among them, the Finite Element Method usually leads to sparse linear systems whereas the Boundary Element Method leads to dense linear systems. The size of the resulting linear systems depends on the domain where the studied physical phenomenon develops and tends to become larger and larger as the performance of the computer facilities increases. For the sake of numerical robustness, the solution techniques based on the factorization of the matrix associated with the linear system are the methods of choice when affordable. In that respect, hierarchical methods based on low-rank compression have allowed a drastic reduction of the computational requirements for the solution of dense linear systems over the last two decades. For sparse linear systems, their application remains a challenge which has been studied by both the community of hierarchical matrices and the community of sparse matrices. On the one hand, the first step taken by the community of hierarchical matrices most often takes advantage of the sparsity of the problem through the use of nested dissection. While this approach benefits from the hierarchical structure, it is not, however, as efficient as sparse solvers regarding the exploitation of zeros and the structural separation of zeros from non-zeros. On the other hand, sparse factorization is organized so as to lead to a sequence of smaller dense operations, enticing sparse solvers to use this property and exploit compression techniques from hierarchical methods in order to reduce the computational cost of these elementary operations. Nonetheless, the globally hierarchical structure may be lost if the compression of hierarchical methods is used only locally on dense submatrices. We here review the main techniques that have been employed by both those communities, trying to highlight their common properties and their respective limits with a special emphasis on studies that have aimed to bridge the gap between them. With these observations in mind, we propose a class of hierarchical algorithms based on the symbolic analysis of the structure of the factors of a sparse matrix. These algorithms rely on symbolic information to cluster and construct a hierarchical structure coherent with the non-zero pattern of the matrix. Moreover, the resulting hierarchical matrix relies on low-rank compression for the reduction of the memory consumption of large submatrices as well as the time to solution of the solver. We also compare multiple ordering techniques based on geometrical or topological properties. Finally, we open the discussion to a coupling between the Finite Element Method and the Boundary Element Method in a unified computational framework.
- Published
- 2019
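The thesis in entry 169 pairs the symbolic structure of a sparse factorization with low-rank compression of its large dense blocks. The sketch below illustrates only the compression building block, not the thesis's clustering or symbolic-analysis algorithms: a dense block is factored with LAPACK's SVD and truncated at a relative tolerance, so that an m x n block of numerical rank k is stored in roughly k(m+n) entries instead of mn. The link line, the tolerance, and the truncated_rank helper are assumptions for the example.

```c
/* Truncated-SVD compression of an m x n dense block: A ≈ U_k diag(s_k) Vt_k.
 * Keeping only singular values above tol * s[0] reduces storage from m*n to
 * about k*(m+n) doubles (plus k singular values) when the block is
 * numerically low-rank. Link (assumption): -llapacke -llapack -lblas
 */
#include <lapacke.h>
#include <stdio.h>
#include <stdlib.h>

/* Returns the numerical rank at relative tolerance tol; on exit, u (m x m)
 * and vt (n x n) hold the full factors computed by dgesvd. */
static int truncated_rank(int m, int n, double *a, double tol,
                          double *s, double *u, double *vt)
{
    int kmax = m < n ? m : n;
    double *superb = malloc((size_t)(kmax - 1) * sizeof(double));
    int info = LAPACKE_dgesvd(LAPACK_ROW_MAJOR, 'A', 'A', m, n, a, n,
                              s, u, m, vt, n, superb);
    free(superb);
    if (info != 0) return -1;
    int k = 0;
    while (k < kmax && s[k] > tol * s[0]) k++;
    return k;
}

int main(void)
{
    enum { M = 6, N = 5 };
    /* A rank-2 block plus tiny noise: a_ij = i + 2*j + O(1e-10). */
    double a[M * N];
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
            a[i * N + j] = (double)i + 2.0 * j + 1e-10 * (i * j % 3);

    double s[N], u[M * M], vt[N * N];
    int k = truncated_rank(M, N, a, 1e-8, s, u, vt);
    printf("numerical rank at tol 1e-8: %d (full rank would be %d)\n", k, N);
    printf("compressed storage: %d doubles vs %d dense\n", k * (M + N), M * N);
    return 0;
}
```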
170. On the design of sparse hybrid linear solvers for modern parallel architectures
- Author
-
NAKOV, Stojce, Roman, Jean, Agullo, Emmanuel, Calandra, Henri, Diaz, Julien, Casas, Marc, Li, Xiaoye Sherry, and Bosilca, George (Laboratoire Bordelais de Recherche en Informatique (LaBRI), HiePACS, Inria Bordeaux - Sud-Ouest, Université de Bordeaux)
- Subjects
Computer Science [cs]/Other [cs.OH], Sparse linear solver, Hybrid method, Heterogeneous architectures, Multicore, Runtime system, Task-based programming, High Performance Computing (HPC)
- Abstract
In the context of this thesis, our focus is on numerical linear algebra, more precisely on the solution of large sparse systems of linear equations. We focus on designing efficient parallel implementations of MaPHyS, a hybrid linear solver based on domain decomposition techniques. First we investigate the MPI+threads approach. In MaPHyS, the first level of parallelism arises from the independent treatment of the various subdomains. The second level is exploited thanks to the use of multi-threaded dense and sparse linear algebra kernels involved at the subdomain level. Such a hybrid implementation of a hybrid linear solver suitably matches the hierarchical structure of modern supercomputers and enables a trade-off between the numerical and parallel performance of the solver. We demonstrate the flexibility of our parallel implementation on a set of test examples. Secondly, we follow a more disruptive approach where the algorithms are described as sets of tasks with data inter-dependencies, which leads to a directed acyclic graph (DAG) representation. The tasks are handled by a runtime system. We illustrate how a first task-based parallel implementation can be obtained by composing task-based parallel libraries within MPI processes through a preliminary prototype implementation of our hybrid solver. We then show how a task-based approach fully abstracting the hardware architecture can successfully exploit a wide range of modern hardware architectures. We implemented a full task-based Conjugate Gradient algorithm and showed that the proposed approach leads to very high performance on multi-GPU, multicore and heterogeneous architectures.
- Published
- 2015
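The thesis in entry 170 ends with a fully task-based Conjugate Gradient executed over a runtime system. For reference, the sketch below is only the textbook sequential CG that such a task-based version decomposes into matvec, dot and axpy tasks on tiles; the task decomposition, the runtime, and MaPHyS itself are not shown, and the 1-D Laplacian test matrix is an arbitrary stand-in.

```c
/* Textbook Conjugate Gradient on a symmetric positive definite matrix.
 * A task-based version (as in the thesis) splits matvec, axpy and dot
 * into tasks over vector/matrix tiles and lets a runtime schedule them.
 */
#include <math.h>
#include <stdio.h>

#define N 100

/* y = A*x for the 1-D Laplacian stencil [-1 2 -1] (illustrative SPD matrix). */
static void matvec(const double *x, double *y)
{
    for (int i = 0; i < N; i++) {
        y[i] = 2.0 * x[i];
        if (i > 0)     y[i] -= x[i - 1];
        if (i < N - 1) y[i] -= x[i + 1];
    }
}

static double dot(const double *a, const double *b)
{
    double s = 0.0;
    for (int i = 0; i < N; i++) s += a[i] * b[i];
    return s;
}

int main(void)
{
    double x[N] = { 0 }, b[N], r[N], p[N], Ap[N];
    for (int i = 0; i < N; i++) b[i] = 1.0;

    for (int i = 0; i < N; i++) { r[i] = b[i]; p[i] = r[i]; }  /* r = b - A*0 */
    double rr = dot(r, r);

    for (int it = 0; it < 1000 && sqrt(rr) > 1e-10; it++) {
        matvec(p, Ap);
        double alpha = rr / dot(p, Ap);
        for (int i = 0; i < N; i++) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
        double rr_new = dot(r, r);
        double beta = rr_new / rr;
        rr = rr_new;
        for (int i = 0; i < N; i++) p[i] = r[i] + beta * p[i];
    }
    printf("residual norm: %.3e\n", sqrt(rr));
    return 0;
}
```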
171. Algebraic domain decomposition methods for hybrid (direct/iterative) solvers (Méthodes de décomposition de domaine algébriques pour solveurs hybrides (direct/itératif))
- Author
-
POIREL, Louis, Giraud, Luc, Agullo, Emmanuel, Legrand, Arnaud, Gander, Martin, Heroux, Michael, Le Tallec, Patrick, Cuenot, Bénédicte, and Roux, François-Xavier
- Subjects
Domain decomposition methods, Hybrid (direct/iterative) parallel solver, Coarse space, Reproducible science, Literate programming