43 results on '"Christophe Calvin"'
Search Results
2. Efficient Cross Section Reconstruction on Modern Multi and Many Core Architectures.
- Author
-
Yunsong Wang, François-Xavier Hugot, Emeric Brun, Fausto Malvagi, and Christophe Calvin
- Published
- 2017
- Full Text
- View/download PDF
3. Competing Energy Lookup Algorithms in Monte Carlo Neutron Transport Calculations and their Optimization on CPU and Intel MIC Architectures.
- Author
-
Yunsong Wang, Emeric Brun, Fausto Malvagi, and Christophe Calvin
- Published
- 2016
- Full Text
- View/download PDF
4. Competing energy lookup algorithms in Monte Carlo neutron transport calculations and their optimization on CPU and Intel MIC architectures.
- Author
-
Yunsong Wang, Emeric Brun, Fausto Malvagi, and Christophe Calvin
- Published
- 2017
- Full Text
- View/download PDF
5. An Efficient Task-Based Execution Model for Stochastic Linear Solver on Multi-core and Many-Core Systems.
- Author
-
Fan Ye, Christophe Calvin, and Serge G. Petiton
- Published
- 2015
- Full Text
- View/download PDF
6. Toward Restarting Strategies Tuning for a Krylov Eigenvalue Solver.
- Author
-
France Boillod-Cerneux, Serge G. Petiton, Christophe Calvin, and Leroy Anthony Drummond
- Published
- 2014
- Full Text
- View/download PDF
7. A Study of SpMV Implementation Using MPI and OpenMP on Intel Many-Core Architecture.
- Author
-
Fan Ye, Christophe Calvin, and Serge G. Petiton
- Published
- 2014
- Full Text
- View/download PDF
8. The Exploration of Pervasive and Fine-Grained Parallel Model Applied on Intel Xeon Phi Coprocessor.
- Author
-
Christophe Calvin, Fan Ye, and Serge G. Petiton
- Published
- 2013
- Full Text
- View/download PDF
9. Performance and Numerical Accuracy Evaluation of Heterogeneous Multicore Systems for Krylov Orthogonal Basis Computation.
- Author
-
Jérôme Dubois, Christophe Calvin, and Serge G. Petiton
- Published
- 2010
- Full Text
- View/download PDF
10. Improving Scalability Using Hybrid Asynchronous Methods For Non-Hermitian Eigenproblems.
- Author
-
Jérôme Dubois, Christophe Calvin, and Serge G. Petiton
- Published
- 2011
- Full Text
- View/download PDF
11. Accelerating the Explicitly Restarted Arnoldi Method with GPUs Using an Autotuned Matrix Vector Product.
- Author
-
Jérôme Dubois, Christophe Calvin, and Serge G. Petiton
- Published
- 2011
- Full Text
- View/download PDF
12. The TRIO-Unitaire Project: A Parallel CFD 3-Dimensional Code.
- Author
-
Christophe Calvin and Ph. Emonot
- Published
- 1997
- Full Text
- View/download PDF
13. Overlapping techniques of communications.
- Author
-
Christophe Calvin, Laurent Colombet, and Philippe Michallon
- Published
- 1995
- Full Text
- View/download PDF
14. All-to-all broadcast in torus with wormhole-like routing.
- Author
-
Christophe Calvin, Stéphane Pérennes, and Denis Trystram
- Published
- 1995
- Full Text
- View/download PDF
15. Towards Mixed Computation/Communication in Parallel Scientific Libraries.
- Author
-
Christophe Calvin, Laurent Colombet, Frederic Desprez, B. Jargot, Philippe Michallon, Bernard Tourancheau, and Denis Trystram
- Published
- 1994
- Full Text
- View/download PDF
16. Minimizing Communication Overhead Using Pipelining for Multi-Dimensional FFT on Distributed Memory Machines.
- Author
-
Christophe Calvin and Frederic Desprez
- Published
- 1993
17. Methods to Overlap Communications in Parallel Numerical Algorithms.
- Author
-
Christophe Calvin, Laurent Colombet, and Philippe Michallon
- Published
- 1997
- Full Text
- View/download PDF
18. Matrix Transpose for Block Allocations on Torus and de Bruijn Networks.
- Author
-
Christophe Calvin and Denis Trystram
- Published
- 1996
- Full Text
- View/download PDF
19. Implementation of Parallel FFT Algorithms on Distributed Memory Machines with a Minimum Overhead of Communication.
- Author
-
Christophe Calvin
- Published
- 1996
- Full Text
- View/download PDF
20. Performance Evaluation and Modeling of Collective Communications on Cray T3D.
- Author
-
Christophe Calvin and Laurent Colombet
- Published
- 1996
- Full Text
- View/download PDF
21. HPC and Data: When Two Becomes One
- Author
-
Christophe Calvin and France Boillod-Cerneux
- Subjects
Open science, Open data, Computer simulation, business.industry, Computer science, Science and engineering, Big data, Supercomputer, business, Data science
- Abstract
As claimed for many years, High Performance Computing (HPC) and high performance numerical simulation are necessary tools for fundamental science and engineering. Big data and artificial intelligence are newcomers in the landscape, though not that new, especially in science. Finally, open data and open science are now becoming mandatory for trustworthy and reproducible science.
- Published
- 2021
- Full Text
- View/download PDF
22. Turbulence and Interactions : Proceedings of the TI 2018 Conference, June 25-29, 2018, Les Trois-Îlets, Martinique, France
- Author
-
Michel Deville, Christophe Calvin, Vincent Couaillier, Marta De La Llave Plata, Jean-Luc Estivalèzes, Thiên Hiêp Lê, and Stéphane Vincent
- Subjects
- Fluid mechanics, Continuum mechanics, Dynamics, Nonlinear theories, Thermodynamics, Heat engineering, Heat transfer, Mass transfer
- Abstract
This book presents a snapshot of the state of the art in the field of turbulence modeling, with an emphasis on numerical methods. Topics include direct numerical simulations, large eddy simulations, compressible turbulence, coherent structures, two-phase flow simulation and many more. It includes both theoretical contributions and experimental works, as well as chapters derived from keynote lectures, presented at the fifth Turbulence and Interactions Conference (TI 2018), which was held on June 25-29 in Martinique, France. This multifaceted collection, which reflects the conference's emphasis on the interplay of theory, experiments and computing in the process of understanding and predicting the physics of complex flows and solving related engineering problems, offers a timely guide for students, researchers and professionals in the field of applied computational fluid dynamics, turbulence modeling and related areas.
- Published
- 2021
23. Portable Monte Carlo Transport Performance Evaluation in the PATMOS Prototype
- Author
-
Emeric Brun, Christophe Calvin, and Tao Chang
- Affiliations
- CEA-Direction des Energies (ex-Direction de l'Energie Nucléaire) (CEA-DES (ex-DEN)); Commissariat à l'énergie atomique et aux énergies alternatives (CEA); Service d’Études des Réacteurs et de Mathématiques Appliquées (SERMA); Département de Modélisation des Systèmes et Structures (DM2S); Université Paris-Saclay; Direction de Recherche Fondamentale (CEA) (DRF (CEA)); Maison de la Simulation (MDLS); Université de Versailles Saint-Quentin-en-Yvelines (UVSQ); Institut National de Recherche en Informatique et en Automatique (Inria); Centre National de la Recherche Scientifique (CNRS)
- Subjects
Neutron transport, [PHYS.NUCL]Physics [physics]/Nuclear Theory [nucl-th], Computer science, pseudo event-based method, 020209 energy, Monte Carlo method, history-based method, OpenMP offload, CUDA, 02 engineering and technology, Parallel computing, Thread (computing), ComputerSystemsOrganization_PROCESSORARCHITECTURES, Monte Carlo transport, [PHYS.NEXP]Physics [physics]/Nuclear Experiment [nucl-ex], 01 natural sciences, 010305 fluids & plasmas, OpenACC, 0103 physical sciences, 0202 electrical engineering, electronic engineering, information engineering, Programming paradigm, OpenMP thread
- Abstract
A heterogeneous offload version of Monte Carlo neutron transport has been developed in the framework of the PATMOS prototype via several programming models (OpenMP thread, OpenMP offload, OpenACC and CUDA). Two algorithms are implemented: a history-based method and a pseudo event-based method. A performance evaluation has been carried out with a representative benchmark, slabAllNuclides. Numerical results illustrate the promising gain in performance for our heterogeneous offload MC code. These results demonstrate that the pseudo event-based approach significantly outperforms the history-based approach. Furthermore, with the pseudo event-based method, the OpenACC version is competitive, obtaining at least 71% of the performance of the CUDA version, whereas the OpenMP offload version performs poorly for both approaches.
- Published
- 2019
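The difference between the two loop structures named in the abstract above can be illustrated with a toy model. The sketch below is not the PATMOS implementation and is not the paper's pseudo event-based scheme: it is a minimal one-dimensional slab problem in Python, and every name and constant in it (`P_ABSORB`, `SLAB_WIDTH`, `step`, `history_based`, `event_based`) is a hypothetical choice made for illustration. Each particle carries its own seeded random stream, so both orderings produce the same tally.

```python
import random

P_ABSORB = 0.3     # hypothetical absorption probability per collision
SLAB_WIDTH = 5.0   # hypothetical slab thickness, in mean free paths


def step(x, direction, rng):
    """Advance one particle by one flight and one collision.

    Returns (x, direction, alive, absorbed).
    """
    x += direction * rng.expovariate(1.0)  # sample distance to next collision
    if x < 0.0 or x > SLAB_WIDTH:
        return x, direction, False, False  # leaked out of the slab
    if rng.random() < P_ABSORB:
        return x, direction, False, True   # absorbed at the collision site
    direction = 1.0 if rng.random() < 0.5 else -1.0  # isotropic 1D scattering
    return x, direction, True, False


def history_based(n):
    """Follow each particle from birth to death before starting the next."""
    absorbed = 0
    for seed in range(n):
        rng, x, d, alive = random.Random(seed), 0.0, 1.0, True
        while alive:
            x, d, alive, hit = step(x, d, rng)
            absorbed += hit
    return absorbed


def event_based(n):
    """Advance the whole bank of live particles one event at a time."""
    bank = [(random.Random(seed), 0.0, 1.0) for seed in range(n)]
    absorbed = 0
    while bank:
        survivors = []
        for rng, x, d in bank:
            x, d, alive, hit = step(x, d, rng)
            absorbed += hit
            if alive:
                survivors.append((rng, x, d))
        bank = survivors
    return absorbed


print(history_based(1000) == event_based(1000))  # True
```

In the event-style loop every live particle undergoes the same operation in each pass, which is the property that generally makes this ordering easier to map onto wide SIMD units or GPU threads; the toy above only demonstrates the reordering, not any performance effect.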
24. The response matrix acceleration: A new non-linear method for the 3D discrete-ordinate transport equation
- Author
-
François Févotte, Emiliano Masiello, Bruno Lathuilière, Wesley Ford, and Christophe Calvin
- Affiliations
- CEA-Saclay (CEA); Commissariat à l'énergie atomique et aux énergies alternatives (CEA); Service d’Études des Réacteurs et de Mathématiques Appliquées (SERMA); Département de Modélisation des Systèmes et Structures (DM2S); CEA-Direction des Energies (ex-Direction de l'Energie Nucléaire) (CEA-DES (ex-DEN)); Université Paris-Saclay; EDF (EDF)
- Funding
- This work has been jointly funded by Commissariat à l’Energie Atomique (CEA) and Electricité de France (EDF).
- Subjects
CMFD, [PHYS]Physics [physics], Computer science, Spectral radius, non-linear acceleration, 020209 energy, Finite difference method, 02 engineering and technology, Solver, stability analysis, 7. Clean energy, 01 natural sciences, 010305 fluids & plasmas, Nonlinear system, Matrix (mathematics), Operator (computer programming), Nuclear Energy and Engineering, 0103 physical sciences, Convergence (routing), 0202 electrical engineering, electronic engineering, information engineering, Applied mathematics, Spectral radius analysis, Convection–diffusion equation, Discrete-ordinates transport equation
- Abstract
In this paper, we propose a new non-linear technique for accelerating the solution of the discrete ordinates transport equation. The new method, called Response Matrix Acceleration (RMA), has been designed to complement the Coarse-Mesh Finite Difference method (CMFD) by offering better stability and improved performance in cases where CMFD fails. To accomplish this, RMA uses knowledge of the transport operator along with nonlinear coefficients and solves for the interface partial currents to maintain consistency with the transport operator. Two distinct variants of RMA are derived. The convergence properties of both variants of RMA applied to the source iteration schemes are investigated for the one-group transport operator. Analysis of the results indicates that both variants of RMA have improved effectiveness and stability relative to CMFD for optically diffusive materials. To achieve optimal numerical performance, a combination of RMA and CMFD is suggested. Improvements in the performance of RMA are expected with ongoing development and optimization. Further investigation into the use of RMA for accelerating outer iterations, parallel problems, and different transport operators is proposed. The results of a spectral radius analysis are presented, along with a strong scaling benchmark using the 3D C5G7 MOX problems. Furthermore, two real-scale problems, the whole-core EOLE reactor simulation and a PWR assembly simulation, are studied to assess the performance of the new method in a parallel computing framework using the constant and linear short characteristics of the IDT solver in APOLLO3.
- Published
- 2020
- Full Text
- View/download PDF
25. A Spatially Variant Rebalancing Method for Discrete-Ordinates Transport Equation
- Author
-
Wesley Ford, Bruno Lathuilière, Christophe Calvin, Emiliano Masiello, and François Févotte
- Affiliations
- CEA-Direction des Energies (ex-Direction de l'Energie Nucléaire) (CEA-DES (ex-DEN)); Commissariat à l'énergie atomique et aux énergies alternatives (CEA); Service d’Études des Réacteurs et de Mathématiques Appliquées (SERMA); Département de Modélisation des Systèmes et Structures (DM2S); Université Paris-Saclay; EDF (EDF)
- Funding
- This work has been jointly funded by Commissariat à l’Energie Atomique (CEA) and Electricité de France (EDF).
- Subjects
[PHYS.NUCL]Physics [physics]/Nuclear Theory [nucl-th], 020209 energy, Computation, Numerical analysis, Coarse-Mesh Finite Differences, 02 engineering and technology, Diffusion Synthetic Acceleration, [PHYS.NEXP]Physics [physics]/Nuclear Experiment [nucl-ex], 01 natural sciences, 010305 fluids & plasmas, symbols.namesake, Acceleration, Discrete Ordinates Transport Equation, Nuclear Energy and Engineering, Method of characteristics, Fourier analysis, 0103 physical sciences, Jacobian matrix and determinant, 0202 electrical engineering, electronic engineering, information engineering, symbols, Balance equation, Applied mathematics, Convection–diffusion equation, Mathematics, Coarse-mesh Rebalancing
- Abstract
In this paper we propose a new non-linear technique for accelerating the source iterations of the discrete-ordinates transport equation. The acceleration method, called Spatially Variant Rebalancing Method (SVRM), is based on the computation of the zeroth and first order spatial variation of the neutron balance equation. The non-linear acceleration is applied to the method of characteristics (MOC) with a step-approximation of the source. The new acceleration is meant to catch the high-order variation of the neutron flux within the spatial mesh. The paper proposes a numerical analysis of the technique based on the explicit computation of the Jacobian. The latter is analyzed with both spectral and Fourier analysis (Hong et al., 2010). Also, a comparison of the new method against CMFD (Smith, 1983), DSA (Larsen, 1982), and BPA (Adams et al., 1988) has been done for a parametrized heterogeneous problem, in order to study the performance of SVRM in different transport regimes. The analysis of SVRM has been constrained to plane geometries.
- Published
- 2019
- Full Text
- View/download PDF
26. Joint International Conference on Supercomputing in Nuclear Applications and Monte Carlo 2013 : Synthèse
- Author
-
Cheikh M. Diop and Christophe Calvin
- Subjects
Computer science, Monte Carlo method, Joint (building), Supercomputer, Data science, Computational science
- Published
- 2014
- Full Text
- View/download PDF
27. HPC Challenges for Deterministic Neutronics Simulations Using APOLLO3® Code
- Author
-
Christophe Calvin
- Subjects
Neutron transport, Computer science, Frame (networking), Genetic algorithm, Code (cryptography), Domain decomposition methods, General Medicine, Parallel computing, General-purpose computing on graphics processing units, Massively parallel, Boltzmann equation
- Abstract
The aim of this paper is to present some major HPC challenges for deterministic neutronics simulations and how these challenges are addressed in the APOLLO3 code. Different levels of HPC are illustrated on different kinds of applications and parallel paradigms within the APOLLO3 code. Results are given for fuel load management using a genetic algorithm, domain decomposition for transport solvers, and GPU acceleration of the Boltzmann equation solution, on configurations ranging from a few cores to massively parallel runs using more than 10,000 cores.
- Published
- 2011
- Full Text
- View/download PDF
28. Numerical Platon: A unified linear equation solver interface by CEA for solving open foe scientific applications
- Author
-
Bernard Sécher, Christophe Calvin, and Michel Belliard
- Subjects
Nuclear and High Energy Physics, Engineering, business.industry, Interface (Java), Mechanical Engineering, Computation, Parallel computing, Division (mathematics), Linear equation solver, Software, Nuclear Energy and Engineering, Coupling (computer programming), General Materials Science, Linear solver, Safety, Risk, Reliability and Quality, business, Waste Management and Disposal, Massively parallel
- Abstract
This paper describes a tool called ‘Numerical Platon’ developed by the French Atomic Energy Commission (CEA). It provides a freely available (GNU LGPL license) interface for coupling scientific computing applications to various freeware linear solver libraries (essentially PETSc, SuperLU and HyPre), together with some proprietary CEA solvers, for high-performance computers that may be used in industrial software written in various programming languages. This tool was developed as part of considerable efforts by the CEA Nuclear Energy Division in the past years to promote massively parallel software and on-shelf parallel tools to help develop new generation simulation codes. After the presentation of the package architecture and the available algorithms, we show examples of how Numerical Platon is used in sequential and parallel CEA codes. Compared with in-house solvers, the gain in computation capacity or in parallel performance is notable, without considerable extra development cost.
- Published
- 2009
- Full Text
- View/download PDF
29. Intel Xeon/Xeon Phi Platform Oriented Scalable Monte Carlo Linear Solver
- Author
-
Fan Ye, Christophe Calvin, and Serge Petiton
- Affiliations
- Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL); Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS)
- Subjects
[INFO]Computer Science [cs], ComputingMilieux_MISCELLANEOUS
- Published
- 2015
30. An object-oriented approach to the design of fluid mechanics software
- Author
-
Philippe Emonot, Christophe Calvin, and Olga Cueto
- Subjects
Numerical Analysis, Object-oriented programming, Guiding Principles, business.industry, Applied Mathematics, Fluid mechanics, Object-oriented design, Computational Mathematics, Software, Development (topology), Modeling and Simulation, Architecture, Software engineering, business, Software architecture, Analysis, Simulation, Mathematics
- Abstract
This article presents the guiding principles of the architecture of Trio U, a new generation of software for thermohydraulic calculations. Trio U is designed to serve as a thermohydraulic development platform. Its basic conception is object-oriented and it is written in C++. The article demonstrates how this type of design enables an open, modular software architecture.
- Published
- 2002
- Full Text
- View/download PDF
31. Efficient and Portable Krylov Eigensolver on Many Core Architectures
- Author
-
Serge G. Petiton, F. Ye, F. Boillod-Cerneux, and Christophe Calvin
- Subjects
Petascale computing, Class (computer programming), Many core, Computer science, Product (mathematics), Computer Science::Mathematical Software, Scalar (physics), Parallel computing, Solver, Xeon Phi, Eigenvalues and eigenvectors
- Abstract
We present in this article a highly parallel Krylov solver for large eigenvalue problems, the Explicitly Restarted Arnoldi Method (ERAM). Our ERAM implementation may be executed on many-core configurations, both homogeneous and heterogeneous, in order to take advantage of most present and future supercomputers. From these experiments, we propose our approach for designing efficient and portable algorithms on multi-core architectures. It is based on the design of generic algorithms using the TRILINOS approach and specialized implementations of elementary operations (matrix-matrix, matrix-vector, scalar product, ...) on the accelerators mentioned above. Some results on large sparse and dense matrices on petascale-class machines using CPUs and GPUs, and some first results obtained on the Intel MIC processor, are presented and analysed.
- Published
- 2014
- Full Text
- View/download PDF
32. Multi level programming Paradigm for Extreme Computing
- Author
-
Serge G. Petiton, Mitsuhisa Sato, Christophe Calvin, Nahid Emad, Miwako Tsuji, and Makarem Dandouna
- Subjects
Concurrent object-oriented programming, Iterative method, Computer science, Block (programming), Reactive programming, Programming paradigm, Service-oriented programming, Parallel computing, Functional reactive programming, Exascale computing
- Abstract
In order to propose a framework and programming paradigms for post-petascale computing, on the road to exascale computing and beyond, we introduced new languages, associated with a hierarchical multi-level programming paradigm, allowing scientific end-users and developers to program highly hierarchical architectures designed for extreme computing. In this paper, we explain the interest of such a hierarchical multi-level programming paradigm for extreme computing and how well it adapts to several large computational science applications, such as linear algebra solvers used for reactor core physics. We describe the YML language and framework, which allow describing graphs of parallel components that may be developed using a PGAS-like language such as XMP, then scheduled and computed on supercomputers. Then, we propose experiments on supercomputers (such as “K” and “Hopper”) with the hybrid method MERAM (Multiple Explicitly Restarted Arnoldi Method) as a case study for iterative methods manipulating sparse matrices, and the block Gauss-Jordan method as a case study for direct methods manipulating dense matrices. We conclude by proposing evolutions for this programming paradigm.
- Published
- 2014
- Full Text
- View/download PDF
33. Methods to Overlap Communications in Parallel Numerical Algorithms
- Author
-
Philippe Michallon, Christophe Calvin, and Laurent Colombet
- Subjects
Computer science, Fast Fourier transform, Computer Science (miscellaneous), Parallel algorithm, Distributed memory, Granularity, Parallel computing, Algorithm, Execution time, Intel Paragon
- Abstract
We present in this paper general techniques for overlapping communications in parallel numerical kernels. We first describe some dependency schemes which can be found in most parallel numerical algorithms, and we apply to these schemes methods based on changing the granularity of the computational tasks. The choice of granularity needed to obtain a good overlap depends on the main parameters of the target machines, so we present results of benchmarks executed on two parallel distributed-memory machines: a Cray T3D and an Intel Paragon. Then we apply the preceding overlapping techniques to classical numerical kernels, namely the matrix-vector and matrix-matrix products and the one- and two-dimensional FFT. We have implemented the overlapped versions of these algorithms on a T3D and a Paragon and tuned the overlapping parameters in order to minimize the total execution time. The results of these experiments demonstrate the accuracy of this approach.
- Published
- 1997
- Full Text
- View/download PDF
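The abstract above notes that the granularity giving a good overlap depends on machine parameters. A rough way to see why is a two-stage pipeline cost model, sketched below in Python. This is an illustrative textbook-style model, not the one used in the paper: the parameters `alpha` (per-message startup cost), `beta` (transfer cost per element) and `gamma` (compute cost per element), the function names, and the numeric values are all assumptions.

```python
def pipelined_time(n, k, alpha, beta, gamma):
    """Modelled time of a two-stage pipeline (communicate, then compute).

    n: total number of elements; k: number of chunks (the granularity);
    alpha, beta, gamma: hypothetical machine parameters (see lead-in).
    """
    t_comm = alpha + beta * n / k   # time to send one chunk
    t_comp = gamma * n / k          # time to process one chunk
    # The first chunk passes through both stages; every later chunk
    # costs only the slower of the two stages, since they overlap.
    return t_comm + t_comp + (k - 1) * max(t_comm, t_comp)


def best_granularity(n, alpha, beta, gamma, k_max=64):
    """Return the chunk count in 1..k_max that minimises the modelled time."""
    return min(range(1, k_max + 1),
               key=lambda k: pipelined_time(n, k, alpha, beta, gamma))


# Made-up parameter values, chosen only to exercise the model.
n, alpha, beta, gamma = 1_000_000, 50.0, 0.01, 0.01
no_overlap = pipelined_time(n, 1, alpha, beta, gamma)   # k = 1: no overlap
k_star = best_granularity(n, alpha, beta, gamma)
overlapped = pipelined_time(n, k_star, alpha, beta, gamma)
print(k_star, no_overlap, overlapped)
```

With very few chunks there is little to overlap, and with very many the per-message startup cost `alpha` dominates, so the modelled optimum sits in between and shifts with the ratio of startup cost to per-element cost.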
34. Towards exascale with the ANR-JST Japanese-French Project FP3C
- Author
-
Alfredo Buttari, Mitsuhisa Sato, Serge G. Petiton, Nahid Emad, Satoshi Matsuoka, M. Dayde, P. Codognet, Tetsuya Sakurai, Yutaka Ishikawa, Gabriel Antoniu, Christophe Calvin, Raymond Namyst, Taisuke Boku, Hiroshi Nakashima, Kengo Nakajima, and G. Joslin
- Subjects
Runtime system, Software, Computer architecture, Parallel processing (DSP implementation), Exploit, business.industry, Computer science, Programming paradigm, Parallel computing, Architecture, business, Exascale computing
- Abstract
The Japanese-French FP3C (Framework and Programming for Post-Petascale Computing) Project ANR/JST-2010-JTIC-003 aims at studying the software technologies, languages and programming models on the road to exascale computing. The ability to efficiently exploit these future systems is challenging because of their ultra-large-scale and highly hierarchical architecture, with computational nodes including many-core processors and accelerators. We give an overview of some of the main issues explored within the project.
- Published
- 2013
- Full Text
- View/download PDF
35. Accelerating the Explicitly Restarted Arnoldi Method with GPUs Using an Autotuned Matrix Vector Product
- Author
-
Serge G. Petiton, Jérôme Dubois, and Christophe Calvin
- Affiliations
- Commissariat à l’Energie Atomique, Gif-sur-Yvette, France; Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL); Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS)
- Subjects
Multi-core processor, Speedup, Computer science, Applied Mathematics, Process (computing), 010103 numerical & computational mathematics, 02 engineering and technology, Parallel computing, ComputerSystemsOrganization_PROCESSORARCHITECTURES, Solver, 01 natural sciences, 020202 computer hardware & architecture, Computational science, Arnoldi iteration, Computational Mathematics, Matrix (mathematics), Scalability, 0202 electrical engineering, electronic engineering, information engineering, [INFO]Computer Science [cs], 0101 mathematics, Graphics, ComputingMilieux_MISCELLANEOUS
- Abstract
This paper presents a parallelized hybrid single-vector Arnoldi algorithm for computing approximations to eigenpairs of a nonsymmetric matrix. We are interested in the use of accelerators and multicore units to speed up the Arnoldi process. The main goal is to propose a parallel version of the Arnoldi solver, which can efficiently use multiple multicore processors or multiple graphics processing units (GPUs) in a mixed coarse and fine grain fashion. In the proposed algorithms, this is achieved by an autotuning of the matrix vector product before starting the Arnoldi eigensolver as well as the reorganization of the data and global communications so that communication time is reduced. The execution time, performance, and scalability are assessed with well-known dense and sparse test matrices on multiple Nehalems, GT200 NVidia Tesla, and next generation Fermi Tesla. With one processor, we see a performance speedup of 2 to 3x when using all the physical cores, and a total speedup of 2 to 8x when adding a GPU to this multicore unit, and hence a speedup of 4 to 24x compared to the sequential solver.
- Published
- 2011
- Full Text
- View/download PDF
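The combined figure quoted in the abstract above follows from multiplying the two per-stage speedup ranges. A minimal arithmetic check in Python, where the function name `compose_speedups` is purely illustrative:

```python
def compose_speedups(first, second):
    """Multiply two (low, high) speedup ranges applied one after the other."""
    return (first[0] * second[0], first[1] * second[1])


# Ranges quoted in the abstract: 2-3x from using all physical cores,
# then a further 2-8x from adding a GPU to the multicore unit.
total = compose_speedups((2, 3), (2, 8))
print(total)  # (4, 24)
```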
36. High Performance Computing in Nuclear Engineering
- Author
-
Christophe Calvin and David Nowak
- Published
- 2010
- Full Text
- View/download PDF
37. All-to-all broadcast in torus with wormhole-like routing
- Author
-
S. Perennes, Christophe Calvin, and Denis Trystram
- Subjects
Transformation (function), Dimension (vector space), Computer science, Value (computer science), Torus, Parallel computing, Routing (electronic design automation), Wormhole, Topology, Square (algebra), Power (physics)
- Abstract
This paper deals with collective communications on distributed-memory parallel machines. We are interested in the design of efficient all-to-all broadcast algorithms on square tori of processing nodes using a wormhole-like routing mechanism. The execution time is influenced by three factors, namely the number of steps, the transmission rate and the maximum distance to cross. We first compute the lower bounds of the all-to-all broadcast problem under these assumptions. Then, we propose a new algorithm which minimizes the number of steps. Its distance factor is close to the optimal, but the transmission rate is too large. We derive a transformation which significantly reduces this last factor, bringing it close to the optimum. This last algorithm is a good trade-off when the message length is not too large. This analysis is detailed for square tori whose dimension is a power of 5. We show how to extend the construction to other sizes of square tori.
- Published
- 2002
- Full Text
- View/download PDF
38. The trio-unitaire project: A parallel CFD 3-dimensional code
- Author
-
Ph. Emonot and Christophe Calvin
- Subjects
Structure (mathematical logic), Object-oriented programming, ComputerSystemsOrganization_COMPUTERSYSTEMIMPLEMENTATION, business.industry, Computer science, Code (cryptography), Parallelism (grammar), Domain decomposition methods, Parallel computing, Computational fluid dynamics, business, Algorithm, Cray T3E
- Abstract
The structure of a new generation of thermalhydraulic code, Trio-Unitaire, is presented in this paper. This code has been designed to solve large 3D structured or unstructured CFD problems. The solutions adopted to achieve this goal (object-oriented design and parallelism) are described and the paper focuses on the technical solutions used. Some preliminary experimental results on a Cray T3E are presented.
- Published
- 1997
- Full Text
- View/download PDF
39. Overlapping techniques of communications
- Author
-
Laurent Colombet, Philippe Michallon, and Christophe Calvin
- Subjects
Computer science, Order (business), Product (mathematics), Fast Fourier transform, Parallel algorithm, Algorithm
- Abstract
We present in this paper general techniques for overlapping communications in parallel numerical kernels. We first describe some dependency schemes which can be found in most parallel numerical algorithms, and we apply to these schemes methods based on changing the granularity of the computational tasks. The choice of granularity needed to obtain a good overlap depends on the main parameters of the target machines. We apply the preceding overlapping techniques to classical numerical kernels, namely the matrix-vector product and the two-dimensional FFT, and implement them on a T3D and a Paragon. The results of these experiments demonstrate the accuracy of this approach.
- Published
- 1995
- Full Text
- View/download PDF
40. Evaluation of programming models for manycore and / or heterogeneous architectures for Monte Carlo neutron transport codes
- Author
-
Tao Chang and Christophe Calvin
- Affiliations
- Département de Modélisation des Systèmes et Structures (DM2S); CEA-Direction des Energies (ex-Direction de l'Energie Nucléaire) (CEA-DES (ex-DEN)); Commissariat à l'énergie atomique et aux énergies alternatives (CEA); Université Paris-Saclay; Institut Polytechnique de Paris
- Subjects
Manycore architectures, Heterogeneous architectures, Particle transport, [SCCO.COMP]Cognitive science/Computer science
- Abstract
In this thesis we propose to evaluate the different programming models available for addressing manycore and/or heterogeneous architectures within the framework of Monte Carlo transport codes. A simple but representative application test case will first be considered in order to cover a fairly wide range of solutions and compare them in terms of performance, portability of performance, ease of implementation and maintainability. The target architectures are 'classic' CPUs, Intel Xeon Phi and GPUs. The most relevant programming models will then be set up in a Monte Carlo transport code.
- Published
- 2020
41. Evaluation de modèles de programmation pour les architectures manycore et/ou hétérogènes pour les codes de transport neutronique Monte Carlo
- Author
-
Chang, Tao, Département de Modélisation des Systèmes et Structures (DM2S), CEA-Direction des Energies (ex-Direction de l'Energie Nucléaire) (CEA-DES (ex-DEN)), Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Université Paris-Saclay, Institut Polytechnique de Paris, and Christophe Calvin
- Subjects
Manycore architectures ,Architectures manycore ,Transport des particules ,Heterogeneous architectures ,Particules transport ,[SCCO.COMP]Cognitive science/Computer science ,Architectures hétérogènes
- Abstract
In this thesis we propose to evaluate the programming models available for manycore and/or heterogeneous architectures in the context of Monte Carlo transport codes. A simple but representative test application will first be considered in order to cover a fairly wide range of solutions and to compare them in terms of performance, performance portability, ease of implementation, and maintainability. The target architectures are "classic" CPUs, Intel Xeon Phi, and GPUs. The most relevant programming models will then be implemented in a Monte Carlo transport code.
- Published
- 2020
42. The Advancement of Stable, Efficient and Parallel Acceleration Methods for the Neutron Transport Equation
- Author
-
Ford, Wesley, STAR, ABES, Département de Modélisation des Systèmes et Structures (DM2S), CEA-Direction des Energies (ex-Direction de l'Energie Nucléaire) (CEA-DES (ex-DEN)), Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Université Paris-Saclay, Université Paris Saclay (COmUE), and Christophe Calvin
- Subjects
Equation de transport en coordonnées discrètes ,Analyse de la stabilité ,CMFD ,Analyse du rayon spectral ,[SPI.NRJ]Engineering Sciences [physics]/Electric power ,[PHYS.PHYS.PHYS-COMP-PH]Physics [physics]/Physics [physics]/Computational Physics [physics.comp-ph] ,Stability analysis ,Accélération non linéaire ,[MATH.MATH-MP]Mathematics [math]/Mathematical Physics [math-ph] ,Non-linear acceleration ,Spectral radius analysis ,Discrete-ordinates transport equation
- Abstract
In this paper we propose a new library of non-linear techniques for accelerating the discrete-ordinates transport equation. Two new types of non-linear acceleration methods, the Spatially Variant Rebalancing Method (SVRM) and Response Matrix Acceleration (RMA), are proposed and investigated. SVRM is based on the computation of the zeroth- and first-order spatial variation of the neutron balance equation. RMA is a DP0 method that uses knowledge of the transport operator to form a consistent relationship. Two distinct variants of RMA, Explicit-RMA (E-RMA) and Balance-RMA (B-RMA), are derived. The convergence properties of both acceleration methods are investigated for two different iteration schemes of the method of characteristics (MOC) transport operator for a 1D slab, using spectral and Fourier analysis. Based on the results of the 1D comparison, only RMA and CMFD were implemented in the library. The performance of RMA is compared to that of CMFD using the C5G7, ZPPR, and UH12 3D benchmarks. Both parallel and sequential solving schemes are considered. Analysis of the results indicates that both variants of RMA are more effective and more stable than CMFD for optically diffusive materials. Moreover, RMA shows a large improvement in stability and effectiveness when the geometry is spatially decomposed. To achieve optimal numerical performance, a combination of RMA and CMFD is suggested. Further investigation into the use and improvement of RMA is proposed, and several ideas for extending the features of the library are presented.
- Published
- 2019
43. Optimization of Monte Carlo Neutron Transport Simulations with Emerging Architectures
- Author
-
Wang, Yunsong, Maison de la Simulation (MDLS), Université de Versailles Saint-Quentin-en-Yvelines (UVSQ)-Institut National de Recherche en Informatique et en Automatique (Inria)-Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS), Université Paris Saclay (COmUE), Christophe Calvin, and Centre National de la Recherche Scientifique (CNRS)-Université Paris-Saclay-Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Versailles Saint-Quentin-en-Yvelines (UVSQ)
- Subjects
Parallel computing ,Neutron transport ,Cross section ,Vectorisation ,Vectorization ,Transport de neutron ,Section efficace ,MIC ,[INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC] ,Programmation Parallèle ,Monte Carlo
- Abstract
Monte Carlo (MC) neutron transport simulations are widely used in the nuclear community to perform reference calculations with minimal approximations. The conventional MC method converges slowly, in accordance with the law of large numbers, which makes simulations computationally expensive. Cross section computation has been identified as the major performance bottleneck for MC neutron transport codes. Typically, cross section data for each nuclide are precalculated and stored in memory before the simulation, so that during the simulation only table lookups are required to retrieve the data and the compute cost is trivial. We implemented and optimized a large collection of lookup algorithms in order to accelerate this data retrieval. Results show that a significant speedup over the conventional binary search can be achieved on both CPU and MIC in unit tests, though not in real-case simulations. Vectorization proved effective on the many-core architecture thanks to its 512-bit vector units; on CPU the improvement is limited by the smaller register size. Further optimizations such as memory reduction turn out to be very important, since they largely improve computing performance. All energy lookup proposals are entirely memory-bound: the computing units do little but wait for data. In other words, the computing capability of modern architectures is largely wasted. Another major issue of energy lookup is its huge memory requirement: cross section data at a single temperature for the up to 400 nuclides involved in a real-case simulation require nearly 1 GB of memory, which makes simulations with several thousand temperatures infeasible on current computer systems.
To address the problems of energy lookup, we investigate an alternative on-the-fly cross section approach called reconstruction. The basic idea of reconstruction is to perform the Doppler broadening of cross sections (a convolution integral) on the fly, each time a cross section is needed, with a formulation close to that of standard neutron cross section libraries and based on the same amount of data. Reconstruction converts the problem from memory-bound to compute-bound: only a few variables per resonance are required instead of the conventional pointwise table covering the entire resolved resonance region. Although the memory footprint is greatly reduced, this method is very time-consuming. After a series of optimizations, results show that the reconstruction kernel benefits well from vectorization and achieves 1806 GFLOPS (single precision) on a Knights Landing 7250, which represents 67% of its effective peak performance. Even though the optimization effort significantly improves FLOP usage, this on-the-fly calculation is still slower than the conventional lookup method. We have therefore begun porting the code to GPGPU to exploit potentially higher performance and higher FLOP usage. In addition, a further evaluation is planned to compare lookup and reconstruction in terms of power consumption: with the help of hardware and software energy measurement support, we expect to find a compromise between performance and energy consumption in order to face the "power wall" challenge that accompanies hardware evolution.
Access to the basic data, namely cross sections, is the main performance bottleneck when solving the neutron transport equations with the Monte Carlo (MC) method. These cross sections characterize the probabilities of collisions between neutrons and the nuclides that make up the traversed material. They are specific to each nuclide and depend on the energy of the incident neutron and on the temperature of the material. Reference MC codes load these data into memory for all temperatures present in the system and use a binary search algorithm in the tables storing the cross sections. On many-core architectures (typically Intel MIC), these methods are dramatically inefficient because of random memory accesses, which prevent the different cache levels from being exploited, and because these algorithms lack vectorization. The first part of the thesis work sought alternatives to this basic algorithm, proposing the best performance/memory-footprint trade-off that takes advantage of the specific features of the MIC (multithreading and vectorization). In the second part, we took a radically opposite approach, in which the data are not stored in memory but computed on the fly. A whole series of optimizations (algorithm, data structures, vectorization, loop unrolling, and the influence of the precision of data representation) yielded considerable gains over the initial implementation. Finally, the two approaches (data in memory and data computed on the fly) were compared in order to propose the best trade-off in terms of performance/memory footprint. Beyond the targeted application (MC transport), the work is also a study, which can be generalized, of how to transform a problem initially limited by memory latency ("memory latency bound") into one that saturates the processor ("CPU-bound") and makes it possible to take advantage of many-core architectures.
- Published
- 2017