544 results for "Exascale"
Search Results
2. SUNDIALS time integrators for exascale applications with many independent systems of ordinary differential equations.
- Author
-
Balos, Cody J, Day, Marcus, Esclapez, Lucas, Felden, Anne M, Gardner, David J, Hassanaly, Malik, Reynolds, Daniel R, Rood, Jon S, Sexton, Jean M, Wimer, Nicholas T, and Woodward, Carol S
- Subjects
- *
PARTIAL differential equations , *INITIAL value problems , *PARTIAL differential operators , *COMBUSTION kinetics , *COMPUTATIONAL fluid dynamics - Abstract
Many complex systems can be accurately modeled as a set of coupled time-dependent partial differential equations (PDEs). However, solving such equations can be prohibitively expensive, easily taxing the world's largest supercomputers. One pragmatic strategy for attacking such problems is to split the PDEs into components that can more easily be solved in isolation. This operator splitting approach is used ubiquitously across scientific domains, and in many cases leads to a set of ordinary differential equations (ODEs) that need to be solved as part of a larger "outer-loop" time-stepping approach. The SUNDIALS library provides a plethora of robust time integration algorithms for solving ODEs, and the U.S. Department of Energy Exascale Computing Project (ECP) has supported its extension to applications on exascale-capable computing hardware. In this paper, we highlight some SUNDIALS capabilities and its deployment in combustion and cosmology application codes (Pele and Nyx, respectively) where operator splitting gives rise to numerous, small ODE systems that must be solved concurrently. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
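A minimal illustration of the operator-splitting pattern described in the SUNDIALS abstract above: the PDE right-hand side is split into a transport part advanced by the outer time stepper and a stiff reaction part advanced as many small, independent ODE systems (one per grid cell). This is generic C++, not the SUNDIALS API; the cell count, the toy reaction model, and the explicit-Euler inner integrator are illustrative assumptions only.

```cpp
// Sketch of outer-loop operator splitting with many small per-cell ODE systems.
// Generic C++ illustration only -- not the SUNDIALS API. The reaction model,
// cell count, and explicit inner integrator are assumptions for demonstration.
#include <vector>
#include <cstdio>

struct Cell { double temperature; double fuel; };

// Placeholder for the "outer" PDE operator (e.g., advection/diffusion).
void advance_transport(std::vector<Cell>& cells, double dt) {
    for (auto& c : cells) c.temperature += 0.01 * dt;   // stand-in update
}

// Small, independent ODE system for one cell: d(fuel)/dt = -k(T) * fuel.
// Each cell can be integrated concurrently (e.g., batched on a GPU).
void advance_reactions(Cell& c, double dt, int substeps) {
    const double h = dt / substeps;
    for (int s = 0; s < substeps; ++s) {
        const double k = 0.5 * c.temperature;    // toy temperature-dependent rate
        c.fuel        -= h * k * c.fuel;         // explicit Euler sub-step
        c.temperature += h * 0.1 * k * c.fuel;   // heat-release feedback
    }
}

int main() {
    std::vector<Cell> cells(1024, Cell{300.0, 1.0});
    const double dt = 1.0e-4;
    for (int step = 0; step < 10; ++step) {
        advance_transport(cells, dt);               // outer "split" operator
        for (auto& c : cells)                       // many small ODE solves
            advance_reactions(c, dt, /*substeps=*/20);
    }
    std::printf("cell 0: T=%.3f fuel=%.4f\n", cells[0].temperature, cells[0].fuel);
}
```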
3. Refining HPCToolkit for application performance analysis at exascale.
- Author
-
Adhianto, Laksono, Anderson, Jonathon, Barnett, Robert Matthew, Grbic, Dragana, Indic, Vladimir, Krentel, Mark, Liu, Yumeng, Milaković, Srđan, Phan, Wileam, and Mellor-Crummey, John
- Subjects
- *
GRAPHICS processing units , *GRAPHICAL user interfaces , *SOURCE code , *OPEN scholarship , *INFORMATION resources - Abstract
As part of the US Department of Energy's Exascale Computing Project (ECP), Rice University has been refining its HPCToolkit performance tools to better support measurement and analysis of applications executing on exascale supercomputers. To efficiently collect performance measurements of GPU-accelerated applications, HPCToolkit employs novel non-blocking data structures to communicate performance measurements between tool threads and application threads. To attribute performance information in detail to source lines, loop nests, and inlined call chains, HPCToolkit performs parallel analysis of large CPU and GPU binaries involved in the execution of an exascale application to rapidly recover mappings between machine instructions and source code. To analyze terabytes of performance measurements gathered during executions at exascale, HPCToolkit employs distributed-memory parallelism, multithreading, sparse data structures, and out-of-core streaming analysis algorithms. To support interactive exploration of profiles up to terabytes in size, HPCToolkit's hpcviewer graphical user interface uses out-of-core methods to visualize performance data. The result of these efforts is that HPCToolkit now supports collection, analysis, and presentation of profiles and traces of GPU-accelerated applications at exascale. These improvements have enabled HPCToolkit to efficiently measure, analyze and explore terabytes of performance data for executions using as many as 64K MPI ranks and 64K GPU tiles on ORNL's Frontier supercomputer. HPCToolkit's support for measurement and analysis of GPU-accelerated applications has been employed to study a collection of open-science applications developed as part of ECP. This paper reports on these experiences, which provided insight into opportunities for tuning applications, strengths and weaknesses of HPCToolkit itself, as well as unexpected behaviors in executions at exascale. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
4. Visualization at exascale: Making it all work with VTK-m.
- Author
-
Moreland, Kenneth, Athawale, Tushar M, Bolea, Vicente, Bolstad, Mark, Brugger, Eric, Childs, Hank, Huebl, Axel, Lo, Li-Ta, Geveci, Berk, Marsaglia, Nicole, Philip, Sujin, Pugmire, David, Rizzi, Silvio, Wang, Zhe, and Yenpure, Abhishek
- Subjects
- *
SOFTWARE libraries (Computer programming) , *PARTICLE acceleration , *SCIENTIFIC visualization , *COMPUTER software development , *SCIENTIFIC discoveries - Abstract
The VTK-m software library enables scientific visualization on exascale-class supercomputers. Exascale machines are particularly challenging for software development in part because they use GPU accelerators to provide the vast majority of their computational throughput. Algorithmic designs for GPUs and GPU-centric computing often deviate from those that worked well on previous generations of high-performance computers that relied on traditional CPUs. Fortunately, VTK-m provides scientific visualization algorithms for GPUs and other accelerators. VTK-m also provides a framework that simplifies the implementation of new algorithms and adds a porting layer to work across multiple processor types. This paper describes the main challenges encountered when making scientific visualization available at exascale. We document the surprises and obstacles faced when moving from pre-exascale platforms to the final exascale designs and the performance on those systems including scaling studies on Frontier, an exascale machine with over 37,000 AMD GPUs. We also report on the integration of VTK-m with other exascale software technologies. Finally, we show how VTK-m helps scientific discovery for applications such as fusion and particle acceleration that leverage an exascale supercomputer. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
5. The Monte Carlo Computational Summit – October 25 & 26, 2023 – Notre Dame, Indiana, USA.
- Author
-
Morgan, Joanna Piper, Mote, Alexander, Pasmann, Samuel Lee, Ridley, Gavin, Palmer, Todd S., and Niemeyer, Kyle E.
- Subjects
- *
RAPID response teams , *BENCHMARK problems (Computer science) , *NEUTRONS , *ALGORITHMS , *COMPUTER software - Abstract
The Monte Carlo Computational Summit was held on the campus of the University of Notre Dame in South Bend, Indiana, USA on 25–26 October 2023. The goals of the summit were to discuss algorithmic and software alterations required for successfully porting respective code bases to exascale-class computing hardware, compare software engineering techniques used by various code teams, and consider the adoption of industry-standard benchmark problems to better facilitate code-to-code performance comparisons. Participants reported that identifying and implementing suitable Monte Carlo algorithms for GPUs continues to be a sticking point. They also reported significant difficulty porting existing algorithms between GPU APIs (specifically Nvidia CUDA to AMD ROCm). To better compare code-to-code performance, participants decided to design a C5G7-like benchmark problem with a defined figure of merit, with the expectation of adding more benchmarks in the future. The participants also identified the need to explore the intermediate and long-term future of the Monte Carlo neutron transport community and how best to modernize and contextualize Monte Carlo as a useful tool in modern industry. Overall, the summit was considered a success by the organizers and participants, and the group shared a strong desire for future, potentially larger, Monte Carlo summits. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
6. To Exascale and Beyond—The Simple Cloud‐Resolving E3SM Atmosphere Model (SCREAM), a Performance Portable Global Atmosphere Model for Cloud‐Resolving Scales.
- Author
-
Donahue, A. S., Caldwell, P. M., Bertagna, L., Beydoun, H., Bogenschutz, P. A., Bradley, A. M., Clevenger, T. C., Foucar, J., Golaz, C., Guba, O., Hannah, W., Hillman, B. R., Johnson, J. N., Keen, N., Lin, W., Singh, B., Sreepathi, S., Taylor, M. A., Tian, J., and Terai, C. R.
- Subjects
- *
ATMOSPHERIC models , *ATMOSPHERIC radiation measurement , *HIGH performance computing , *CLIMATE change models , *GRAPHICS processing units , *COMPUTER systems , *HETEROGENEOUS computing - Abstract
The new generation of heterogeneous CPU/GPU computer systems offers much greater computational performance but is not yet widely used for climate modeling. One reason for this is that traditional climate models were written before GPUs were available and would require an extensive overhaul to run on these new machines. In addition, even conventional "high-resolution" simulations don't currently provide enough parallel work to keep GPUs busy, so the benefits of such an overhaul would be limited for the types of simulations climate scientists are accustomed to. The vision of the Simple Cloud-Resolving Energy Exascale Earth System (E3SM) Atmosphere Model (SCREAM) project is to create a global atmospheric model with the architecture to efficiently use GPUs and horizontal resolution sufficient to fully take advantage of GPU parallelism. After 5 years of model development, SCREAM is finally ready for use. In this paper, we describe the design of this new code, its performance on both CPU and heterogeneous machines, and its ability to simulate real-world climate via a set of four 40-day simulations covering all 4 seasons of the year.
Plain Language Summary: This paper describes the design and development of a 3 km version of the Energy Exascale Earth System Model (E3SM) atmosphere model, which has been fully rewritten in C++ using the Kokkos library for performance portability. This newly rewritten model is able to take advantage of state-of-the-science high-performance computing systems which use graphical processor units (GPUs) to mitigate much of the computational expense which typically plagues high-resolution global modeling. Taking advantage of this high performance, we are able to run four seasons of simulations at 3 km global resolution. We discuss the biases, including the diurnal cycle, by comparing model results with satellite and Atmospheric Radiation Measurement ground-based site data.
Key Points:
- Describes the C++/Kokkos implementation of the Simple Cloud-Resolving E3SM Atmosphere Model (SCREAMv1)
- SCREAMv1 leverages GPUs to surpass one simulated year per compute day at global 3 km resolution
- High resolution improves some meso-scale features and the diurnal cycle, but large-scale biases require improvement across all four seasons [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
7. Towards exascale for wind energy simulations.
- Author
-
Min, Misun, Brazell, Michael, Tomboulides, Ananias, Churchfield, Matthew, Fischer, Paul, and Sprague, Michael
- Abstract
We examine large-eddy-simulation modeling approaches and computational performance of two open-source computational fluid dynamics codes for the simulation of atmospheric boundary layer (ABL) flows that are of direct relevance to wind energy production. The first code, NekRS, is a high-order, unstructured-grid, spectral element code. The second code, AMR-Wind, is a second-order, block-structured, finite-volume code with adaptive mesh refinement capabilities. The objective of this study is to co-develop these codes in order to improve model fidelity and performance for each. These features will be critical for running ABL-based applications such as wind farm analysis on advanced computing architectures. To this end, we investigate the performance of NekRS and AMR-Wind on the Oak Ridge Leadership Computing Facility supercomputers Summit, using 4 to 800 nodes (24 to 4,800 NVIDIA V100 GPUs), and Crusher, the testbed for the Frontier exascale system, using 18 to 384 Graphics Compute Dies on AMD MI250X GPUs. We compare strong- and weak-scaling capabilities, linear solver performance, and time to solution. We also identify leading inhibitors to parallel scaling. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
8. ExaFEL: extreme-scale real-time data processing for X-ray free electron laser science
- Author
-
Johannes P. Blaschke, Robert Bolotovsky, Aaron S. Brewster, Jeffrey Donatelli, Antoine DuJardin, Wu-chun Feng, Vidya Ganapati, Wilko Kroeger, Derek Mendez, Peter McCorquodale, Seema Mirchandaney, Christopher P. O'Grady, Daniel W. Paley, Amedeo Perazzo, Frederic P. Poitevin, Billy K. Poon, Vinay B. Ramakrishnaiah, Nicholas K. Sauter, Niteya Shah, Elliott Slaughter, Christine Sweeney, Daniel Tchoń, Monarin Uervirojnangkoorn, Felix Wittwer, Michael E. Wall, Chun Hong Yoon, and Iris D. Young
- Subjects
exascale ,Single Particle Imaging ,Serial Femtosecond Crystallography ,hardware acceleration ,data-intensive ,interfacility ,Computer software ,QA76.75-76.765 - Abstract
ExaFEL is an HPC-capable X-ray Free Electron Laser (XFEL) data analysis software suite for both Serial Femtosecond Crystallography (SFX) and Single Particle Imaging (SPI), developed in collaboration with the Linac Coherent Light Source (LCLS), Lawrence Berkeley National Laboratory (LBNL), and Los Alamos National Laboratory. ExaFEL supports real-time data analysis via a cross-facility workflow spanning LCLS and HPC centers such as NERSC and OLCF. Our work therefore constitutes initial path-finding for the US Department of Energy's (DOE) Integrated Research Infrastructure (IRI) program. We present the ExaFEL team's 7 years of experience in developing real-time XFEL data analysis software for the DOE's exascale supercomputers, along with lessons learned on NERSC's Perlmutter and OLCF's Frontier systems. Furthermore, we outline essential data center services (and the implications for institutional policy) required for real-time data analysis. Finally, we summarize our software and performance engineering approaches. This work is intended to be a practical blueprint for similar efforts in integrating exascale compute resources into other cross-facility workflows.
- Published
- 2024
- Full Text
- View/download PDF
9. ExaWorks software development kit: a robust and scalable collection of interoperable workflows technologies
- Author
-
Matteo Turilli, Mihael Hategan-Marandiuc, Mikhail Titov, Ketan Maheshwari, Aymen Alsaadi, Andre Merzky, Ramon Arambula, Mikhail Zakharchanka, Matt Cowan, Justin M. Wozniak, Andreas Wilke, Ozgur Ozan Kilic, Kyle Chard, Rafael Ferreira da Silva, Shantenu Jha, and Daniel Laney
- Subjects
scientific workflow ,software development kit ,high-performance computing ,exascale ,testing ,documentation ,Computer software ,QA76.75-76.765 - Abstract
Scientific discovery increasingly requires executing heterogeneous scientific workflows on high-performance computing (HPC) platforms. Heterogeneous workflows contain different types of tasks (e.g., simulation, analysis, and learning) that need to be mapped, scheduled, and launched on different computing resources. This requires a software stack that enables users to code their workflows and automate resource management and workflow execution. Currently, there are many workflow technologies with diverse levels of robustness and capabilities, and users face difficult choices of software that can effectively and efficiently support their use cases on HPC machines, especially when considering the latest exascale platforms. We contributed to addressing this issue by developing the ExaWorks Software Development Kit (SDK). The SDK is a curated collection of workflow technologies engineered following current best practices and specifically designed to work on HPC platforms. We present our experience with (1) curating those technologies, (2) integrating them to provide users with new capabilities, (3) developing a continuous integration platform to test the SDK on DOE HPC platforms, (4) designing a dashboard to publish the results of those tests, and (5) devising an innovative documentation platform to help users to use those technologies. Our experience details the requirements and the best practices needed to curate workflow technologies, and it also serves as a blueprint for the capabilities and services that DOE will have to offer to support a variety of scientific heterogeneous workflows on the newly available exascale HPC platforms.
- Published
- 2024
- Full Text
- View/download PDF
10. Managed Network Services for Exascale Data Movement Across Large Global Scientific Collaborations
- Author
-
Würthwein, Frank, Guiang, Jonathan, Arora, Aashay, Davila, Diego, Graham, John, Mishin, Dima, Hutton, Thomas, Sfiligoi, Igor, Newman, Harvey, Balcas, Justas, Lehman, Tom, Yang, Xi, and Guok, Chin
- Subjects
Distributed Computing and Systems Software ,Information and Computing Sciences ,Vaccine Related ,exascale ,data distribution ,software defined networking - Abstract
Unique scientific instruments designed and operated by large global collaborations are expected to produce exabyte-scale data volumes per year by 2030. These collaborations depend on globally distributed storage and compute to turn raw data into science. While all of these infrastructures have batch scheduling capabilities to share compute, Research and Education networks lack those capabilities. There is thus uncontrolled competition for bandwidth between and within collaborations. As a result, data 'hogs' disk space at processing facilities for much longer than it takes to process, leading to vastly over-provisioned storage infrastructures. Integrated co-scheduling of networks as part of high-level managed workflows might reduce these storage needs by more than an order of magnitude. This paper describes such a solution, demonstrates its functionality in the context of the Large Hadron Collider (LHC) at CERN, and presents the next steps towards its use in production.
- Published
- 2022
11. PanDA: Production and Distributed Analysis System
- Author
-
Maeno, Tadashi, Alekseev, Aleksandr, Barreiro Megino, Fernando Harald, De, Kaushik, Guan, Wen, Karavakis, Edward, Klimentov, Alexei, Korchuganova, Tatiana, Lin, FaHui, Nilsson, Paul, Wenaus, Torre, Yang, Zhaoyu, and Zhao, Xin
- Published
- 2024
- Full Text
- View/download PDF
12. Integrating End-to-End Exascale SDN into the LHC Data Distribution Cyberinfrastructure
- Author
-
Guiang, Jonathan, Arora, Aashay, Davila, Diego, Graham, John, Mishin, Dima, Sfiligoi, Igor, Wuerthwein, Frank, Lehman, Tom, Yang, Xi, Guok, Chin, Newman, Harvey, Balcas, Justas, and Hutton, Thomas
- Subjects
Nuclear and Plasma Physics ,Information and Computing Sciences ,Physical Sciences ,exascale ,data distribution ,software defined networking - Abstract
The Compact Muon Solenoid (CMS) experiment at the CERN Large Hadron Collider (LHC) distributes its data by leveraging a diverse array of National Research and Education Networks (NRENs), which CMS is forced to treat as an opaque resource. Consequently, CMS sees highly variable performance that already poses a challenge for operators coordinating the movement of petabytes around the globe. This kind of unpredictability, however, threatens CMS with a logistical nightmare as it barrels towards the High Luminosity LHC (HL-LHC) era in 2030, which is expected to produce roughly 0.5 exabytes of data per year. This paper explores one potential solution to this issue: software-defined networking (SDN). In particular, the prototypical interoperation of SENSE, an SDN product developed by the Energy Sciences Network, with Rucio, the data management software used by the LHC, is outlined. In addition, this paper presents the current progress in bringing these technologies together.
- Published
- 2022
13. MFIX-Exa: A path toward exascale CFD-DEM simulations
- Author
-
Musser, Jordan, Almgren, Ann S, Fullmer, William D, Antepara, Oscar, Bell, John B, Blaschke, Johannes, Gott, Kevin, Myers, Andrew, Porcu, Roberto, Rangarajan, Deepak, Rosso, Michele, Zhang, Weiqun, and Syamlal, Madhava
- Subjects
Information and Computing Sciences ,Applied Computing ,CFD-DEM ,MFIX ,AMReX ,HPC ,exascale ,ECP ,embedded boundaries ,multiphase ,method-of-lines ,Distributed Computing ,Applied computing ,Distributed computing and systems software - Abstract
MFIX-Exa is a computational fluid dynamics–discrete element model (CFD-DEM) code designed to run efficiently on current and next-generation supercomputing architectures. MFIX-Exa combines the CFD-DEM expertise embodied in the MFIX code—which was developed at NETL and is used widely in academia and industry—with the modern software framework, AMReX, developed at LBNL. The fundamental physics models follow those of the original MFIX, but the combination of new algorithmic approaches and a new software infrastructure will enable MFIX-Exa to leverage future exascale machines to optimize the modeling and design of multiphase chemical reactors.
- Published
- 2022
14. The NOMAD mini-apps: A suite of kernels from ab initio electronic structure codes enabling co-design in high-performance computing [version 2; peer review: 2 approved, 1 not approved]
- Author
-
Rogeli Grima Torres, Isidre Mas Magre, José Julio Gutierrez Moreno, and José María Cela Espín
- Subjects
High-performance computing ,exascale ,ab initio calculations ,materials science ,eigensolver library ,Green's function methods ,eng ,Science ,Social Sciences - Abstract
This article introduces a suite of mini-applications (mini-apps) designed to optimise computational kernels in ab initio electronic structure codes. The suite is developed from flagship applications participating in the NOMAD Center of Excellence, such as the ELPA eigensolver library and the GW implementations of the exciting, Abinit, and FHI-aims codes. The mini-apps were identified by targeting functions that significantly contribute to the total execution time in the parent applications. This strategic selection allows for concentrated optimisation efforts. The suite is designed for easy deployment on various High-Performance Computing (HPC) systems, supported by an integrated CMake build system for straightforward compilation and execution. The aim is to harness the capabilities of emerging (post)exascale systems, which necessitate concurrent hardware and software development — a concept known as co-design. The mini-app suite serves as a tool for profiling and benchmarking, providing insights that can guide both software optimisation and hardware design. Ultimately, these developments will enable more accurate and efficient simulations of novel materials, leveraging the full potential of exascale computing in material science research.
- Published
- 2024
- Full Text
- View/download PDF
15. A survey of compute nodes with 100 TFLOPS and beyond for supercomputers
- Author
-
Chang, Junsheng, Lu, Kai, Guo, Yang, Wang, Yongwen, Zhao, Zhenyu, Huang, Libo, Zhou, Hongwei, Wang, Yao, Lei, Fei, and Zhang, Biwei
- Published
- 2024
- Full Text
- View/download PDF
16. Hierarchical Management of Extreme-Scale Task-Based Applications
- Author
-
Lordan, Francesc, Puigdemunt, Gabriel, Vergés, Pere, Conejero, Javier, Ejarque, Jorge, Badia, Rosa M., Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Cano, José, editor, Dikaiakos, Marios D., editor, Papadopoulos, George A., editor, Pericàs, Miquel, editor, and Sakellariou, Rizos, editor
- Published
- 2023
- Full Text
- View/download PDF
17. Hitos del supercómputo: del Giga al Exascale [Supercomputing milestones: from giga to exascale]
- Author
-
Alfredo Santillán González and Liliana Hernández-Cervantes
- Subjects
Supercomputing ,HPC ,Exascale ,Supercomputing milestones ,Technology ,Science - Abstract
In June 2022, the world's most powerful supercomputer was unveiled: FRONTIER, at Oak Ridge National Laboratory, USA, capable of performing, for the first time, more than a quintillion (10^18) floating-point operations per second. This major milestone motivated us to analyze the global supercomputing landscape at four historical moments: the first corresponds to the start of the TOP500 project, which ranks the 500 most powerful supercomputers on the planet, and the other three correspond to the periods in which the TFlop/s, PFlop/s, and EFlop/s performance thresholds were first surpassed. The analysis uses four indicators linked to a nation's scientific, technological, and academic development: the first is the number of supercomputing systems per country; the second, the computing power per region; and the third and fourth, the number of supercomputers and the performance share by sector: industry, research, academia, government, and other.
- Published
- 2023
- Full Text
- View/download PDF
18. HPC-enabling technologies for high-fidelity combustion simulations.
- Author
-
Mira, Daniel, Pérez-Sánchez, Eduardo J., Borrell, Ricard, and Houzeaux, Guillaume
- Abstract
With the increase in computational power in the last decade and the forthcoming Exascale supercomputers, a new horizon in computational modelling and simulation is envisioned in combustion science. Considering the multiscale and multiphysics characteristics of turbulent reacting flows, combustion simulations are considered one of the most computationally demanding applications running on cutting-edge supercomputers. Exascale computing opens new frontiers for the simulation of combustion systems as more realistic conditions can be achieved with high-fidelity methods. However, an efficient use of these computing architectures requires methodologies that can exploit all levels of parallelism. The efficient utilization of the next generation of supercomputers needs to be considered from a global perspective, that is, involving physical modelling and numerical methods with methodologies based on High-Performance Computing (HPC) and hardware architectures. This review introduces recent developments in numerical methods for large-eddy simulations (LES) and direct numerical simulations (DNS) of combustion systems, with a focus on computational performance and algorithmic capabilities. Due to the broad scope, a first section is devoted to describing the fundamentals of turbulent combustion, which is followed by a general description of state-of-the-art computational strategies for solving these problems. These applications require advanced HPC approaches to exploit modern supercomputers, which is addressed in the third section. The increasing complexity of new computing architectures, with tightly coupled CPUs and GPUs, as well as high levels of parallelism, requires new parallel models and algorithms exposing the required level of concurrency. Advances in dynamic load balancing, vectorization, GPU acceleration and mesh adaptation have made it possible to achieve highly efficient combustion simulations with data-driven methods in HPC environments. Therefore, dedicated sections covering the use of high-order methods for reacting flows, integration of detailed chemistry and two-phase flows are addressed. Final remarks and directions for future work are given at the end. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
19. Exascale applications: skin in the game
- Author
-
Alexander, Francis, Almgren, Ann, Bell, John, Bhattacharjee, Amitava, Chen, Jacqueline, Colella, Phil, Daniel, David, DeSlippe, Jack, Diachin, Lori, Draeger, Erik, Dubey, Anshu, Dunning, Thom, Evans, Thomas, Foster, Ian, Francois, Marianne, Germann, Tim, Gordon, Mark, Habib, Salman, Halappanavar, Mahantesh, Hamilton, Steven, Hart, William, Huang, Zhenyu, Hungerford, Aimee, Kasen, Daniel, Kent, Paul RC, Kolev, Tzanio, Kothe, Douglas B, Kronfeld, Andreas, Luo, Ye, Mackenzie, Paul, McCallen, David, Messer, Bronson, Mniszewski, Sue, Oehmen, Chris, Perazzo, Amedeo, Perez, Danny, Richards, David, Rider, William J, Rieben, Rob, Roche, Kenneth, Siegel, Andrew, Sprague, Michael, Steefel, Carl, Stevens, Rick, Syamlal, Madhava, Taylor, Mark, Turner, John, Vay, Jean-Luc, Voter, Artur F, Windus, Theresa L, and Yelick, Katherine
- Subjects
Information and Computing Sciences ,Human-Centred Computing ,Data Science ,Affordable and Clean Energy ,exascale ,high-performance computing ,computational science applications ,numerical algorithms ,machine learning ,modelling and simulation ,General Science & Technology - Abstract
As noted in Wikipedia, skin in the game refers to having 'incurred risk by being involved in achieving a goal', where 'skin is a synecdoche for the person involved, and game is the metaphor for actions on the field of play under discussion'. For exascale applications under development in the US Department of Energy Exascale Computing Project, nothing could be more apt, with the skin being exascale applications and the game being delivering comprehensive science-based computational applications that effectively exploit exascale high-performance computing technologies to provide breakthrough modelling and simulation and data science solutions. These solutions will yield high-confidence insights and answers to the most critical problems and challenges for the USA in scientific discovery, national security, energy assurance, economic competitiveness and advanced healthcare. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'.
- Published
- 2020
20. Response of HPC hardware to neutron radiation at the dawn of exascale.
- Author
-
Bustos, Andrés, Rubio-Montero, Antonio Juan, Méndez, Roberto, Rivera, Sergio, González, Francisco, Campo, Xandra, Asorey, Hernán, and Mayo-García, Rafael
- Subjects
- *
BACKGROUND radiation , *IONIZING radiation , *SERVER farms (Computer network management) , *NEUTRON flux , *RADIATION , *HIGH performance computing - Abstract
Every computation presents a small chance that an unexpected phenomenon ruins or modifies its output. Computers are prone to errors that, although they may be very unlikely, are hard, expensive, or simply impossible to avoid. At exascale, with thousands of processors involved in a single computation, those errors are especially harmful because they can corrupt or distort the results, wasting human and material resources. In the present work, we study the effect of ionizing radiation on several pieces of commercial hardware that are very common in modern supercomputers. Aiming to reproduce the natural radiation that could arise, CPUs (Xeon, EPYC) and GPUs (A100, V100, T4) are subjected to a known flux of neutrons coming from two radioactive sources, namely 252Cf and 241Am-Be, in a special irradiation facility. The working hardware is irradiated under supervision to quantify any errors that appear. Once the hardware response is characterised, we are able to scale down the radiation intensity and estimate the effects on standard data centres. This can help administrators and researchers to develop their contingency plans and protocols. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
21. Modeling Round-Off Errors in Hydrodynamic Simulations
- Author
-
Weens, William, Vazquez-Gonzalez, Thibaud, Salem-Knapp, Louise Ben, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Bloem, Roderick, editor, Dimitrova, Rayna, editor, Fan, Chuchu, editor, and Sharygina, Natasha, editor
- Published
- 2022
- Full Text
- View/download PDF
22. A UPC++ Actor Library and Its Evaluation On a Shallow Water Proxy Application
- Author
-
Pöppl, Alexander, Baden, Scott, and Bader, Michael
- Subjects
Information and Computing Sciences ,Library and Information Studies ,Actor-based computation ,tsunami simulation ,programming models ,PGAS ,Exascale ,UPC++ - Abstract
Programmability is one of the key challenges of Exascale Computing. Using the actor model for distributed computations may be one solution. The actor model separates computation from communication while still enabling their overlap. Each actor possesses specified communication endpoints to publish and receive information. Computations are undertaken based on the data available on these channels. We present a library that implements this programming model using UPC++, a PGAS library, and evaluate three different parallelization strategies, one based on rank-sequential execution, one based on multiple threads in a rank, and one based on OpenMP tasks. In an evaluation of our library using shallow water proxy applications, our solution compares favorably against an earlier implementation based on X10, and a BSP-based approach.
- Published
- 2019
23. A UPC++ Actor Library and Its Evaluation on a Shallow Water Proxy Application
- Author
-
Pöppl, A., Baden, S., and Bader, M.
- Subjects
Actor-based computation ,Exascale ,PGAS ,programming models ,tsunami simulation ,UPC++ - Abstract
Programmability is one of the key challenges of Exascale Computing. Using the actor model for distributed computations may be one solution. The actor model separates computation from communication while still enabling their overlap. Each actor possesses specified communication endpoints to publish and receive information. Computations are undertaken based on the data available on these channels. We present a library that implements this programming model using UPC++, a PGAS library, and evaluate three different parallelization strategies, one based on rank-sequential execution, one based on multiple threads in a rank, and one based on OpenMP tasks. In an evaluation of our library using shallow water proxy applications, our solution compares favorably against an earlier implementation based on X10, and a BSP-based approach.
- Published
- 2019
24. UPC++: A High-Performance Communication Framework for Asynchronous Computation
- Author
-
Bachan, J, Baden, S, Hofmeyr, S, Jacquelin, M, Kamil, A, Bonachea, D, Hargrove, P, and Ahmed, H
- Subjects
Asynchronous ,Exascale ,PGAS ,RMA ,RPC ,Information and Computing Sciences ,Architecture ,Built Environment and Design - Abstract
UPC++ is a C++ library that supports high-performance computation via an asynchronous communication framework. This paper describes a new incarnation that differs substantially from its predecessor, and we discuss the reasons for our design decisions. We present new design features, including future-based asynchrony management, distributed objects, and generalized Remote Procedure Call (RPC). We show microbenchmark performance results demonstrating that one-sided Remote Memory Access (RMA) in UPC++ is competitive with MPI-3 RMA; on a Cray XC40 UPC++ delivers up to a 25% improvement in the latency of blocking RMA put, and up to a 33% bandwidth improvement in an RMA throughput test. We showcase the benefits of UPC++ with irregular applications through a pair of application motifs, a distributed hash table and a sparse solver component. Our distributed hash table in UPC++ delivers near-linear weak scaling up to 34816 cores of a Cray XC40. Our UPC++ implementation of the sparse solver component shows robust strong scaling up to 2048 cores, where it outperforms variants communicating using MPI by up to 3.1x. UPC++ encourages the use of aggressive asynchrony in low-overhead RMA and RPC, improving programmer productivity and delivering high performance in irregular applications.
- Published
- 2019
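The UPC++ abstract above highlights futures, Remote Procedure Call (RPC), and a distributed hash table motif. The sketch below illustrates that style using only publicly documented UPC++ primitives (init, rpc, futures, barrier); the sharding scheme, key naming, and use of a per-rank global map are illustrative assumptions, not the paper's benchmark code.

```cpp
// Minimal UPC++ distributed-map sketch: RPCs run insertions/lookups on the
// owning rank and return futures. Illustrative only; sharding and key names
// are assumptions, not the benchmark code from the paper.
#include <upcxx/upcxx.hpp>
#include <unordered_map>
#include <string>
#include <iostream>

// One map shard per rank; RPC callbacks execute on the owner and touch its shard.
std::unordered_map<std::string, std::string> local_shard;

int owner_of(const std::string& key) {
    return std::hash<std::string>{}(key) % upcxx::rank_n();
}

upcxx::future<> insert(const std::string& key, const std::string& val) {
    return upcxx::rpc(owner_of(key),
        [](const std::string& k, const std::string& v) { local_shard[k] = v; },
        key, val);
}

upcxx::future<std::string> find(const std::string& key) {
    return upcxx::rpc(owner_of(key),
        [](const std::string& k) { return local_shard[k]; }, key);
}

int main() {
    upcxx::init();
    std::string key = "rank-" + std::to_string(upcxx::rank_me());
    insert(key, "hello").wait();        // asynchronous insert, then wait
    upcxx::barrier();                   // ensure all inserts have landed
    std::string next = "rank-" + std::to_string((upcxx::rank_me() + 1) % upcxx::rank_n());
    std::cout << next << " -> " << find(next).wait() << std::endl;
    upcxx::finalize();
}
```

In practice the RPCs can be chained or conjoined without waiting on each one, which is the "aggressive asynchrony" the abstract refers to.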
25. Calculation of the high-energy neutron flux for anticipating errors and recovery techniques in exascale supercomputer centres.
- Author
-
Asorey, Hernán and Mayo-García, Rafael
- Subjects
- *
NEUTRON flux , *MEAN time between failure , *SINGLE event effects , *ATMOSPHERIC radiation , *NEUTRON temperature - Abstract
The age of exascale computing has arrived, and the risks associated with neutron and other atmospheric radiation are becoming more critical as the computing power increases; hence, the expected mean time between failures will be reduced because of this radiation. In this work, a new and detailed calculation of the neutron flux for energies above 50 MeV is presented. This has been done by using state-of-the-art Monte Carlo astroparticle techniques and including real atmospheric profiles at each of the next 23 exascale supercomputing facilities. Atmospheric impact on the flux and seasonal variations were observed and characterized, and the barometric coefficient for high-energy neutrons at each site was obtained. With these coefficients, potential risks of errors associated with the increase in the flux of energetic neutrons, such as the occurrence of single event upsets or transients, and the corresponding failure-in-time rates, can be anticipated just by using the atmospheric pressure before the assignment of resources to critical tasks at each exascale facility. For clarity, examples of how the failure rate is affected by cosmic rays are included, so administrators can better anticipate which more or less restrictive actions to take to overcome errors. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
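A small numerical sketch of how a site barometric coefficient could be applied in practice, following the standard exponential pressure correction used for cosmic-ray neutron flux; the coefficient value, reference pressure, and baseline failure-in-time (FIT) rate below are placeholder assumptions, not values from the paper.

```cpp
// Scale an expected neutron-induced failure rate with atmospheric pressure using
// the standard exponential barometric correction:
//   flux(P) = flux_ref * exp(-beta * (P - P_ref)).
// The numbers below (beta, reference pressure, baseline FIT rate) are
// illustrative placeholders, not values reported in the paper.
#include <cmath>
#include <cstdio>

double corrected_fit_rate(double baseline_fit,     // failures-in-time at reference pressure
                          double beta_per_hpa,     // site barometric coefficient [1/hPa]
                          double pressure_hpa,     // current station pressure [hPa]
                          double ref_pressure_hpa) {
    // Failure rate is assumed proportional to the high-energy neutron flux.
    return baseline_fit * std::exp(-beta_per_hpa * (pressure_hpa - ref_pressure_hpa));
}

int main() {
    const double baseline_fit = 100.0;   // assumed FIT at the reference pressure
    const double beta = 7.0e-3;          // assumed coefficient, roughly 0.7%/hPa
    for (double p : {990.0, 1013.25, 1030.0})
        std::printf("P = %7.2f hPa -> expected FIT ~ %.1f\n",
                    p, corrected_fit_rate(baseline_fit, beta, p, 1013.25));
}
```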
26. Towards Ab-Initio Simulations of Crystalline Defects at the Exascale Using Spectral Quadrature Density Functional Theory
- Author
-
Swarnava Ghosh
- Subjects
spectral quadrature ,density functional theory ,defects ,exascale ,Engineering (General). Civil engineering (General) ,TA1-2040 - Abstract
Defects in crystalline solids play a crucial role in determining properties of materials at the nano, meso- and macroscales, such as the coalescence of vacancies at the nanoscale to form voids and prismatic dislocation loops or diffusion and segregation of solutes to nucleate precipitates, phase transitions in magnetic materials via disorder and doping. First principles Density Functional Theory (DFT) simulations can provide a detailed understanding of these phenomena. However, the number of atoms needed to correctly simulate these systems is often beyond the reach of many widely used DFT codes. The aim of this article is to discuss recent advances in first principles modeling of crystal defects using the spectral quadrature method. The spectral quadrature method is linear scaling with respect to the number of atoms, permits spatial coarse-graining, and is capable of simulating non-periodic systems embedded in a bulk environment, which allows the application of appropriate boundary conditions for simulations of crystalline defects. In this article, we discuss the state-of-the-art in ab-initio modeling of large metallic systems of the order of several thousand atoms that are suitable for utilizing exascale computing resources.
- Published
- 2022
- Full Text
- View/download PDF
27. Challenges in High-Performance Computing.
- Author
-
Alexandre Navaux, Philippe Olivier, Francisco Lorenzon, Arthur, and da Silva Serpa, Matheus
- Subjects
ARTIFICIAL intelligence ,SUPERCOMPUTERS ,COMPUTER science ,PROCESS capability ,CHOICE (Psychology) ,COMPUTER architecture - Abstract
High-Performance Computing, HPC, has become one of the most active computer science fields. It is driven mainly by the need for the high processing capability required by algorithms from many areas, such as Big Data, Artificial Intelligence, Data Science, and subjects related to chemistry, physics, and biology; the state-of-the-art algorithms from these fields are notoriously demanding of computer resources. Therefore, choosing the right computer system to optimize their performance is paramount. This article presents the main challenges of future supercomputer systems, highlighting the areas that demand the most from HPC servers; the new architectures, including heterogeneous processors composed of artificial intelligence chips, quantum processors, and the adoption of HPC on cloud servers; and the challenges software developers face when parallelizing applications. We also discuss challenges regarding non-functional requirements, such as energy consumption and resilience. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
28. Pre‐exascale HPC approaches for molecular dynamics simulations. Covid‐19 research: A use case.
- Author
-
Wieczór, Miłosz, Genna, Vito, Aranda, Juan, Badia, Rosa M., Gelpí, Josep Lluís, Gapsys, Vytautas, de Groot, Bert L., Lindahl, Erik, Municoy, Martí, Hospital, Adam, and Orozco, Modesto
- Subjects
MOLECULAR dynamics ,COVID-19 pandemic ,COVID-19 ,ALGORITHMS ,STATISTICAL mechanics ,COMMUNITIES ,HIGH performance computing - Abstract
Exascale computing has been a dream for ages and is close to becoming a reality that will impact how molecular simulations are being performed, as well as the quantity and quality of the information derived from them. We review how the biomolecular simulations field is anticipating these new architectures, with emphasis on recent work from groups in the BioExcel Center of Excellence for High Performance Computing. We exemplify the power of these simulation strategies with the work done by the HPC simulation community to fight the Covid-19 pandemic. This article is categorized under: Data Science > Computer Algorithms and Programming; Data Science > Databases and Expert Systems; Molecular and Statistical Mechanics > Molecular Dynamics and Monte-Carlo Methods [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
29. Towards Ab-Initio Simulations of Crystalline Defects at the Exascale Using Spectral Quadrature Density Functional Theory.
- Author
-
Ghosh, Swarnava
- Subjects
CRYSTALS ,COLLOIDS ,MAGNETIC materials ,PHASE transitions ,DENSITY functional theory - Abstract
Defects in crystalline solids play a crucial role in determining properties of materials at the nano, meso- and macroscales, such as the coalescence of vacancies at the nanoscale to form voids and prismatic dislocation loops or diffusion and segregation of solutes to nucleate precipitates, phase transitions in magnetic materials via disorder and doping. First principles Density Functional Theory (DFT) simulations can provide a detailed understanding of these phenomena. However, the number of atoms needed to correctly simulate these systems is often beyond the reach of many widely used DFT codes. The aim of this article is to discuss recent advances in first principles modeling of crystal defects using the spectral quadrature method. The spectral quadrature method is linear scaling with respect to the number of atoms, permits spatial coarse-graining, and is capable of simulating non-periodic systems embedded in a bulk environment, which allows the application of appropriate boundary conditions for simulations of crystalline defects. In this article, we discuss the state-of-the-art in ab-initio modeling of large metallic systems of the order of several thousand atoms that are suitable for utilizing exascale computing resources. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
30. GPU-enabled extreme-scale turbulence simulations: Fourier pseudo-spectral algorithms at the exascale using OpenMP offloading.
- Author
-
Yeung, P.K., Ravikumar, Kiran, Nichols, Stephen, and Uma-Vaideswaran, Rohini
- Subjects
- *
FAST Fourier transforms , *NAVIER-Stokes equations , *NONLINEAR differential equations , *FLUID dynamics , *PARTIAL differential equations - Abstract
Fourier pseudo-spectral methods for nonlinear partial differential equations are of wide interest in many areas of advanced computational science, including direct numerical simulation of three-dimensional (3-D) turbulence governed by the Navier-Stokes equations in fluid dynamics. This paper presents a new capability for simulating turbulence at a new record resolution up to 35 trillion grid points, on the world's first exascale computer, Frontier, comprising AMD MI250X GPUs with HPE's Slingshot interconnect and operated by the US Department of Energy's Oak Ridge Leadership Computing Facility (OLCF). Key programming strategies designed to take maximum advantage of the machine architecture involve performing almost all computations on the GPU which has the same memory capacity as the CPU, performing all-to-all communication among sets of parallel processes directly on the GPU, and targeting GPUs efficiently using OpenMP offloading for intensive number-crunching including 1-D Fast Fourier Transforms (FFT) performed using AMD ROCm library calls. With 99% of computing power on Frontier being on the GPU, leaving the CPU idle leads to a net performance gain via avoiding the overhead of data movement between host and device except when needed for some I/O purposes. Memory footprint including the size of communication buffers for MPI_ALLTOALL is managed carefully to maximize the largest problem size possible for a given node count. Detailed performance data including separate contributions from different categories of operations to the elapsed wall time per step are reported for five grid resolutions, from 2048³ on a single node to 32768³ on 4096 or 8192 nodes out of 9408 on the system. Both 1D and 2D domain decompositions which divide a 3D periodic domain into slabs and pencils respectively are implemented. The present code suite (labeled by the acronym GESTS, GPUs for Extreme Scale Turbulence Simulations) achieves a figure of merit (in grid points per second) exceeding goals set in the Center for Accelerated Application Readiness (CAAR) program for Frontier. The performance attained is highly favorable in both weak scaling and strong scaling, with notable departures only for 2048³ where communication is entirely intra-node, and for 32768³, where a challenge due to small message sizes does arise. Communication performance is addressed further using a lightweight test code that performs all-to-all communication in a manner matching the full turbulence simulation code. Performance at large problem sizes is affected by both small message size due to high node counts as well as dragonfly network topology features on the machine, but is consistent with official expectations of sustained performance on Frontier. Overall, although not perfect, the scalability achieved at the extreme problem size of 32768³ (and up to 8192 nodes — which corresponds to hardware rated at just under 1 exaflop/sec of theoretical peak computational performance) is arguably better than the scalability observed using prior state-of-the-art algorithms on Frontier's predecessor machine (Summit) at OLCF. New science results for the study of intermittency in turbulence enabled by this code and its extensions are to be reported separately in the near future. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
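A compact sketch of the OpenMP offloading style the GESTS abstract describes: keep the working arrays resident on the GPU across time steps and launch the number-crunching loops there. This is a generic pattern, not the GESTS source; the grid size and kernel body are illustrative assumptions, and the real code additionally calls vendor FFT libraries and performs all-to-all transposes directly from GPU buffers.

```cpp
// Generic OpenMP target-offload pattern: map data onto the device once and run
// the per-point work there every step. Illustrative only -- not the GESTS code;
// the array size and the kernel body are assumptions.
#include <vector>
#include <cstdio>

int main() {
    const long n = 1 << 20;                 // stand-in for one pencil of grid points
    std::vector<double> u(n, 1.0), v(n, 2.0);
    double* up = u.data();
    double* vp = v.data();

    // Keep the arrays resident on the device for the whole time loop,
    // avoiding host<->device traffic inside it.
    #pragma omp target data map(tofrom: up[0:n], vp[0:n])
    {
        for (int step = 0; step < 100; ++step) {
            #pragma omp target teams distribute parallel for
            for (long i = 0; i < n; ++i) {
                // Stand-in for pointwise nonlinear-term evaluation in physical space.
                up[i] = up[i] + 1.0e-3 * up[i] * vp[i];
            }
            // In the real application, device-resident FFTs (vendor library calls)
            // and GPU-to-GPU all-to-all transposes would go here.
        }
    }
    std::printf("u[0] after 100 steps: %f\n", up[0]);
}
```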
31. Exploring versioned distributed arrays for resilience in scientific applications
- Author
-
Chien, A, Balaji, P, Dun, N, Fang, A, Fujita, H, Iskra, K, Rubenstein, Z, Zheng, Z, Hammond, J, Laguna, I, Richards, D, Dubey, A, van Straalen, B, Hoemmen, M, Heroux, M, Teranishi, K, and Siegel, A
- Subjects
Information and Computing Sciences ,Software Engineering ,Resilience ,fault-tolerance ,exascale ,scalable computing ,application-based fault tolerance ,Distributed Computing ,Applied computing ,Distributed computing and systems software - Abstract
Exascale studies project reliability challenges for future HPC systems. We present the Global View Resilience (GVR) system, a library for portable resilience. GVR begins with a subset of the Global Arrays interface, and adds new capabilities to create versions, name versions, and compute on version data. Applications can focus versioning where and when it is most productive, and customize for each application structure independently. This control is portable, and its embedding in application source makes it natural to express and easy to maintain. The ability to name multiple versions and “partially materialize” them efficiently makes ambitious forward-recovery based on “data slices” across versions or data structures both easy to express and efficient. Using several large applications (OpenMC, preconditioned conjugate gradient (PCG) solver, ddcMD, and Chombo), we evaluate the programming effort to add resilience. The required changes are small (< 2% lines of code (LOC)), localized and machine-independent, and perhaps most important, require no software architecture changes. We also measure the overhead of adding GVR versioning and show that overheads < 2% are generally achieved. This overhead suggests that GVR can be implemented in large-scale codes and support portable error recovery with modest investment and runtime impact. Our results are drawn from both IBM BG/Q and Cray XC30 experiments, demonstrating portability. We also present two case studies of flexible error recovery, illustrating how GVR can be used for multi-version rollback recovery, and several different forward-recovery schemes. GVR’s multi-version enables applications to survive latent errors (silent data corruption) with significant detection latency, and forward recovery can make that recovery extremely efficient. Our results suggest that GVR is scalable, portable, and efficient. GVR interfaces are flexible, supporting a variety of recovery schemes, and altogether GVR embodies a gentle-slope path to tolerate growing error rates in future extreme-scale systems.
- Published
- 2017
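The GVR abstract above describes versioned global arrays used for multi-version rollback recovery. The sketch below shows only the basic idea of keeping named snapshots and rolling back to one of them, in plain C++; it bears no relation to the actual GVR interface, and the class, method names, and simple corruption check are assumptions.

```cpp
// Plain-C++ illustration of the multi-version rollback idea described in the
// abstract. This is NOT the GVR API: the class, method names, and the simple
// corruption check are assumptions made for demonstration.
#include <map>
#include <string>
#include <vector>
#include <cstdio>
#include <cmath>

class VersionedArray {
    std::vector<double> data_;
    std::map<std::string, std::vector<double>> versions_;  // named snapshots
public:
    explicit VersionedArray(std::size_t n) : data_(n, 0.0) {}
    std::vector<double>& data() { return data_; }
    void snapshot(const std::string& name) { versions_[name] = data_; }
    void rollback(const std::string& name) { data_ = versions_.at(name); }
};

bool looks_corrupted(const std::vector<double>& v) {
    for (double x : v) if (!std::isfinite(x)) return true;  // toy detector
    return false;
}

int main() {
    VersionedArray state(1000);
    for (int step = 0; step < 50; ++step) {
        if (step % 10 == 0) state.snapshot("step-" + std::to_string(step));
        for (double& x : state.data()) x += 0.5;            // application work
        if (step == 42) state.data()[7] = NAN;              // injected fault
        if (looks_corrupted(state.data())) {
            std::printf("corruption detected at step %d, rolling back\n", step);
            state.rollback("step-40");                      // multi-version recovery
            // (forward recovery across versions/data slices would go here instead)
        }
    }
}
```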
32. Big Data and HPC Convergence for Smart Infrastructures: A Review and Proposed Architecture
- Author
-
Usman, Sardar, Mehmood, Rashid, Katib, Iyad, Chlamtac, Imrich, Series Editor, Mehmood, Rashid, editor, See, Simon, editor, and Katib, Iyad, editor
- Published
- 2020
- Full Text
- View/download PDF
33. Running a Pre-exascale, Geographically Distributed, Multi-cloud Scientific Simulation
- Author
-
Sfiligoi, Igor, Würthwein, Frank, Riedel, Benedikt, Schultz, David, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Sadayappan, Ponnuswamy, editor, Chamberlain, Bradford L., editor, Juckeland, Guido, editor, and Ltaief, Hatem, editor
- Published
- 2020
- Full Text
- View/download PDF
34. heFFTe: Highly Efficient FFT for Exascale
- Author
-
Ayala, Alan, Tomov, Stanimire, Haidar, Azzam, Dongarra, Jack, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Krzhizhanovskaya, Valeria V., editor, Závodszky, Gábor, editor, Lees, Michael H., editor, Dongarra, Jack J., editor, Sloot, Peter M. A., editor, Brissos, Sérgio, editor, and Teixeira, João, editor
- Published
- 2020
- Full Text
- View/download PDF
35. Enabling EASEY Deployment of Containerized Applications for Future HPC Systems
- Author
-
Höb, Maximilian, Kranzlmüller, Dieter, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Krzhizhanovskaya, Valeria V., editor, Závodszky, Gábor, editor, Lees, Michael H., editor, Dongarra, Jack J., editor, Sloot, Peter M. A., editor, Brissos, Sérgio, editor, and Teixeira, João, editor
- Published
- 2020
- Full Text
- View/download PDF
36. Run-Time Exploitation of Application Dynamism for Energy-Efficient Exascale Computing
- Author
-
Kjeldsberg, Per Gunnar, Schöne, Robert, Gerndt, Michael, Riha, Lubomir, Kannan, Venkatesh, Diethelm, Kai, Sawley, Marie-Christine, Zapletal, Jan, Gocht, Andreas, Reissmann, Nico, Vysocky, Ondrej, Kumaraswamy, Madhura, Nagel, Wolfgang E., Catthoor, Francky, Basten, Twan, Zompakis, Nikolaos, Geilen, Marc, and Kjeldsberg, Per Gunnar
- Published
- 2020
- Full Text
- View/download PDF
37. A codesign framework for online data analysis and reduction.
- Author
-
Mehta, Kshitij, Allen, Bryce, Wolf, Matthew, Logan, Jeremy, Suchyta, Eric, Singhal, Swati, Choi, Jong Y., Takahashi, Keichi, Huck, Kevin, Yakushin, Igor, Sussman, Alan, Munson, Todd, Foster, Ian, and Klasky, Scott
- Subjects
DATA reduction ,DATA analysis ,HIGH performance computing ,REQUIREMENTS engineering ,CHEETAH ,SAVANNAS - Abstract
Science applications preparing for the exascale era are increasingly exploring in situ computations comprising of simulation‐analysis‐reduction pipelines coupled in‐memory. Efficient composition and execution of such complex pipelines for a target platform is a codesign process that evaluates the impact and tradeoffs of various application‐ and system‐specific parameters. In this article, we describe a toolset for automating performance studies of composed HPC applications that perform online data reduction and analysis. We describe Cheetah, a new framework for composing parametric studies on coupled applications, and Savanna, a runtime engine for orchestrating and executing campaigns of codesign experiments. This toolset facilitates understanding the impact of various factors such as process placement, synchronicity of algorithms, and storage versus compute requirements for online analysis of large data. Ultimately, we aim to create a catalog of performance results that can help scientists understand tradeoffs when designing next‐generation simulations that make use of online processing techniques. We illustrate the design of Cheetah and Savanna, and present application examples that use this framework to conduct codesign studies on small clusters as well as leadership class supercomputers. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
38. IO-SEA: Storage I/O and Data Management for Exascale Architectures
- Author
-
Araújo De Medeiros, Daniel, Markidis, Stefano, Denier, Philippe, et al.
- Abstract
The new emerging scientific workloads to be executed on the upcoming exascale supercomputers face major challenges in terms of storage, given their extreme volume of data. In particular, intelligent data placement, instrumentation, and workflow handling are central to application performance. The IO-SEA project developed multiple solutions to aid the scientific community in addressing these challenges: a Workflow Manager, a hierarchical storage management system, and a semantic API for storage. All of these major products incorporate additional minor products that support their mission. In this paper, we discuss both the roles of all these products and how they can assist the scientific community in achieving exascale performance.
- Published
- 2024
- Full Text
- View/download PDF
39. SOD2D: A GPU-enabled spectral finite elements method for compressible scale-resolving simulations
- Author
-
Universitat Politècnica de Catalunya. Doctorat en Ciència i Tecnologia Aeroespacials, Barcelona Supercomputing Center, Silva, Lucas Gasparino Ferreira da, Spiga, Filippo, and Lehmkuhl Barba, Oriol
- Abstract
As new supercomputer architectures become more heavily focused on using hardware accelerators, in particular general-purpose graphical processors, it is therefore relevant that algorithms for computational fluid dynamics, especially those targeting scale-resolving simulations, be designed in such a way as to make efficient use of such hardware. In this paper, we propose one such hardware-accelerated Continuous Galerkin Finite Elements model, aimed at handling simulations of turbulent compressible flows over complex geometries. As this model is intended for use in Large-Eddy and Direct Numerical simulations, it is necessary that the resulting scheme introduces only small amounts of artificial (numerical) diffusion for stabilization purposes. We achieve this through a combination of Spectral Finite Elements, operator splittings on the convective term, and use of the Entropy Viscosity stabilization model adapted to the spectral elements scheme. The paper will present the resulting algorithm, how it is made to work efficiently on the accelerators, and results obtained so far that demonstrate its high-accuracy capabilities. This project has received funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 956104 and was co-funded by the Spanish Agencia Estatal de Investigación (AEI) under grant agreement PCI2021-121962. O. Lehmkuhl's work is financed by a Ramón y Cajal postdoctoral contract by the Ministerio de Economía y Competitividad, Secretaría de Estado de Investigación, Desarrollo e Innovación, Spain (RYC2018-025949-I). Lucas Gasparino has received financial support from 'la Caixa' Foundation (ID 100010434); the fellowship grant code is LCF/BQ/DI18/11660051.
- Published
- 2024
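The entry above credits the Entropy Viscosity model for keeping artificial diffusion small. The following C++ sketch shows the general idea on a 1D grid, assuming a Guermond-style formulation in which the element viscosity is the minimum of a first-order cap and a scaled entropy-residual term; it is a schematic illustration, not the SOD2D implementation, and the coefficients c_max and c_e are generic tuning parameters.

// Schematic 1D illustration of entropy-viscosity stabilization (assumed
// Guermond-style form); not the scheme as implemented in SOD2D.
#include <vector>
#include <cmath>
#include <algorithm>
#include <cstdio>

// Returns one artificial viscosity value per element, capped by a
// first-order (upwind-like) viscosity so the added diffusion stays small.
std::vector<double> entropy_viscosity(const std::vector<double>& u,      // velocity at nodes
                                      const std::vector<double>& s,      // entropy at nodes, current step
                                      const std::vector<double>& s_old,  // entropy at nodes, previous step
                                      double h, double dt,
                                      double c_max = 0.5, double c_e = 1.0) {
  const std::size_t n_elem = u.size() - 1;
  // Normalization: maximum deviation of the entropy from its mean.
  double s_mean = 0.0;
  for (double v : s) s_mean += v;
  s_mean /= static_cast<double>(s.size());
  double s_dev = 1e-12;
  for (double v : s) s_dev = std::max(s_dev, std::abs(v - s_mean));

  std::vector<double> nu(n_elem);
  for (std::size_t e = 0; e < n_elem; ++e) {
    // Discrete entropy residual d s/dt + u * d s/dx on element e.
    const double dsdt = 0.5 * ((s[e] - s_old[e]) + (s[e + 1] - s_old[e + 1])) / dt;
    const double dsdx = (s[e + 1] - s[e]) / h;
    const double u_e  = 0.5 * (u[e] + u[e + 1]);
    const double residual = std::abs(dsdt + u_e * dsdx);

    const double nu_max = c_max * h * std::abs(u_e);       // first-order cap
    const double nu_ent = c_e * h * h * residual / s_dev;  // residual-based viscosity
    nu[e] = std::min(nu_max, nu_ent);
  }
  return nu;
}

int main() {
  // Tiny smoke test on a 4-element grid: viscosity activates near the bump.
  std::vector<double> u{1, 1, 1, 1, 1}, s{0, 0, 1, 0, 0}, s_old{0, 0, 0.9, 0, 0};
  for (double v : entropy_viscosity(u, s, s_old, /*h=*/0.25, /*dt=*/0.01))
    std::printf("%g ", v);
  std::printf("\n");
  return 0;
}

Note that smooth regions yield a small entropy residual, so the scheme reverts to (near) zero added diffusion there, which is the property the abstract emphasizes.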
40. Large-scale large eddy simulation of nuclear reactor flows: Issues and perspectives
- Author
-
Yu, Yiqi [Argonne National Lab. (ANL), Argonne, IL (United States). Nuclear Engineering Division]
- Published
- 2016
- Full Text
- View/download PDF
41. Exploring versioned distributed arrays for resilience in scientific applications: Global view resilience
- Author
-
Siegel, Andrew [Argonne National Lab. (ANL), Argonne, IL (United States)]
- Published
- 2016
- Full Text
- View/download PDF
42. Fault Modeling of Extreme Scale Applications Using Machine Learning
- Author
-
Hoisie, Adolfy [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)]
- Published
- 2016
- Full Text
- View/download PDF
43. A Survey on Malleability Solutions for High-Performance Distributed Computing.
- Author
-
Aliaga, Jose I., Castillo, Maribel, Iserte, Sergio, Martín-Álvarez, Iker, and Mayo, Rafael
- Subjects
MESSAGE passing (Computer science) , RESOURCE management , UNITS of time - Abstract
Maintaining a high rate of productivity, in terms of completed jobs per unit of time, in High-Performance Computing (HPC) facilities is a cornerstone of the next generation of exascale supercomputers. Process malleability is presented as a straightforward mechanism to address that issue. Nowadays, the vast majority of HPC facilities are intended for distributed-memory applications based on the Message Passing (MP) paradigm. For this reason, many efforts are based on the Message Passing Interface (MPI), the de facto standard programming model. Malleability aims to rescale executions on the fly, in other words, to reconfigure the number and layout of processes in running applications. Process malleability involves resource reallocation within the HPC system, handling the application's processes, and redistributing data among those processes to resume the execution. This manuscript compiles how different frameworks address process malleability, their main features, their integration into resource management systems, and how they may be used in user codes. This paper is a detailed state-of-the-art survey devised as an entry point for researchers interested in process malleability. [ABSTRACT FROM AUTHOR] (A brief illustrative sketch of the data-redistribution step follows this record.)
- Published
- 2022
- Full Text
- View/download PDF
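The survey above notes that malleability requires redistributing data among processes after a reconfiguration. The C++ sketch below illustrates only the bookkeeping side of that step for a block-distributed 1D array, computing which global index ranges an old rank would have to send to each new rank; it is a generic illustration and is not drawn from any framework covered by the survey.

// Illustrative-only sketch of the data-redistribution bookkeeping behind
// process malleability: when a block-distributed array is rescaled from
// p_old to p_new processes, each old rank works out which slices of its
// local block must be shipped to which new rank.
#include <algorithm>
#include <cstdio>
#include <vector>

struct Segment { int dest_rank; long begin; long end; };  // [begin, end) global indices

// Global index range owned by `rank` under a balanced block distribution.
static void block_range(long n, int nprocs, int rank, long& lo, long& hi) {
  const long base = n / nprocs, rem = n % nprocs;
  lo = rank * base + std::min<long>(rank, rem);
  hi = lo + base + (rank < rem ? 1 : 0);
}

// Segments that `old_rank` must send when rescaling from p_old to p_new ranks.
std::vector<Segment> redistribution_plan(long n, int p_old, int p_new, int old_rank) {
  long my_lo, my_hi;
  block_range(n, p_old, old_rank, my_lo, my_hi);
  std::vector<Segment> plan;
  for (int r = 0; r < p_new; ++r) {
    long lo, hi;
    block_range(n, p_new, r, lo, hi);
    const long b = std::max(my_lo, lo), e = std::min(my_hi, hi);
    if (b < e && r != old_rank)  // overlap with our own new range stays local
      plan.push_back({r, b, e});
  }
  return plan;
}

int main() {
  // Example: 100 elements, shrinking from 4 ranks to 3; what must rank 3 send?
  for (const Segment& seg : redistribution_plan(100, 4, 3, 3))
    std::printf("send [%ld, %ld) to new rank %d\n", seg.begin, seg.end, seg.dest_rank);
  return 0;
}

In a real malleable run this plan would drive the actual transfers (for example, point-to-point MPI messages) once the resource manager has granted or reclaimed processes.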
44. Kokkos 3: Programming Model Extensions for the Exascale Era.
- Author
-
Trott, Christian R., Lebrun-Grandie, Damien, Arndt, Daniel, Ciesko, Jan, Dang, Vinh, Ellingwood, Nathan, Gayatri, Rahulkumar, Harvey, Evan, Hollman, Daisy S., Ibanez, Dan, Liber, Nevin, Madsen, Jonathan, Miles, Jeff, Poliakoff, David, Powell, Amy, Rajamanickam, Sivasankaran, Simberg, Mikael, Sunderland, Dan, Turcksin, Bruno, and Wilke, Jeremiah
- Subjects
- *
HETEROGENEOUS computing , *GRAPHICS processing units - Abstract
As the push towards exascale hardware has increased the diversity of system architectures, performance portability has become a critical concern for scientific software. We describe the Kokkos Performance Portable Programming Model, which allows developers to write single-source applications for diverse high-performance computing architectures. Kokkos provides key abstractions for both the compute and memory hierarchy of modern hardware. We describe the novel abstractions that have been added to Kokkos version 3, such as hierarchical parallelism, containers, task graphs, and arbitrary-sized atomic operations, to prepare for exascale-era architectures. We demonstrate the performance of these new features with reproducible benchmarks on CPUs and GPUs. [ABSTRACT FROM AUTHOR] (A brief illustrative sketch of hierarchical parallelism follows this record.)
- Published
- 2022
- Full Text
- View/download PDF
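As a concrete illustration of the hierarchical parallelism mentioned above, the following minimal Kokkos C++ sketch uses a TeamPolicy so that each team computes one row of a dense matrix-vector product, with a nested reduction across the team's threads. It is an illustrative example written against the public Kokkos API, not code taken from the paper.

// Minimal sketch of Kokkos hierarchical (team-based) parallelism: one team
// per matrix row, nested reduction over the columns within each team.
#include <Kokkos_Core.hpp>
#include <cstdio>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    const int nrows = 1024, ncols = 512;
    Kokkos::View<double**> A("A", nrows, ncols);  // device-resident matrix
    Kokkos::View<double*>  x("x", ncols);         // input vector
    Kokkos::View<double*>  y("y", nrows);         // result vector

    Kokkos::deep_copy(A, 1.0);                    // fill with test data
    Kokkos::deep_copy(x, 2.0);

    using team_policy = Kokkos::TeamPolicy<>;
    using member_type = team_policy::member_type;

    // One team per row; Kokkos::AUTO lets the backend choose the team size.
    Kokkos::parallel_for("matvec", team_policy(nrows, Kokkos::AUTO),
      KOKKOS_LAMBDA(const member_type& team) {
        const int row = team.league_rank();
        double sum = 0.0;
        // Nested parallel reduction over the columns of this row.
        Kokkos::parallel_reduce(Kokkos::TeamThreadRange(team, ncols),
          [=](const int col, double& partial) {
            partial += A(row, col) * x(col);
          }, sum);
        Kokkos::single(Kokkos::PerTeam(team), [=]() { y(row) = sum; });
      });

    // Copy one entry back to the host to check the result (expected: 1024).
    auto y_host = Kokkos::create_mirror_view(y);
    Kokkos::deep_copy(y_host, y);
    std::printf("y(0) = %f\n", y_host(0));
  }
  Kokkos::finalize();
  return 0;
}

The same source compiles for CPU or GPU backends; only the build configuration changes, which is the single-source portability the abstract describes.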
45. FieldView-VisIt: A Modern Engineering Post-Processing System for Ultra-Scale Physics Based Simulations
- Author
-
Duque, Earl [JMSI Inc., Rutherford, NJ (United States). Intelligent Light]
- Published
- 2017
46. Final Report, “Exploiting Global View for Resilience”
- Author
-
Chien, Andrew [Univ. of Chicago, IL (United States)]
- Published
- 2017
- Full Text
- View/download PDF
47. Optimized FPGA Implementation of a Compute-Intensive Oil Reservoir Simulation Algorithm
- Author
-
Ioannou, Aggelos D., Malakonakis, Pavlos, Georgopoulos, Konstantinos, Papaefstathiou, Ioannis, Dollas, Apostolos, Mavroidis, Iakovos, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Pnevmatikatos, Dionisios N., editor, Pelcat, Maxime, editor, and Jung, Matthias, editor
- Published
- 2019
- Full Text
- View/download PDF
48. Scaling Productivity and Innovation on the Path to Exascale with a 'Team of Teams' Approach
- Author
-
Raybourn, Elaine M., Moulton, J. David, Hungerford, Aimee, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Nah, Fiona Fui-Hoon, editor, and Siau, Keng, editor
- Published
- 2019
- Full Text
- View/download PDF
49. Toward Efficient Architecture-Independent Algorithms for Dynamic Programs
- Author
-
Javanmard, Mohammad Mahdi, Ganapathi, Pramod, Das, Rathish, Ahmad, Zafar, Tschudi, Stephen, Chowdhury, Rezaul, Hutchison, David, Editorial Board Member, Kanade, Takeo, Editorial Board Member, Kittler, Josef, Editorial Board Member, Kleinberg, Jon M., Editorial Board Member, Mattern, Friedemann, Editorial Board Member, Mitchell, John C., Editorial Board Member, Naor, Moni, Editorial Board Member, Pandu Rangan, C., Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Terzopoulos, Demetri, Editorial Board Member, Tygar, Doug, Editorial Board Member, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Weiland, Michèle, editor, Juckeland, Guido, editor, Trinitis, Carsten, editor, and Sadayappan, Ponnuswamy, editor
- Published
- 2019
- Full Text
- View/download PDF
50. Performance Evaluation and Analysis of Linear Algebra Kernels in the Prototype Tianhe-3 Cluster
- Author
-
You, Xin, Yang, Hailong, Luan, Zhongzhi, Liu, Yi, Qian, Depei, Hutchison, David, Editorial Board Member, Kanade, Takeo, Editorial Board Member, Kittler, Josef, Editorial Board Member, Kleinberg, Jon M., Editorial Board Member, Mattern, Friedemann, Editorial Board Member, Mitchell, John C., Editorial Board Member, Naor, Moni, Editorial Board Member, Pandu Rangan, C., Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Terzopoulos, Demetri, Editorial Board Member, Tygar, Doug, Editorial Board Member, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Abramson, David, editor, and de Supinski, Bronis R., editor
- Published
- 2019
- Full Text
- View/download PDF