1. HDF5 in the exascale era: Delivering efficient and scalable parallel I/O for exascale applications.
- Author
- Breitenfeld, M. Scot; Tang, Houjun; Zheng, Huihuo; Henderson, Jordan; and Byna, Suren
- Subjects
- *DATA warehousing, *SIMULATION software, *LIBRARY cooperation, *PERFORMANCE management, *DATA management
- Abstract
Accurately modeling real-world systems requires scientific applications at exascale to generate massive amounts of data and manage data storage efficiently. However, parallel input and output (I/O) faces challenges due to new application workflows and the state-of-the-art memory, interconnect, and storage architectures considered in exascale designs. The storage hierarchy has expanded with node-local persistent memory, solid-state storage, and traditional disk and tape-based storage, thus requiring efficiency at each layer and much more efficient data movement among these layers. This paper discusses how the ExaHDF5 project improved the I/O performance and data management for exascale architectures by enhancing HDF5, a widely used parallel I/O library. The team developed an Asynchronous I/O Virtual Object Layer (VOL) connector that allowed overlapping I/O with computation. They also created a Cache VOL to complement asynchronous I/O by incorporating fast storage layers, such as burst buffer and node-local storage, into the parallel I/O workflow through caching and staging data. Additionally, the team enabled data aggregation and I/O at the node level by using a Subfiling Virtual File Driver (VFD). To demonstrate superior I/O performance with HDF5 at exascale, the ExaHDF5 team collaborated with several exascale applications. In this paper, we show I/O performance improvements for three applications: Cabana (a particle-based simulation library), EQSIM (a regional earthquake simulation software), and E3SM (a climate system modeling library). [ABSTRACT FROM AUTHOR]
- Published
- 2025