Tarazu: An Adaptive End-to-end I/O Load-balancing Framework for Large-scale Parallel File Systems.

Authors :
Paul, Arnab K.
Neuwirth, Sarah
Wadhwa, Bharti
Wang, Feiyi
Oral, Sarp
Butt, Ali R.
Source :
ACM Transactions on Storage; May 2024, Vol. 20 Issue 2, p1-42, 42p
Publication Year :
2024

Abstract

The imbalanced I/O load on large parallel file systems affects the parallel I/O performance of high-performance computing (HPC) applications. One of the main reasons for I/O imbalances is the lack of a global view of system-wide resource consumption. While approaches to address the problem already exist, the diversity of HPC workloads combined with different file striping patterns prevents widespread adoption of these approaches. In addition, load-balancing techniques should be transparent to client applications. To address these issues, we propose Tarazu, an end-to-end control plane where clients transparently and adaptively write to a set of selected I/O servers to achieve balanced data placement. Our control plane leverages real-time load statistics for global data placement on distributed storage servers, while our design model employs trace-based optimization techniques to minimize latency for I/O load requests between clients and servers and to handle multiple striping patterns in files. We evaluate our proposed system on an experimental cluster for two common use cases: the synthetic I/O benchmark IOR and the scientific application I/O kernel HACC-I/O. We also use a discrete-time simulator with real HPC application traces from emerging workloads running on the Summit supercomputer to validate the effectiveness and scalability of Tarazu in large-scale storage environments. The results show improvements in load balancing and read performance of up to 33% and 43%, respectively, compared to the state-of-the-art. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
1553-3077
Volume :
20
Issue :
2
Database :
Complementary Index
Journal :
ACM Transactions on Storage
Publication Type :
Academic Journal
Accession number :
177112438
Full Text :
https://doi.org/10.1145/3641885