Statistical Modeling of HPC Performance Variability and Communication
- Author
- Dominguez-Trujillo, Jered B
- Subjects
- Exascale, HPC, Performance Model, Performance Variability, Partitioned Communication, Extreme Value Theory, Statistics, MPI, Computer Sciences, Numerical Analysis and Scientific Computing, OS and Networks
- Abstract
Understanding the performance of parallel and distributed programs remains central to determining how compute systems can be optimized to achieve exascale performance. Lightweight statistical models allow developers to both characterize and predict performance trade-offs, especially as HPC systems become more heterogeneous with many-core CPUs and GPUs. This thesis presents a lightweight statistical approach to modeling performance variation that leverages extreme value theory by focusing on the maximum length of distributed workload intervals. This approach was implemented in MPI and evaluated on several HPC systems and workloads. I then present a performance model of partitioned communication that also uses an expected maximum value method. This model was validated against benchmark results from HPC systems. These lightweight statistical models provide insight into the behavior of HPC applications and systems and allow developers to predict performance impacts as HPC systems evolve toward exascale.
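The intuition behind an expected-maximum model can be sketched in a few lines. This is a minimal illustration, not the thesis's model: it assumes (hypothetically) that each of P processes performs a fixed amount of work plus an independent, exponentially distributed delay, and that a bulk-synchronous step ends only when the slowest process finishes. The exponential distribution is chosen purely because its expected maximum has a closed form, E[max of P i.i.d. Exp] = H_P × mean, where H_P is the P-th harmonic number. The function names `expected_step_time` and `simulate_step_time` are illustrative.

```python
import math
import random


def harmonic(n: int) -> float:
    """n-th harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))


def expected_step_time(work: float, noise_mean: float, procs: int) -> float:
    """Closed-form expected duration of one bulk-synchronous step.

    Assumption (illustrative only): each process's delay is i.i.d.
    exponential with mean `noise_mean`. The step ends when the slowest
    process finishes, and E[max of n i.i.d. Exp] = H_n * mean.
    """
    return work + noise_mean * harmonic(procs)


def simulate_step_time(work: float, noise_mean: float, procs: int,
                       trials: int = 5000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same expected step time."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # random.expovariate takes the rate lambda = 1 / mean.
        slowest = max(rng.expovariate(1.0 / noise_mean) for _ in range(procs))
        total += work + slowest
    return total / trials


if __name__ == "__main__":
    # Expected step time grows roughly like log(P) as processes are added,
    # even though each process's average delay stays the same.
    for p in (1, 16, 256):
        model = expected_step_time(1.0, 0.01, p)
        sim = simulate_step_time(1.0, 0.01, p)
        print(f"P={p:4d} model={model:.4f} simulated={sim:.4f}")
```

The point of the sketch is that synchronization exposes the tail of the delay distribution: the mean per-process delay is constant, yet the expected step time keeps rising with P because the step waits for the maximum.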
- Published
- 2021