Job Characteristics on Large-Scale Systems: Long-Term Analysis, Quantification, and Implications

Authors :
Devesh Tiwari
Raj Kettimuthu
Paul Rich
Tirthak Patel
Zhengchun Liu
William Allcock
Source :
SC
Publication Year :
2020
Publisher :
IEEE, 2020.

Abstract

HPC workload analysis and resource consumption characteristics are key to driving better operational practices, informing system procurement decisions, and designing effective resource management techniques. Unfortunately, the HPC community does not have easy access to long-term introspective workload analysis and characterization for production-scale HPC systems. This study bridges this gap by providing detailed long-term quantification, characterization, and analysis of job characteristics on two supercomputers: Intrepid and Mira. This study is one of the largest of its kind, covering trends and characteristics for over three billion compute hours and 750 thousand jobs, and spanning a decade. We confirm several long-held pieces of conventional wisdom and identify many previously undiscovered trends and their implications. We also introduce a learning-based technique to predict the resource requirements of future jobs with high accuracy, using features available prior to job submission and without requiring any application-specific tracing or application-intrusive instrumentation.

Details

Database :
OpenAIRE
Journal :
SC20: International Conference for High Performance Computing, Networking, Storage and Analysis
Accession number :
edsair.doi...........33292225e4ef403e6e33b49fcdf45c53
Full Text :
https://doi.org/10.1109/sc41405.2020.00088