Job Characteristics on Large-Scale Systems: Long-Term Analysis, Quantification, and Implications
- Source :
- SC
- Publication Year :
- 2020
- Publisher :
- IEEE, 2020.
Abstract
- HPC workload analysis and resource consumption characteristics are key to driving better operation practices, informing system procurement decisions, and designing effective resource management techniques. Unfortunately, the HPC community does not have easy access to long-term introspective workload analysis and characterization for production-scale HPC systems. This study bridges this gap by providing detailed long-term quantification, characterization, and analysis of job characteristics on two supercomputers: Intrepid and Mira. This study is one of the largest of its kind, covering trends and characteristics for over three billion compute hours and 750 thousand jobs, and spanning a decade. We confirm several pieces of long-held conventional wisdom and identify many previously undiscovered trends and their implications. We also introduce a learning-based technique to predict the resource requirements of future jobs with high accuracy, using features available prior to job submission and without requiring any application-specific tracing or application-intrusive instrumentation.
- Subjects :
- 020203 distributed computing
Resource (project management)
Procurement
Computer science
0202 electrical engineering, electronic engineering, information engineering
Resource allocation
020201 artificial intelligence & image processing
Resource management
Workload
02 engineering and technology
Instrumentation (computer programming)
Data science
Details
- Database :
- OpenAIRE
- Journal :
- SC20: International Conference for High Performance Computing, Networking, Storage and Analysis
- Accession number :
- edsair.doi...........33292225e4ef403e6e33b49fcdf45c53
- Full Text :
- https://doi.org/10.1109/sc41405.2020.00088