Application kernels: HPC resources performance monitoring and variance analysis
- Source :
- Concurrency and Computation: Practice and Experience. 27:5238-5260
- Publication Year :
- 2015
- Publisher :
- Wiley, 2015.
Abstract
- Application kernels are computationally lightweight benchmarks or applications run repeatedly on high performance computing (HPC) clusters in order to track the Quality of Service (QoS) provided to the users. They have been successful in detecting a variety of hardware and software issues, some severe, that have subsequently been corrected, resulting in improved system performance and throughput. In this work, the application kernel performance monitoring module of eXtreme Data Metrics on Demand (XDMoD) is described. Through the XDMoD framework, the application kernels have been run repeatedly on the Texas Advanced Computing Center's Stampede and Lonestar4 clusters for a total of over 14,000 jobs. This provides a body of data on the HPC clusters' operation that can be used to statistically analyze how application performance, as measured by metrics such as execution time and communication bandwidth, is affected by the cluster's workload. We discuss metric distributions, carry out regression and correlation analyses, and use a PCA study to describe the variance and relate the variance to factors such as the spatial distribution of the application in the cluster. Ultimately, these types of analyses can be used to improve the application kernel mechanism, which in turn results in improved QoS of the HPC infrastructure that is delivered to the end users. Copyright © 2015 John Wiley & Sons, Ltd.
- Subjects :
- Computer Networks and Communications
Computer science
Quality of service
Distributed computing
Workload
Variance (accounting)
Supercomputer
Computer Science Applications
Theoretical Computer Science
Software
Computational Theory and Mathematics
Kernel (statistics)
Metric (mathematics)
Throughput (business)
Details
- ISSN :
- 1532-0626
- Volume :
- 27
- Database :
- OpenAIRE
- Journal :
- Concurrency and Computation: Practice and Experience
- Accession number :
- edsair.doi...........24462fa30ea2c8af2753913466f80e0e