Application kernels: HPC resources performance monitoring and variance analysis

Authors :
Robert L. DeLeon
Thomas R. Furlani
Amin Ghadersohi
Nikolay A. Simakov
Matthew D. Jones
Steven M. Gallo
Joseph P. White
Abani Patra
Source :
Concurrency and Computation: Practice and Experience. 27:5238-5260
Publication Year :
2015
Publisher :
Wiley, 2015.

Abstract

Application kernels are computationally lightweight benchmarks or applications run repeatedly on high performance computing (HPC) clusters in order to track the Quality of Service (QoS) provided to the users. They have been successful in detecting a variety of hardware and software issues, some severe, that have subsequently been corrected, resulting in improved system performance and throughput. In this work, the application kernel performance monitoring module of eXtreme Data Metrics on Demand (XDMoD) is described. Through the XDMoD framework, the application kernels have been run repeatedly on the Texas Advanced Computing Center's Stampede and Lonestar4 clusters for a total of over 14,000 jobs. This provides a body of data on the HPC clusters' operation that can be used to statistically analyze how application performance, as measured by metrics such as execution time and communication bandwidth, is affected by the cluster's workload. We discuss metric distributions, carry out regression and correlation analyses, and use a principal component analysis (PCA) study to describe the variance and relate it to factors such as the spatial distribution of the application in the cluster. Ultimately, these types of analyses can be used to improve the application kernel mechanism, which in turn improves the QoS that the HPC infrastructure delivers to end users. Copyright © 2015 John Wiley & Sons, Ltd.

Details

ISSN :
1532-0626
Volume :
27
Database :
OpenAIRE
Journal :
Concurrency and Computation: Practice and Experience
Accession number :
edsair.doi...........24462fa30ea2c8af2753913466f80e0e