Application-aware Congestion Mitigation for High-Performance Computing Systems

Authors :
Patke, Archit
Jha, Saurabh
Qiu, Haoran
Brandt, Jim
Gentile, Ann
Greenseid, Joe
Kalbarczyk, Zbigniew
Iyer, Ravishankar
Publication Year :
2020

Abstract

High-performance computing (HPC) systems frequently experience congestion, leading to significant variation in application performance. However, the impact of congestion on application runtime differs from application to application, depending on each application's network characteristics (such as bandwidth and latency requirements). We leverage this insight to develop Netscope, an automated ML-driven framework that takes these network characteristics into account to dynamically mitigate congestion. We evaluate Netscope on four Cray Aries systems, including a production supercomputer, using real scientific applications. Netscope has a lower training cost and accurately estimates the impact of congestion on application runtime, with a correlation between 0.7 and 0.9 for common scientific applications. Moreover, we find that Netscope reduces tail runtime variability by up to 14.9 times while improving median system utility by 12%.
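The abstract reports accuracy as a correlation between Netscope's estimated congestion impact and the observed impact on runtime. The record does not say which correlation coefficient is used. The following minimal sketch assumes Pearson's r. The function name `pearson` and the sample slowdown values are hypothetical illustrations. They are not taken from the paper and do not reflect its data or its method.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences.

    Assumption: the reported "correlation" is Pearson's r; the source
    record does not specify the coefficient.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance numerator and per-variable spread (un-normalized).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Hypothetical toy values (runtime under congestion / baseline runtime),
# NOT measurements from the paper.
predicted_slowdown = [1.05, 1.20, 1.10, 1.50, 1.30]
measured_slowdown = [1.02, 1.25, 1.08, 1.40, 1.35]

r = pearson(predicted_slowdown, measured_slowdown)
print(f"r = {r:.2f}")
```

A value near 1 means the estimates rank and scale the observed slowdowns closely. A value near 0 means they carry little linear information about them.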

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2012.07755
Document Type :
Working Paper