Online root-cause performance analysis of parallel applications
- Source :
- Parallel Computing. 48:81-107
- Publication Year :
- 2015
- Publisher :
- Elsevier BV, 2015.
Abstract
- We present a technique for automated application performance modeling and analysis. A parallel application is modeled by its communication and computational activities. The analysis discovers causal dependencies between problems and infers root causes. It compares concurrent execution flows, correlating problems that stand in a causal relationship. Results of root-cause analysis for different MPI applications are encouraging. Hardware is improving at an incredible rate. However, advances in parallel software have been hampered for many reasons. Developing an efficient parallel application is still not an easy task. Applications rarely achieve good performance immediately; therefore, careful performance analysis and optimization are crucial. These tasks are difficult and require a thorough understanding of the program's behavior. In this paper, we propose a systematic approach to online root-cause performance analysis. The automated analysis uses an online model to quickly identify the most important performance problems and correlates them with the application source code. Our technique is able to discover causal dependencies among the problems, infer their root causes, and explain them to developers. In all of the scenarios we evaluated, this online modeling and analysis approach allowed us to understand the behavior of the applications, evaluate their performance, and locate the causes of problems without specific knowledge of application internals.
- Subjects :
- Online model
Theoretical computer science
Source code
Computer Networks and Communications
Computer science
Root cause
Machine learning
Computer Graphics and Computer-Aided Design
Task (project management)
Artificial Intelligence
Hardware and Architecture
Root cause analysis
Software
Details
- ISSN :
- 0167-8191
- Volume :
- 48
- Database :
- OpenAIRE
- Journal :
- Parallel Computing
- Accession number :
- edsair.doi...........8f30b14c811fca662aff653a4430b596
- Full Text :
- https://doi.org/10.1016/j.parco.2015.05.003