Data provenance hybridization supporting extreme-scale scientific workflow applications
- Source :
- 2016 New York Scientific Data Summit (NYSDS).
- Publication Year :
- 2016
- Publisher :
- IEEE, 2016.
Abstract
- As high performance computing (HPC) infrastructures continue to grow in capability and complexity, so do the applications that they serve. Users of HPC and distributed-area computing (DAC), such as grid and cloud, increasingly look to workflow solutions to orchestrate their complex application coupling and their pre- and post-processing needs. To that end, the US Department of Energy Integrated end-to-end Performance Prediction and Diagnosis for Extreme Scientific Workflows (IPPD) project is currently investigating an integrated approach to the prediction and diagnosis of these extreme-scale scientific workflows. To gain insight into, and a more quantitative understanding of, a workflow's performance, our method captures not only traditional provenance information but also system environment metrics, integrating the two to give context and explanation for a workflow's execution. In this paper, we describe IPPD's provenance management solution (ProvEn) and its hybrid data store, which combines both of these data provenance perspectives. We discuss design and implementation details, including provenance disclosure, scalability, data integration, and query and analysis capabilities. We also present use case examples from the climate modeling and thermal modeling application domains.
Details
- Database :
- OpenAIRE
- Journal :
- 2016 New York Scientific Data Summit (NYSDS)
- Accession number :
- edsair.doi...........1e5eb562eda2e2b7cd63695ec301570e
- Full Text :
- https://doi.org/10.1109/nysds.2016.7747819