Data provenance hybridization supporting extreme-scale scientific workflow applications

Authors :
Alok R. Singh
Ilkay Altintas
Darren Kerbyson
Bibi Raju
Eric G. Stephan
Todd O. Elsethagen
Malachi Schram
Kerstin Kleese van Dam
Matt Macduff
Source :
2016 New York Scientific Data Summit (NYSDS).
Publication Year :
2016
Publisher :
IEEE, 2016.

Abstract

As high performance computing (HPC) infrastructures continue to grow in capability and complexity, so do the applications that they serve. HPC and distributed-area computing (DAC) (e.g., grid and cloud) users are looking increasingly toward workflow solutions to orchestrate their complex application coupling and their pre- and post-processing needs. To that end, the US Department of Energy Integrated end-to-end Performance Prediction and Diagnosis for Extreme Scientific Workflows (IPPD) project is currently investigating an integrated approach to prediction and diagnosis of these extreme-scale scientific workflows. To gain insight and a more quantitative understanding of a workflow's performance, our method includes not only the capture of traditional provenance information, but also the capture and integration of system environment metrics, which help give context and explanation for a workflow's execution. In this paper, we describe IPPD's provenance management solution (ProvEn) and its hybrid data store combining both of these data provenance perspectives. We discuss design and implementation details, including provenance disclosure, scalability, data integration, and query and analysis capabilities. We also present use case examples for climate modeling and thermal modeling application domains.

Details

Database :
OpenAIRE
Journal :
2016 New York Scientific Data Summit (NYSDS)
Accession number :
edsair.doi...........1e5eb562eda2e2b7cd63695ec301570e
Full Text :
https://doi.org/10.1109/nysds.2016.7747819