Revealing the Detailed Lineage of Script Outputs Using Hybrid Provenance

Authors :
Qian Zhang
Yang Cao
Qiwen Wang
Duc Vu
Priyaa Thavasimani
Timothy McPhillips
Paolo Missier
Peter Slaughter
Christopher Jones
Matthew B. Jones
Bertram Ludäscher
Source :
International Journal of Digital Curation, Vol 12, Iss 2 (2018)
Publication Year :
2018
Publisher :
University of Edinburgh, 2018.

Abstract

We illustrate how combining retrospective and prospective provenance can yield scientifically meaningful hybrid provenance representations of the computational histories of data produced during a script run. We use scripts from multiple disciplines (astrophysics, climate science, biodiversity data curation, and social network analysis), implemented in Python, R, and MATLAB, to highlight the usefulness of diverse forms of retrospective provenance when coupled with prospective provenance. Users provide prospective provenance, i.e., the conceptual workflows latent in scripts, via simple YesWorkflow annotations embedded as script comments. Runtime observables can be linked to prospective provenance via relational views and queries. These observables may be hidden in filenames or folder structures, recorded in log files, or captured automatically using tools such as noWorkflow or the DataONE RunManagers. The YesWorkflow toolkit, example scripts, and demonstration code are available via an open source repository.
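
The idea of linking runtime observables hidden in filenames to annotated workflow ports can be illustrated with a minimal Python sketch. It is not the YesWorkflow toolkit's implementation: the annotated script fragment, the port names, the file paths, and the helper functions (extract_uri_templates, template_to_regex, link_files_to_ports) are all hypothetical, and the sketch assumes only that @in and @out annotations may carry a @uri path template with {variable} placeholders.

```python
import re

# Hypothetical script fragment; the comments follow the YesWorkflow
# annotation style (@begin, @in, @out, @uri, @end). Names are made up.
ANNOTATED_SCRIPT = '''
# @begin calibrate_image
# @in raw_image @uri file:run/raw/{sample}/image_{frame}.raw
# @out calibrated_image @uri file:run/calibrated/{sample}/image_{frame}.img
# @end calibrate_image
'''


def extract_uri_templates(script_text):
    """Return {port_name: path_template} for @in/@out lines with a file: @uri."""
    pattern = re.compile(r"#\s*@(?:in|out)\s+(\w+)\s+@uri\s+file:(\S+)")
    return {name: template for name, template in pattern.findall(script_text)}


def template_to_regex(template):
    """Turn 'a/{x}/b_{y}.raw' into a regex with one named group per variable."""
    parts = re.split(r"\{(\w+)\}", template)
    # Even indices hold literal text, odd indices hold variable names.
    regex = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>[^/]+?)"
        for i, part in enumerate(parts)
    )
    return re.compile(f"^{regex}$")


def link_files_to_ports(script_text, observed_paths):
    """Match observed file paths against each port's template.

    Returns a list of (port, path, variables) tuples, i.e. a tiny
    'relational view' joining prospective ports with runtime files.
    """
    links = []
    for port, template in extract_uri_templates(script_text).items():
        matcher = template_to_regex(template)
        for path in observed_paths:
            match = matcher.match(path)
            if match:
                links.append((port, path, match.groupdict()))
    return links


if __name__ == "__main__":
    # Hypothetical files found on disk after a run.
    observed = [
        "run/raw/DRT240/image_007.raw",
        "run/calibrated/DRT240/image_007.img",
        "run/notes.txt",
    ]
    for port, path, variables in link_files_to_ports(ANNOTATED_SCRIPT, observed):
        print(port, path, variables)
```

Each resulting tuple ties a declared port (prospective provenance) to a concrete file and the variable values recovered from its path (retrospective provenance); such rows could then be loaded into a relational table and queried.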

Details

Language :
English
ISSN :
1746-8256
Volume :
12
Issue :
2
Database :
Directory of Open Access Journals
Journal :
International Journal of Digital Curation
Publication Type :
Academic Journal
Accession number :
edsdoj.89261fc52aa747f3a1ac87c9eeeab21f
Document Type :
article
Full Text :
https://doi.org/10.2218/ijdc.v12i2.585