CWLProv - Interoperable Retrospective Provenance capture and its challenges

Authors :
Farah Zaib Khan
Stian Soiland-Reyes
Richard O. Sinnott
Andrew Lonie
Michael R. Crusoe
Publication Year :
2018
Publisher :
Zenodo, 2018.

Abstract

Presented at the Bioinformatics Open Source Conference (BOSC) 2018
Source code snapshot: https://github.com/common-workflow-language/cwltool/tree/921fc1d387930a0a5fede332c43f039697f6a4de
License: https://www.apache.org/licenses/LICENSE-2.0
Research Object: https://doi.org/10.5281/zenodo.1215611

Abstract (accepted for poster and talk at BOSC 2018)

The automation of data analysis in the form of scientific workflows is now a widely adopted practice in many fields of research. Computationally driven, data-intensive experiments using workflows enable Automation, Scaling, Adaption and Provenance support (ASAP). However, a number of challenges remain in the effective sharing, publication, understandability and reproducibility of such workflows, owing to the incomplete capture of provenance and the dependence on particular technical (software) platforms. We present CWLProv, an approach for retrospective provenance capture that uses open-source, community-driven standards and involves the application and customization of workflow-centric Research Objects (ROs). The ROs are produced as an output of a workflow enactment defined in the Common Workflow Language (CWL) using the reference implementation, cwltool. The approach aggregates and annotates all the resources involved in the scientific investigation, including inputs, outputs, the workflow specification, command line tool specifications and input parameter settings. The resources are linked within the RO to enable re-enactment of an analysis without depending on external resources. The workflow provenance profile is represented in the W3C-standardized PROV-N and PROV-JSON formats and captures retrospective provenance of the workflow enactment. The workflow-centric RO produced as an output of a CWL workflow enactment is expected to be interoperable, reusable, shareable and portable across different platforms.
Our work describes the need and motivation for CWLProv and the lessons learned in applying it to ROs using CWL in the bioinformatics domain. The complete capture of provenance, along with the aggregated resources used in a workflow enactment, will mitigate workflow decay and allow applications of provenance to make experiments transparent, reproducible and authentic. We believe that the underlying principles of the standards used to implement CWLProv will result in semantically rich, executable workflow objects, such that any platform supporting CWL and CWLProv will be able to reproduce them. We ultimately aim to achieve a solution that is compliant with all four dimensions of the FAIR principles. Currently CWLProv is implemented using the reference implementation, cwltool. This study can be extended to support provenance capture on other platforms supporting CWL to demonstrate interoperability of analysis methods.
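
To make the idea of a retrospective provenance record more concrete, the following is a minimal sketch of a PROV-JSON-style document for a single workflow run, built with the Python standard library only. It is an illustration and not the CWLProv or cwltool implementation: the function name build_prov_document, the "ex" namespace, the "ex:sha256" attribute and the file names are all hypothetical. Only the top-level keys ("prefix", "entity", "activity", "used", "wasGeneratedBy") and the "prov:" attribute names follow the PROV-JSON serialization.

```python
import hashlib
import json


def build_prov_document(run_id, inputs, outputs):
    """Return a PROV-JSON-style dict describing one workflow run.

    `inputs` and `outputs` map a file name to its content as bytes.
    All identifiers live in a hypothetical "ex" namespace.
    """
    activity_id = f"ex:{run_id}"
    doc = {
        "prefix": {"ex": "https://example.org/run/"},  # assumed namespace
        "entity": {},
        "activity": {activity_id: {"prov:label": f"Workflow run {run_id}"}},
        "used": {},
        "wasGeneratedBy": {},
    }

    def add_entity(name, content):
        # Record a checksum so the data file can be verified later.
        entity_id = f"ex:{name}"
        doc["entity"][entity_id] = {
            "prov:label": name,
            "ex:sha256": hashlib.sha256(content).hexdigest(),
        }
        return entity_id

    # Inputs: the run *used* each input entity.
    for i, (name, content) in enumerate(sorted(inputs.items())):
        doc["used"][f"_:u{i}"] = {
            "prov:activity": activity_id,
            "prov:entity": add_entity(name, content),
        }

    # Outputs: each output entity *was generated by* the run.
    for i, (name, content) in enumerate(sorted(outputs.items())):
        doc["wasGeneratedBy"][f"_:g{i}"] = {
            "prov:entity": add_entity(name, content),
            "prov:activity": activity_id,
        }
    return doc


if __name__ == "__main__":
    example = build_prov_document(
        "run1",
        inputs={"reads.fastq": b"ACGT"},
        outputs={"aligned.bam": b"\x00\x01"},
    )
    print(json.dumps(example, indent=2, sort_keys=True))
```

Storing a checksum next to each entity is one simple way to link the provenance record to the data files aggregated alongside it, so that a later re-enactment can check it is working with the same inputs and outputs.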

Details

Database :
OpenAIRE
Accession number :
edsair.doi.dedup.....5c7b7727d9939a4a6a53a5f10366522b
Full Text :
https://doi.org/10.5281/zenodo.1215611