FAIR data pipeline: provenance-driven data management for traceable scientific workflows

Authors :
Sonia Natalie Mitchell
Andrew Lahiff
Nathan Cummings
Jonathan Hollocombe
Bram Boskamp
Ryan Field
Dennis Reddyhoff
Kristian Zarebski
Antony Wilson
Bruno Viola
Martin Burke
Blair Archibald
Paul Bessell
Richard Blackwell
Lisa A. Boden
Alys Brett
Sam Brett
Ruth Dundas
Jessica Enright
Alejandra N. Gonzalez-Beltran
Claire Harris
Ian Hinder
Christopher David Hughes
Martin Knight
Vino Mano
Ciaran McMonagle
Dominic Mellor
Sibylle Mohr
Glenn Marion
Louise Matthews
Iain J. McKendrick
Christopher Mark Pooley
Thibaud Porphyre
Aaron Reeves
Edward Townsend
Robert Turner
Jeremy Walton
Richard Reeve
Laboratoire de Biométrie et Biologie Evolutive - UMR 5558 (LBBE)
Université Claude Bernard Lyon 1 (UCBL)
Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS)
ANR-16-IDEX-0005,IDEXLYON,IDEXLYON(2016)
Source :
Mitchell, S N, Lahiff, A, Cummings, N, Hollocombe, J, Boskamp, B, Field, R, Reddyhoff, D, Zarebski, K, Wilson, A, Viola, B, Burke, M, Archibald, B, Bessell, P, Blackwell, R, Boden, L A, Brett, A, Brett, S, Dundas, R, Enright, J, Gonzalez-Beltran, A N, Harris, C, Hinder, I, David Hughes, C, Knight, M, Mano, V, McMonagle, C, Mellor, D, Mohr, S, Marion, G, Matthews, L, McKendrick, I J, Mark Pooley, C, Porphyre, T, Reeves, A, Townsend, E, Turner, R, Walton, J & Reeve, R 2022, 'FAIR data pipeline: provenance-driven data management for traceable scientific workflows', Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 380, no. 2233, 20210300. https://doi.org/10.1098/rsta.2021.0300
Publication Year :
2022
Publisher :
The Royal Society, 2022.

Abstract

Modern epidemiological analyses to understand and combat the spread of disease depend critically on access to, and use of, data. Rapidly evolving data, such as data streams changing during a disease outbreak, are particularly challenging. Data management is further complicated by data being imprecisely identified when used. Public trust in policy decisions resulting from such analyses is easily damaged and is often low, with cynicism arising where claims of ‘following the science’ are made without accompanying evidence. Tracing the provenance of such decisions back through open software to primary data would clarify this evidence, enhancing the transparency of the decision-making process. Here, we demonstrate a Findable, Accessible, Interoperable and Reusable (FAIR) data pipeline. Although developed during the COVID-19 pandemic, it allows easy annotation of any data as they are consumed by analyses, or conversely traces the provenance of scientific outputs back through the analytical or modelling source code to primary data. Such a tool provides a mechanism for the public, and fellow scientists, to better assess scientific evidence by inspecting its provenance, while allowing scientists to support policymakers in openly justifying their decisions. We believe that such tools should be promoted for use across all areas of policy-facing research. This article is part of the theme issue ‘Technical challenges of modelling real-life epidemics and examples of overcoming these’.

Details

ISSN :
1471-2962 and 1364-503X
Volume :
380
Database :
OpenAIRE
Journal :
Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences
Accession number :
edsair.doi.dedup.....16a702da9a11b8817950011b5202a3cd
Full Text :
https://doi.org/10.1098/rsta.2021.0300