
Automatic Versioning of Time Series Datasets: a FAIR Algorithmic Approach

Authors :
González-Cebrián, Alba
McGuinness, Luke
Rafii, Fadoua
Bradford, Michael
Chis, Adriana E.
González-Vélez, Horacio
Publication Year :
2022
Publisher :
Zenodo, 2022.

Abstract

As one of the fundamental concepts underpinning the FAIR (Findability, Accessibility, Interoperability, and Reusability) guiding principles, data provenance entails keeping track of every version of a given dataset, from the original to the latest. However, the standard terms for determining and recording versioning information in a dataset's metadata remain ambiguous, and they do not explicitly define how to assess the overlap of information between items along a versioning stream. In this work, we propose a novel approach for the automatic versioning of time series datasets based on parameters from two dimensionality reduction techniques, namely Principal Component Analysis and Autoencoders. Specifically, we systematically detect and measure similarities (information distances) between datasets via dimensionality reduction, encode them as different versions, and then automatically generate provenance metadata via a FAIRversioning service using the W3C DCAT 3.0 nomenclature. We illustrate this approach with two time series datasets and demonstrate how the proposed parameters effectively assess the similarity between different data versions. Our results show that the proposed version similarity metrics are robust (\(s^{(0,1)} = 1\)) to the alteration of up to 60% of cells, the removal of up to 60% of rows, and the log-scale transformation of variables. In contrast, row-wise transformations (e.g. converting absolute values to a percentage of a second variable) yield low similarity values (\(s^{(0,1)} < 0.75\)). Our code and datasets are openly available to enable reproducibility.
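The abstract does not define how \(s^{(0,1)}\) is computed from the PCA or Autoencoder parameters. Purely to illustrate the general idea of scoring two dataset versions through a dimensionality-reduction model, the following is a minimal sketch under stated assumptions: it fits standardised PCA to each version and compares the two top-k principal subspaces via the mean squared cosine of their principal angles, which yields a score in [0, 1]. This principal-angle comparison is a stand-in chosen here, not the authors' metric, and the function names `pca_subspace` and `subspace_similarity` are hypothetical.

```python
import numpy as np


def pca_subspace(X: np.ndarray, k: int) -> np.ndarray:
    """Return an orthonormal (n_features, k) basis of the top-k principal directions.

    Assumption: columns are standardised (zero mean, unit variance) before PCA,
    and no column is constant.
    """
    Xc = X - X.mean(axis=0)             # centre each column
    Xs = Xc / Xc.std(axis=0, ddof=1)    # scale each column to unit variance
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
    return Vt[:k].T


def subspace_similarity(X0: np.ndarray, X1: np.ndarray, k: int = 2) -> float:
    """Hypothetical version-similarity score in [0, 1].

    ||U0^T U1||_F^2 equals the sum of squared cosines of the principal angles
    between the two k-dimensional subspaces; dividing by k gives their mean.
    Both versions must share the same columns (variables).
    """
    U0 = pca_subspace(X0, k)
    U1 = pca_subspace(X1, k)
    return float(np.linalg.norm(U0.T @ U1, "fro") ** 2 / k)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic "version 0": 5 variables driven by 2 latent factors plus small noise.
    latent = rng.normal(size=(200, 2))
    X0 = latent @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(200, 5))
    X1 = X0[::2]                                   # a version with half the rows removed
    X2 = X0 * np.array([1.0, 10.0, 0.5, 3.0, 7.0])  # a version with rescaled units
    print(subspace_similarity(X0, X1))
    print(subspace_similarity(X0, X2))
```

Because the columns are standardised, a per-column change of units leaves this score at 1 up to floating-point error, while removing rows perturbs it only slightly when the latent structure is strong. Whether either behaviour matches the published \(s^{(0,1)}\) would have to be checked against the authors' code.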

Subjects

Subjects :
paper-presentation

Details

Database :
OpenAIRE
Accession number :
edsair.doi.dedup.....4b0e1aea017f76e4e408b4ed00058d98
Full Text :
https://doi.org/10.5281/zenodo.7158371