The Reproducible Data Reuse (ReDaR) Framework to Capture and Assess Multiple Data Streams.

Authors :
Keefer, Donald A.
Blake, Catherine L.
Source :
Proceedings of the Association for Information Science & Technology; Oct 2021, Vol. 58 Issue 1, p230-240, 11p
Publication Year :
2021

Abstract

Much of the literature in knowledge discovery from data (KDD) focuses on algorithms that are faster and more accurate at capturing patterns in a given data set. However, answering a research question is fundamentally connected with how well the data is aligned with the questions being asked. Thus, data selection is one of the most important steps to ensure that models produced from the KDD process are useful in practice. A lack of documentation about the data selection rationale and the transformations needed to semantically align the data streams prevents others from reproducing the research and obfuscates development of best practices in data integration. Our goal in this paper is to provide KDD practitioners with a framework that brings together theories in provenance, information quality, and contextual reasoning, to enable researchers to achieve a semantically aligned dataset with data selection, description, and documentation based on an application-focused assessment. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
2373-9231
Volume :
58
Issue :
1
Database :
Complementary Index
Journal :
Proceedings of the Association for Information Science & Technology
Publication Type :
Conference
Accession number :
153009825
Full Text :
https://doi.org/10.1002/pra2.451