Metadata Extraction from User Queries for Self-Service Data Lake Exploration.

Authors :
Gunklach, Jonas
Michalczyk, Sven
Nadj, Mario
Maedche, Alexander
Source :
Datenbank-Spektrum; Jul 2023, Vol. 23 Issue 2, p97-105, 9p
Publication Year :
2023

Abstract

Data catalogs represent a promising solution for semantically classifying and organizing data sources and enriching raw data with metadata. However, recent research has shown that data catalogs are difficult to implement due to the complexity of the data landscape or issues with data governance. Moreover, data catalogs struggle to enable business analysts to find the data they need for their use cases. Against this backdrop, we develop a self-service system that automatically extracts metadata from a data lake and enables business analysts to explore the metadata through an easy-to-use interface. Specifically, instead of implementing the data catalog top-down, our system derives metadata from user queries bottom-up. To this end, we conduct 15 interviews with business analysts to derive the underlying requirements of the system and evaluate its features with a focus group. Our findings illustrate that participants especially value the possibility to reuse queries from other users and appreciate the support in query validation, as data preparation is a complex and time-consuming endeavour. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
1618-2162
Volume :
23
Issue :
2
Database :
Complementary Index
Journal :
Datenbank-Spektrum
Publication Type :
Academic Journal
Accession number :
169702590
Full Text :
https://doi.org/10.1007/s13222-023-00448-z