High-level ETL for semantic data warehouses.
- Source :
- Semantic Web (1570-0844); 2022, Vol. 13 Issue 1, p85-132, 48p
- Publication Year :
- 2022
Abstract
- The popularity of the Semantic Web (SW) encourages organizations to organize and publish semantic data using the RDF model. This growth poses new requirements for Business Intelligence technologies to enable On-Line Analytical Processing (OLAP)-like analysis over semantic data. Traditional Extract-Transform-Load (ETL) tools do not support incorporating semantic data into a Data Warehouse (DW) because they do not consider semantic issues in the integration process. In this paper, we propose a layer-based integration process and a set of high-level RDF-based ETL constructs required to define, map, extract, process, transform, integrate, update, and load (multidimensional) semantic data. Unlike other ETL tools, we automate the ETL data flows by creating metadata at the schema level, which relieves ETL developers from the burden of manual mapping at the ETL operation level. We create a prototype, named Semantic ETL Construct (SETL_CONSTRUCT), based on the innovative ETL constructs proposed here. To evaluate SETL_CONSTRUCT, we use it to create a multidimensional semantic DW by integrating a Danish Business dataset and an EU Subsidy dataset, and compare it with the previous programmable framework SETL_PROG in terms of productivity, development time, and performance.
The evaluation shows that 1) SETL_CONSTRUCT requires 92% fewer typed characters (Number of Typed Characters, NOTC) than SETL_PROG, and SETL_AUTO (the extension of SETL_CONSTRUCT for generating ETL execution flows automatically) further reduces the Number of Used Concepts (NOUC) by another 25%; 2) using SETL_CONSTRUCT, the development time is almost cut in half compared to SETL_PROG, and is cut by another 27% using SETL_AUTO; and 3) SETL_CONSTRUCT is scalable and performs similarly to SETL_PROG. We also evaluate our approach qualitatively by interviewing two ETL experts. [ABSTRACT FROM AUTHOR]
Details
- Language :
- English
- ISSN :
- 1570-0844
- Volume :
- 13
- Issue :
- 1
- Database :
- Complementary Index
- Journal :
- Semantic Web (1570-0844)
- Publication Type :
- Academic Journal
- Accession number :
- 153965096
- Full Text :
- https://doi.org/10.3233/SW-210429