Data Ingestion as a Service (DIaaS): A Unified Interface for Heterogeneous Data Ingestion, Transformation, and Metadata Management for Data Lake

Authors :
H. V. Sreepathy
B. Dinesh Rao
Mohan Kumar Jaysubramanian
B. Deepak Rao
Source :
IEEE Access, Vol 12, Pp 156131-156145 (2024)
Publication Year :
2024
Publisher :
IEEE, 2024.

Abstract

Data ingestion tools are a critical component of a data lake. Existing data ingestion tools struggle to handle the wide variety of data types, formats, and sources. There is a gap for a unified data ingestion interface that addresses these problems. This study proposes an integrated framework for data ingestion in a data lake, addressing the challenges posed by heterogeneous data sources, formats, and metadata management. The framework comprises three novel modules: first, Unified Data Integration Connectors (UDIC), which provide connectivity and data retrieval capabilities from diverse sources including databases, data warehouses, file systems, cloud storage, and APIs; second, Adaptive Data Variety Transformation (ADVT), a module that handles the transformation and processing of structured, semi-structured, and unstructured data types, ensuring efficient ingestion into the data lake; and third, Intelligent Metadata Management (IMM), a module that captures, stores, and manages metadata associated with the ingested data, offering advanced search, discovery, and enrichment functionalities. A comparative study evaluates the features offered by the service against existing data ingestion tools to assess the novelty and significance of the work. Performance validation shows varying ingestion latencies across different data types: approximately 148.1 microseconds per record for structured data, 234.2 microseconds per record for semi-structured data, 65.6 microseconds per kilobyte (KB) for video data, and 42.7 microseconds per KB for image data. These results underscore the importance of considering data structure and size when optimizing ingestion processes. Overall, this research aims to improve data ingestion in data lake environments by providing a unified solution for handling diverse data sources, formats, and metadata management.
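
To give a feel for the reported latencies, the following minimal Python sketch turns them into rough total ingestion times. It is not part of the DIaaS framework: the function name `estimate_ingestion_seconds` and the dictionary keys are hypothetical, and the sketch assumes that latency scales linearly with the number of records or kilobytes, which the abstract does not state.

```python
# Approximate per-unit latencies reported in the abstract, in microseconds.
# The keys are illustrative labels, not identifiers from the paper.
LATENCY_US = {
    "structured": 148.1,       # per record
    "semi_structured": 234.2,  # per record
    "video": 65.6,             # per KB
    "image": 42.7,             # per KB
}


def estimate_ingestion_seconds(data_type: str, units: float) -> float:
    """Estimate total ingestion time in seconds.

    `units` is a record count for structured and semi-structured data,
    and a size in KB for video and image data.
    Assumption: cost grows linearly with `units`.
    """
    if data_type not in LATENCY_US:
        raise ValueError(f"unknown data type: {data_type!r}")
    if units < 0:
        raise ValueError("units must be non-negative")
    # Convert microseconds to seconds.
    return LATENCY_US[data_type] * units / 1_000_000


if __name__ == "__main__":
    # One million structured records versus a 1 GiB video (1024 * 1024 KB).
    print(estimate_ingestion_seconds("structured", 1_000_000))
    print(estimate_ingestion_seconds("video", 1024 * 1024))
```

Under the linearity assumption, the per-record figures suggest that semi-structured data costs roughly 1.6 times as much per record as structured data, which is consistent with the abstract's point that data structure matters when optimizing ingestion.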

Details

Language :
English
ISSN :
2169-3536
Volume :
12
Database :
Directory of Open Access Journals
Journal :
IEEE Access
Publication Type :
Academic Journal
Accession number :
edsdoj.8334d3d5cd54f6b8d3639df6beaafc7
Document Type :
article
Full Text :
https://doi.org/10.1109/ACCESS.2024.3479736