A Zone Reference Model for Enterprise-Grade Data Lake Management
- Source :
- EDOC
- Publication Year :
- 2020
- Publisher :
- IEEE, 2020.
Abstract
- Data lakes are on the rise as data platforms for any kind of analytics, from data exploration to machine learning. They achieve the required flexibility by storing heterogeneous data in their raw format, and by avoiding the need for pre-defined use cases. However, storing only raw data is inefficient, as for many applications, the same data processing has to be applied repeatedly. To foster the reuse of processing steps, literature proposes to store data at different degrees of processing in addition to their raw format. To this end, data lakes are typically structured in zones. Various zone models exist, but they differ from one another, are vaguely specified, and come with no assessments. It is unclear which of these zone models is applicable in a practical data lake implementation in enterprises. In this work, we assess existing zone models using requirements derived from multiple representative data analytics use cases of a real-world industry case. We identify the shortcomings of existing work and develop a detailed zone reference model for enterprise-grade data lake management. We assess the reference model’s applicability through a prototypical implementation for a real-world enterprise data lake use case. This assessment shows that the zone reference model meets the requirements relevant in practice and is ready for industry use.
- Subjects :
- 0209 industrial biotechnology
Data processing
Database
business.industry
Computer science
Big data
02 engineering and technology
computer.software_genre
Enterprise data management
Data modeling
020901 industrial engineering & automation
Analytics
0202 electrical engineering, electronic engineering, information engineering
Data analysis
020201 artificial intelligence & image processing
Raw data
business
computer
Reference model
- Database :
- OpenAIRE
- Journal :
- 2020 IEEE 24th International Enterprise Distributed Object Computing Conference (EDOC)
- Accession number :
- edsair.doi...........b15fc02fc22781f0cb6cf5fa06418acf
- Full Text :
- https://doi.org/10.1109/edoc49727.2020.00017