Data pipelines: the long road of data from its creation to its usage in a digital company environment

Authors :
Losada Cavestany, Beatriz
Montllor Ramoneda, Marcel
Grima Cintas, Pedro
Universitat Politècnica de Catalunya. Departament d'Estadística i Investigació Operativa
Innovamat
Source :
UPCommons. Portal del coneixement obert de la UPC, Universitat Politècnica de Catalunya (UPC)
Publication Year :
2022
Publisher :
Universitat Politècnica de Catalunya, 2022.

Abstract

This project documents a decision-making process at Innovamat, an EdTech company based in Barcelona. The company leverages data to improve its product and ensure customer success. The amount of data Innovamat collects is constantly increasing, so the processes (data workflows) that move data into and out of the data warehouse are growing in number and becoming more complex. Innovamat struggles to control, orchestrate, and optimize these data workflows, which results in considerable time and effort lost on activities that do not add value to the company. The core of this project follows a technical decision-making process to find a solution specific to Innovamat's case. The goal is to choose the best workflow orchestration tool for the company from a broad market landscape. Extensive research is carried out to compare different tools against criteria defined beforehand in line with the company's current needs. Afterwards, Apache Airflow (the chosen tool) is set up locally to implement one of the existing workflows and test whether the tool can address the current situation. At the end of the project, Apache Airflow is accepted as a potential solution for Innovamat, opening the way to its implementation.

Details

Language :
English
Database :
OpenAIRE
Journal :
UPCommons. Portal del coneixement obert de la UPC, Universitat Politècnica de Catalunya (UPC)
Accession number :
edsair.dedup.wf.001..eeab1ff22ac490ce313a5cd75b652005