1. Data pipelines: the long road of data from its creation to its usage in a digital company environment
- Author
-
Universitat Politècnica de Catalunya. Departament d'Estadística i Investigació Operativa, Innovamat, Montllor Ramoneda, Marcel, Grima Cintas, Pedro, and Losada Cavestany, Beatriz
- Abstract
This project documents a decision-making process at Innovamat, an EdTech company from Barcelona. The company leverages data to improve its product and ensure customer success. The amount of data Innovamat collects is constantly increasing, so the processes (data workflows) that move data into and out of the data warehouse are growing in number and complexity. Innovamat has difficulty controlling, orchestrating, and optimizing these data workflows, and as a result loses considerable time and effort on activities that do not add value to the company. The core of this project follows a technical decision-making process to find a solution specific to Innovamat's case. The goal is to choose the best workflow orchestration tool for the company from a broad market landscape. Extensive research is carried out to compare different tools against criteria defined beforehand in line with the company's current needs. Afterwards, Apache Airflow (the chosen tool) is deployed locally to implement one of the existing workflows and test whether the tool can address the current situation. At the end of the project, Apache Airflow is accepted as a potential solution for Innovamat, which opens the way to its implementation.
- Published
- 2022