Simple heuristics for scheduling Apache Airflow: A case study at PT. X.
- Source :
- AIP Conference Proceedings. 2022, Vol. 2383/2470 Issue 1, p1-10. 10p.
- Publication Year :
- 2022
Abstract
- In the big data era, large volumes of data must be handled properly to support well-informed business decisions. As a result, the data must be updated regularly to keep the analysis relevant. Apache Airflow is commonly used to schedule these updates by running queries through sequences of DAG tasks. Unfortunately, scheduling DAG tasks is an NP-hard problem, so a globally optimal solution is difficult to obtain. This paper presents a simple heuristic algorithm for scheduling a subset of DAG tasks at PT.XYZ using 2 computers (virtual machines). This is a special case of the Pm|prec|Cmax problem in which 2 computers are used. Before constructing the algorithm, CPM (Critical Path Method) is used as the baseline for the optimal solution. A knapsack problem is then used to add additional tasks to virtual machine #1, and a heuristic is used for virtual machine #2, as a compromise that balances cost and completion time. The characteristics of this simple algorithm are discussed with numerical examples and experiments. [ABSTRACT FROM AUTHOR]
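The CPM baseline named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the task names, durations, and precedence constraints below are hypothetical, and the function `critical_path` is an illustrative helper. It only shows how the longest duration-weighted path through a DAG is computed.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path(durations, preds):
    """Return (length, path) of the longest duration-weighted path in a DAG."""
    graph = {task: preds.get(task, []) for task in durations}
    finish = {}      # earliest finish time of each task
    best_pred = {}   # predecessor that determines each task's start time
    # static_order() yields every task after all of its predecessors.
    for task in TopologicalSorter(graph).static_order():
        start = 0
        best_pred[task] = None
        for p in graph[task]:
            if finish[p] > start:
                start = finish[p]
                best_pred[task] = p
        finish[task] = start + durations[task]
    # Walk back from the task that finishes last.
    end = max(finish, key=finish.get)
    length = finish[end]
    path = []
    while end is not None:
        path.append(end)
        end = best_pred[end]
    return length, path[::-1]


# Hypothetical task durations (in minutes) and precedence constraints.
durations = {"extract": 3, "clean": 2, "join": 4, "report": 1, "audit": 2}
preds = {
    "clean": ["extract"],
    "join": ["clean"],
    "report": ["join"],
    "audit": ["extract"],
}

length, path = critical_path(durations, preds)
print(length, path)
```

Because precedence constraints force the tasks on the critical path to run one after another, its length is a lower bound on the makespan (Cmax) for any number of machines. That is why it can serve as a baseline when judging a two-machine schedule.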
Details
- Language :
- English
- ISSN :
- 0094243X
- Volume :
- 2383/2470
- Issue :
- 1
- Database :
- Academic Search Index
- Journal :
- AIP Conference Proceedings
- Publication Type :
- Conference
- Accession number :
- 156528521
- Full Text :
- https://doi.org/10.1063/5.0081042