Simple heuristics for scheduling apache airflow: A case study at PT. X.

Authors :
Tanto, Hans
Bisono, Indriati N.
Soewandi, Hanijanto
Source :
AIP Conference Proceedings. 2022, Vol. 2383/2470 Issue 1, p1-10. 10p.
Publication Year :
2022

Abstract

In the big data era, large volumes of data must be handled properly to support well-informed business decisions, and that data must be updated regularly to keep analyses relevant. Apache Airflow is commonly used to schedule these updates by running queries as sequences of DAG tasks. Unfortunately, scheduling DAG tasks is an NP-hard problem, so a globally optimal solution is difficult to obtain. This paper presents a simple heuristic algorithm for scheduling a subset of the DAG tasks at PT. XYZ on 2 computers (virtual machines). This is a special case of the Pm|prec|Cmax problem with m = 2 machines. Before constructing the algorithm, CPM (Critical Path Method) is used as the baseline to obtain the optimal solution. A knapsack problem is then used to add further tasks to virtual machine #1, and a heuristic is used for virtual machine #2, as a compromise that balances cost and completion time. The characteristics of this simple algorithm are discussed through numerical examples and experiments. [ABSTRACT FROM AUTHOR]
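The abstract does not spell out the authors' CPM-plus-knapsack procedure, so what follows is only a sketch of the general P2|prec|Cmax setting, under assumptions that are not from the paper: the two VMs are identical, each task has a known integer duration, tasks are not preempted, and the precedence edges form a DAG. It shows two ideas the abstract relies on. First, the CPM critical-path length gives a lower bound on Cmax. Second, a standard critical-path-priority list-scheduling heuristic assigns tasks to the two machines. That heuristic is a swapped-in textbook method, not the paper's knapsack-based algorithm, and the function names, task names, and durations are hypothetical.

```python
import heapq
import math
from collections import defaultdict

# Assumptions (not from the paper): identical machines, known integer task
# durations, non-preemptive tasks, and precedence edges that form a DAG.


def topological_order(durations, edges):
    """Kahn's algorithm; `edges` holds (predecessor, successor) pairs."""
    succs = defaultdict(list)
    indeg = {t: 0 for t in durations}
    for u, v in edges:
        succs[u].append(v)
        indeg[v] += 1
    ready = [t for t in durations if indeg[t] == 0]
    order = []
    while ready:
        t = ready.pop()
        order.append(t)
        for v in succs[t]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if len(order) != len(durations):
        raise ValueError("precedence graph contains a cycle")
    return order, succs


def tail_lengths(durations, edges):
    """CPM backward pass: longest path from each task to the end, inclusive."""
    order, succs = topological_order(durations, edges)
    tail = {}
    for t in reversed(order):
        tail[t] = durations[t] + max((tail[v] for v in succs[t]), default=0)
    return tail, succs


def lower_bound(durations, edges, m=2):
    """No m-machine schedule can beat the critical path or total work / m."""
    tail, _ = tail_lengths(durations, edges)
    critical_path = max(tail.values(), default=0)
    return max(critical_path, math.ceil(sum(durations.values()) / m))


def list_schedule(durations, edges, m=2):
    """Greedy list scheduling: whenever a machine is idle, start the ready
    task with the longest tail (a critical-path priority rule)."""
    tail, succs = tail_lengths(durations, edges)
    preds_left = {t: 0 for t in durations}
    for _, v in edges:
        preds_left[v] += 1
    ready = [(-tail[t], t) for t in durations if preds_left[t] == 0]
    heapq.heapify(ready)
    idle = list(range(m))  # ids of free machines (VMs)
    running = []           # heap of (finish_time, machine, task)
    schedule = {}          # task -> (machine, start, finish)
    now = 0
    while ready or running:
        # Fill every idle machine with the highest-priority ready task.
        while ready and idle:
            _, t = heapq.heappop(ready)
            machine = idle.pop(0)
            schedule[t] = (machine, now, now + durations[t])
            heapq.heappush(running, (now + durations[t], machine, t))
        # Jump to the next completion time; release every task finishing then.
        now = running[0][0]
        while running and running[0][0] == now:
            _, machine, t = heapq.heappop(running)
            idle.append(machine)
            for v in succs[t]:
                preds_left[v] -= 1
                if preds_left[v] == 0:
                    heapq.heappush(ready, (-tail[v], v))
        idle.sort()
    makespan = max((f for _, _, f in schedule.values()), default=0)
    return schedule, makespan


# Hypothetical example: five tasks, two VMs.
durations = {"a": 3, "b": 2, "c": 4, "d": 2, "e": 1}
edges = [("a", "c"), ("b", "d"), ("c", "e"), ("d", "e")]
schedule, makespan = list_schedule(durations, edges, m=2)
print(makespan, lower_bound(durations, edges, m=2))  # 8 8
```

In this toy instance the makespan equals the critical-path lower bound, so the greedy schedule happens to be optimal. In general, Graham's classic result guarantees only that a list schedule on m identical machines is within a factor of 2 − 1/m of optimal (1.5 for two VMs). That gap is why a problem-specific refinement, such as the knapsack step the abstract describes for VM #1, can be worthwhile.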

Details

Language :
English
ISSN :
0094-243X
Volume :
2383/2470
Issue :
1
Database :
Academic Search Index
Journal :
AIP Conference Proceedings
Publication Type :
Conference
Accession number :
156528521
Full Text :
https://doi.org/10.1063/5.0081042