1. A Dynamic Resource Controller for Resolving Quality of Service Issues in Modern Streaming Processing Engines
- Author
- Hoseinyfarahabady, M. R., Taheri, Javid, Zomaya, A. Y., and Tari, Z.
- Abstract
Devising an elastic resource allocation controller for data analytical applications in virtualized data centers has received considerable attention recently, mainly because even a slight performance improvement can translate into huge monetary savings in practical large-scale execution. Apache Flink is among the modern streamed data processing run-times that can provide both low-latency and high-throughput computation to execute processing pipelines over high-volume and high-velocity data items under tight latency constraints. However, a yet-to-be-answered challenge in a large-scale platform with tens of worker nodes is how to resolve run-time violations of the quality of service (QoS) level in a multi-tenant data streaming platform, particularly when the amount of workload generated by different users fluctuates. Studies have shown that a static resource allocation algorithm (round-robin), which is used by default in Apache Flink, suffers from a lack of responsiveness to sudden traffic surges happening unpredictably at run-time. In this paper, we address the problem of resource management in a Flink platform for ensuring different QoS enforcement levels in a platform with shared computing resources. The proposed solution applies theoretical principles borrowed from closed-loop control theory to design a CPU and memory adjustment mechanism with the primary goal of fulfilling the different QoS levels requested by submitted applications, while resource interference is considered the critical performance-limiting factor. The performance evaluation is carried out by comparing the proposed resource allocation mechanism with two static heuristics (round-robin and class-based weighted fair queuing) in an 80-core cluster under multiple traffic patterns resembling sudden changes in the incoming workloads of low-priority streaming applications. The experimental results confirm the stability of the proposed controller to regulate the underlying platform resources to smo
- Published
- 2020