Orchestrating scheduling, grouping and parallelism to enhance the performance of distributed stream computing system.
- Source :
- Expert Systems with Applications. Nov 2024, Vol. 254, pN.PAG-N.PAG. 1p.
- Publication Year :
- 2024
Abstract
- In a big data stream computing environment, the arrival rate of data streams usually fluctuates over time, posing a great challenge to the elasticity of the system. The performance of a stream computing system is crucial, especially when dealing with unbounded and fluctuating data streams. Most prior studies have focused on only one or two aspects of elasticity and often lack prompt, comprehensive performance optimization. This limitation can lead to a tuning bottleneck that prevents the system's performance from consistently reaching its optimal state. Additionally, many stream computing systems are not intelligently adaptive in real time, because manual parameter reconfiguration is difficult under fluctuating streams. To better address these issues, we propose a framework named Sgp-Stream, which orchestrates scheduling, grouping and parallelism (Sgp) to enhance system performance. We conduct the following research: (1) Running experiments to evaluate the impact of different factors such as scheduling, grouping and parallelism on system performance. Results show that factors at a single level usually have an upper limit on tuning system performance, and that better overall performance can be achieved by coordinating multi-level factors. (2) Establishing quantitative models for stream applications that consider computational cost and communication cost, multi-dimensional featured data streams, data center resources, and latency and throughput performance. (3) Demonstrating the effectiveness of the proposed runtime-aware data stream grouping based on smooth weighted polling, elastic adaptive scheduling based on Linear Deterministic Greedy, and elastic scaling strategy based on Gradient Descent in Sgp-Stream, for continuous performance optimization. (4) Evaluating the application latency, throughput and resource utilization objectives using a real-world elastic stream computing system and a Twitter data set. Experimental results show that, compared to existing state-of-the-art works, the proposed Sgp-Stream reduces latency by 26%–48%, improves throughput by 14%–20%, and increases resource utilization rate by 15%–21%, especially under increasing data stream input rates.
• Experiments show single-level factors are insufficient for optimal system performance.
• Establishment of the DAG, data stream and resource models from a quantitative perspective.
• Grouping, scheduling and elastic scaling are coordinated optimization strategies.
• Implementation of the prototype Sgp-Stream and its performance evaluation.
[ABSTRACT FROM AUTHOR]
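The abstract names "smooth weighted polling" as the basis of the grouping strategy but does not describe it. The sketch below assumes the term refers to the smooth weighted round-robin scheme familiar from load balancers such as nginx. It is an illustration of that general technique, not the authors' implementation. The class name `SmoothWeightedPicker`, the task names, and the fixed integer weights are all hypothetical. In a runtime-aware setting the weights would presumably be refreshed from measured load, which this record does not specify.

```python
from typing import Dict, List


class SmoothWeightedPicker:
    """Smooth weighted round-robin over a fixed set of downstream tasks.

    Hypothetical helper for illustration only; the names and the static
    weights are assumptions, not taken from Sgp-Stream.
    """

    def __init__(self, weights: Dict[str, int]) -> None:
        if not weights or any(w <= 0 for w in weights.values()):
            raise ValueError("weights must be a non-empty map of positive ints")
        self.weights = dict(weights)
        # Every task starts with a running score of zero.
        self.current = {name: 0 for name in self.weights}

    def pick(self) -> str:
        total = sum(self.weights.values())
        # Step 1: raise every task's running score by its weight.
        for name, weight in self.weights.items():
            self.current[name] += weight
        # Step 2: choose the task with the highest running score
        # (ties go to the task that was registered first).
        chosen = max(self.current, key=self.current.get)
        # Step 3: penalise the chosen task by the total weight so that
        # heavy tasks are interleaved with light ones instead of bursting.
        self.current[chosen] -= total
        return chosen


def route(picker: SmoothWeightedPicker, n_tuples: int) -> List[str]:
    """Return the task chosen for each of n_tuples consecutive tuples."""
    return [picker.pick() for _ in range(n_tuples)]


if __name__ == "__main__":
    picker = SmoothWeightedPicker({"task-a": 5, "task-b": 1, "task-c": 1})
    print(route(picker, 7))
```

Over one full cycle of seven tuples, `task-a` receives five and the other two tasks one each, with the lighter tasks spread through the cycle rather than queued at its end. That interleaving is what distinguishes the smooth variant from plain weighted round-robin.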
- Subjects :
- *COMPUTER systems
*DISTRIBUTED computing
*BIG data
*ELASTICITY
*SCHEDULING
Details
- Language :
- English
- ISSN :
- 09574174
- Volume :
- 254
- Database :
- Academic Search Index
- Journal :
- Expert Systems with Applications
- Publication Type :
- Academic Journal
- Accession number :
- 178885677
- Full Text :
- https://doi.org/10.1016/j.eswa.2024.124346