Targeting a light-weight and multi-channel approach for distributed stream processing.

Authors :
Venugopal, Vinu Ellampallil
Theobald, Martin
Tassetti, Damien
Chaychi, Samira
Tawakuli, Amal
Source :
Journal of Parallel & Distributed Computing. Sep 2022, Vol. 167, p77-96. 20p.
Publication Year :
2022

Abstract

Processing high-throughput data streams has become a major challenge in areas such as real-time event monitoring, complex dataflow processing, and big data analytics. While there has been tremendous progress in distributed stream processing systems in the past few years, the high-throughput and low-latency (a.k.a. high sustainable-throughput) requirement of modern applications is pushing the limits of traditional data processing infrastructures. This paper introduces a new distributed stream processing engine (DSPE), called Asynchronous Iterative Routing (or simply "AIR"), which implements a light-weight, dynamic sharding protocol. AIR expedites direct and asynchronous communication among all the worker nodes via a channel-like communication protocol on top of the Message Passing Interface (MPI), thereby completely avoiding the need for a dedicated driver node. The system adopts a new progress-tracking protocol, called hew-meld, which has been experimentally observed to achieve lower processing latency on our asynchronous master-less architecture than the conventional low-watermark technique. The current version of AIR is also equipped with two fault tolerance and recovery strategies, namely checkpointing & rollback, and replication. With its unique design, AIR scales out particularly well to multi-core HPC architectures; specifically, we deployed it on clusters with up to 16 nodes and 448 cores (thus reaching a peak of 435.3 million events and 55.14 GB of data processed per second), which we found to significantly outperform existing DSPEs.

• A new open source distributed stream processing engine (DSPE), called Asynchronous Iterative Routing (or simply "AIR"), which implements a light-weight, dynamic sharding protocol.
• With its unique design, AIR scales out particularly well to multi-core HPC architectures.
• Specifically, we deployed it on clusters with up to 16 nodes and 448 cores (thus reaching a peak of 435.3 million events and 55.14 GB of data processed per second), which we found to significantly outperform existing DSPEs.

[ABSTRACT FROM AUTHOR]
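To put the peak figures quoted in the abstract into per-core and per-event terms, the short Python sketch below derives a few rates from them. It rests on two assumptions that the abstract does not confirm: that "GB" means 10^9 bytes, and that the event peak and the byte peak were measured in the same run. The function name `derived_rates` and the dictionary keys are illustrative only and do not come from the paper.

```python
# Peak figures as quoted in the abstract (largest reported deployment).
NODES = 16
CORES = 448
PEAK_EVENTS_PER_SEC = 435.3e6
PEAK_BYTES_PER_SEC = 55.14e9  # assumption: 1 GB = 10**9 bytes


def derived_rates(nodes, cores, events_per_sec, bytes_per_sec):
    """Derive simple per-unit rates from aggregate peak throughput.

    Assumes both peaks refer to the same measurement; if they do not,
    bytes_per_event is only a rough indication.
    """
    return {
        "cores_per_node": cores / nodes,
        "events_per_core": events_per_sec / cores,
        "events_per_node": events_per_sec / nodes,
        "bytes_per_event": bytes_per_sec / events_per_sec,
    }


rates = derived_rates(NODES, CORES, PEAK_EVENTS_PER_SEC, PEAK_BYTES_PER_SEC)
for name, value in rates.items():
    print(f"{name}: {value:,.2f}")
```

Under these assumptions the quoted peak corresponds to roughly 0.97 million events per second per core and an average of about 127 bytes per event.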

Details

Language :
English
ISSN :
0743-7315
Volume :
167
Database :
Academic Search Index
Journal :
Journal of Parallel & Distributed Computing
Publication Type :
Academic Journal
Accession number :
157330113
Full Text :
https://doi.org/10.1016/j.jpdc.2022.04.022