Targeting a light-weight and multi-channel approach for distributed stream processing.

Authors :: Venugopal, Vinu Ellampallil
Theobald, Martin
Tassetti, Damien
Chaychi, Samira
Tawakuli, Amal
Source :: Journal of Parallel & Distributed Computing. Sep2022, Vol. 167, p77-96. 20p.
Publication Year :: 2022
Abstract: Processing high-throughput data-streams has become a major challenge in areas such as real-time event monitoring, complex dataflow processing, and big data analytics. While there has been tremendous progress in distributed stream processing systems in the past few years, the high-throughput and low-latency (a.k.a. high sustainable-throughput) requirement of modern applications is pushing the limits of traditional data processing infrastructures. This paper introduces a new distributed stream processing engine (DSPE), called Asynchronous Iterative Routing (or simply "AIR"), which implements a light-weight, dynamic sharding protocol. AIR expedites direct and asynchronous communication among all the worker nodes via a channel-like communication protocol on top of the Message Passing Interface (MPI), thereby completely avoiding the need for a dedicated driver node. The system adopts a new progress-tracking protocol, called hew-meld, which has been experimentally observed to show a low processing latency on our asynchronous master-less architecture when compared to the conventional low-watermark technique. The current version of AIR is also equipped with two fault tolerance and recovery strategies, namely checkpointing & rollback and replication. With its unique design, AIR scales out particularly well to multi-core HPC architectures; specifically, we deployed it on clusters with up to 16 nodes and 448 cores (thus reaching a peak of 435.3 million events and 55.14 GB of data processed per second), which we found to significantly outperform existing DSPEs.
• A new open source distributed stream processing engine (DSPE), called Asynchronous Iterative Routing (or simply "AIR"), which implements a light-weight, dynamic sharding protocol.
• With its unique design, AIR scales out particularly well to multi-core HPC architectures.
• Specifically, we deployed it on clusters with up to 16 nodes and 448 cores (thus reaching a peak of 435.3 million events and 55.14 GB of data processed per second), which we found to significantly outperform existing DSPEs. [ABSTRACT FROM AUTHOR]

Subjects :: *DISTRIBUTED computing
*MESSAGE passing (Computer science)
*FAULT tolerance (Engineering)
*BIG data
*ELECTRONIC data processing
*WATERMARKS

Details

Language :: English
ISSN :: 07437315
Volume :: 167
Database :: Academic Search Index
Journal :: Journal of Parallel & Distributed Computing
Publication Type :: Academic Journal
Accession number :: 157330113
Full Text :: https://doi.org/10.1016/j.jpdc.2022.04.022
