
FaBRiQ: Leveraging Distributed Hash Tables towards Distributed Publish-Subscribe Message Queues

Authors:
Shiva Srivastava
Ioan Raicu
Iman Sadooghi
Dongfang Zhao
Tonglin Li
Dharmit Patel
Ke Wang
Source:
BDC
Publication Year:
2015
Publisher:
IEEE, 2015.

Abstract

The advent of Big Data has brought many challenges and opportunities in distributed systems, and these have only amplified with the rate of growth of data. There is a need to rethink the software stack that supports data-intensive computing and big data analytics. Over the past decade, data analytics applications have turned to finer-grained tasks that are shorter in duration and far greater in number. Such applications require new frameworks to handle their data flow. Distributed message queues have proven to be essential building blocks in distributed computing for supporting fine-grained workloads. Distributed message queues such as Amazon's SQS and Apache's Kafka have been used for handling massive data volumes, content delivery, and more. They have also been used for large-scale job scheduling on public clouds. However, even these frameworks have limitations that make them incapable of handling large-scale data with high efficiency, and they are not suitable for High Performance Computing (HPC) applications that require lower latency than what is available on the cloud. We propose Fabriq, a distributed message queue that runs on top of a Distributed Hash Table. The design goal of Fabriq is to achieve lower latency and higher efficiency while scaling to large deployments. Moreover, Fabriq is persistent, reliable, and consistent. Unlike other state-of-the-art systems, Fabriq also guarantees exactly-once delivery of messages. The results show that Fabriq achieves high throughput for both small and large messages. At the scale of 128 nodes, Fabriq's throughput was as high as 1.8 Gigabytes/sec for 1 Megabyte messages, and more than 90,000 messages/sec for 50-byte messages. At the same scale, Fabriq's latency was less than 1 millisecond. Our framework outperforms other state-of-the-art systems, including Kafka and SQS, in both throughput and latency. Furthermore, our experiments show that Fabriq provides significantly better load balancing than Kafka: the load difference between Fabriq servers was less than 9.5% (compared to the even share), while in Kafka this difference was 100%, meaning that some servers did not receive any messages and remained idle.
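The abstract describes Fabriq's design only at a high level: queue semantics layered on top of a distributed hash table, with exactly-once delivery. The Python sketch below is purely illustrative and is not Fabriq's actual API or implementation; it assumes a hypothetical DHT client exposing put/get/remove operations (the names SimpleDHT, DHTQueue, push, and pop are invented for this example) and shows one way queue operations can be mapped onto DHT key-value operations so that each message is handed out only once.

import uuid


class SimpleDHT:
    """In-memory stand-in for a distributed hash table client (hypothetical API)."""

    def __init__(self):
        self._store = {}

    def put(self, key, value):
        self._store[key] = value

    def get(self, key):
        return self._store.get(key)

    def remove(self, key):
        # Fetch-and-delete in one step; returns None if the key is gone.
        return self._store.pop(key, None)


class DHTQueue:
    """Queue semantics built on DHT put/get/remove (illustrative sketch only).

    Each message is stored under its own unique key, and a per-queue index key
    tracks outstanding message ids. Deleting the message key at delivery time is
    what approximates exactly-once delivery in this single-process mock; a real
    distributed implementation would need atomic operations on the DHT servers.
    """

    def __init__(self, dht, name):
        self.dht = dht
        self.name = name
        self.index_key = f"queue:{name}:index"
        if self.dht.get(self.index_key) is None:
            self.dht.put(self.index_key, [])

    def push(self, message):
        msg_id = str(uuid.uuid4())
        self.dht.put(f"queue:{self.name}:msg:{msg_id}", message)
        index = self.dht.get(self.index_key)
        index.append(msg_id)
        self.dht.put(self.index_key, index)
        return msg_id

    def pop(self):
        index = self.dht.get(self.index_key)
        while index:
            msg_id = index.pop(0)
            self.dht.put(self.index_key, index)
            # remove() both fetches and deletes, so a message already taken
            # by another consumer returns None and is skipped.
            message = self.dht.remove(f"queue:{self.name}:msg:{msg_id}")
            if message is not None:
                return message
        return None


if __name__ == "__main__":
    # Usage example: push two messages, then pop them in order.
    q = DHTQueue(SimpleDHT(), "tasks")
    q.push(b"task-1")
    q.push(b"task-2")
    print(q.pop())  # b'task-1'
    print(q.pop())  # b'task-2'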

Details

Database:
OpenAIRE
Journal:
2015 IEEE/ACM 2nd International Symposium on Big Data Computing (BDC)
Accession number:
edsair.doi...........96a7f368895c632b7c597aceba66b468