TRAK: A Testing Tool for Studying the Reliability of Data Delivery in Apache Kafka
- Author
Han Wu, Zhihao Shang, and Katinka Wolter
- Subjects
Stream processing, Data stream mining, Computer science, Packet loss, Distributed computing, Reliability (computer networking), Testbed, Network delay, Network emulation, Electronic trading
- Abstract
In modern applications, the demand for real-time processing of high-volume data streams is growing. Common application scenarios include market-feed processing and electronic trading, maintenance of IoT devices, and fraud detection. In some scenarios reliability is the utmost concern, while in others speed and simplicity are the top priority. Apache Kafka is a high-throughput distributed messaging system, and its reliable stream-delivery capability makes it an ideal source of data for stream-processing systems. With its various configurable parameters, Kafka offers great flexibility in reliable data delivery, allowing a wide range of reliability trade-offs. In this paper we introduce a tool for Testing the Reliability of Apache Kafka (TRAK) to study different data-delivery semantics in Kafka and compare their reliability under poor network quality. We build a Kafka testbed using Docker containers and use a network emulation tool to control network delay and packet loss. Two metrics, the message loss rate and the duplicate rate, are used in our experiments to evaluate the reliability of data delivery in Kafka. The experimental results show that under high network delay the message size matters. The at-least-once semantics is more reliable than at-most-once in a network with high packet loss, but can lead to duplicated messages.
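The abstract contrasts at-most-once and at-least-once delivery, which in Kafka are largely determined by producer configuration. The sketch below is not TRAK itself or the authors' actual test harness; it is a minimal illustration, using the standard Kafka Java client, of how the two semantics are typically configured (the bootstrap address and topic name are placeholders).

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class DeliverySemanticsSketch {

    // At-most-once style: do not wait for broker acknowledgement and never retry.
    // Messages can be lost under delay or packet loss, but are not duplicated.
    static Properties atMostOnceConfig(String bootstrapServers) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.ACKS_CONFIG, "0");   // fire and forget
        props.put(ProducerConfig.RETRIES_CONFIG, 0);  // no resend on failure
        return props;
    }

    // At-least-once style: wait for acknowledgement from all in-sync replicas
    // and retry on failure. A retried send may be written twice, which is the
    // source of the duplicate messages discussed in the paper.
    static Properties atLeastOnceConfig(String bootstrapServers) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);
        return props;
    }

    public static void main(String[] args) {
        // Send one test message with at-least-once settings; "test-topic" and
        // "localhost:9092" are illustrative values only.
        try (KafkaProducer<String, String> producer =
                 new KafkaProducer<>(atLeastOnceConfig("localhost:9092"))) {
            producer.send(new ProducerRecord<>("test-topic", "key-1", "payload"));
        }
    }
}
```

Under the experiment design described above, the loss rate would compare the number of distinct messages received against the number sent, while the duplicate rate counts messages that arrive more than once; the two configurations trade one metric off against the other.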
- Published
- 2019