1. Keddah: Network Evaluation Powered by Simulating Distributed Application Traffic.
- Author
- Deng, Jie; Tyson, Gareth; Cuadrado, Felix; and Uhlig, Steve
- Subjects
JOB performance, REPRODUCIBLE research, INTERNET traffic, RESOURCE allocation, YARN
- Abstract
As a distributed system, Hadoop heavily relies on the network to complete data-processing jobs. While the traffic generated by Hadoop jobs is critical for job execution performance, the actual behaviour of Hadoop network traffic is still poorly understood. This lack of understanding greatly complicates research relying on Hadoop workloads. In this article, we explore Hadoop traffic through empirical traces. We analyse the generated traffic of multiple types of MapReduce jobs, with varying input sizes, and cluster configuration parameters. We present Keddah, a toolchain for capturing, modelling, and reproducing Hadoop traffic, for use with network simulators to better capture the behaviour of Hadoop. By imitating the Hadoop traffic generation process and considering the YARN resource allocation, Keddah can be used to create Hadoop traffic workloads, enabling reproducible Hadoop research in more realistic scenarios. [ABSTRACT FROM AUTHOR]
- Published
- 2019