1. Apollo: Rapidly Picking the Optimal Cloud Configurations for Big Data Analytics Using a Data-Driven Approach
- Author
Yuewen Wu, Wenbo Zhang, Yuanjia Xu, Hua Zhong, Heng Wu, and Lin-Gang Su
- Subjects
Computer science, business.industry, Distributed computing, Big data, Cloud computing, Computer Science Applications, Theoretical Computer Science, Data-driven, Computational Theory and Mathematics, Hardware and Architecture, Key (cryptography), Overhead (computing), Leverage (statistics), Local search (optimization), Pairwise comparison, business, Software
- Abstract
Big data analytics applications are increasingly deployed on cloud computing infrastructures, yet picking the optimal cloud configurations in a cost-effective way remains a major challenge. In this paper, we address this problem with high accuracy and low overhead. We propose Apollo, a data-driven approach that can rapidly pick the optimal cloud configurations by reusing data from similar workloads. We first classify 12 typical workloads in BigDataBench by characterizing pairwise correlations in our offline benchmarks. When a new workload arrives, we run it on several small datasets to rank its key characteristics and identify its similar workloads. Based on this ranking, we then limit the search space of cloud configurations through a classification mechanism. Finally, we leverage a hierarchical regression model to determine which cluster is more suitable and use a local search strategy to pick the optimal cloud configurations with a few extra tests. Our evaluation on 12 typical workloads in HiBench shows that, compared with state-of-the-art approaches, Apollo improves search accuracy by up to 30% while reducing the overhead of picking the optimal cloud configurations by as much as 50%.
- Published
- 2021
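The workload-matching and local-search steps described in the abstract can be illustrated with a minimal Python sketch. It is not Apollo's implementation: Pearson correlation over a characteristic vector stands in for the paper's pairwise-correlation classification, a plain first-improvement hill climb stands in for its local search strategy, and the hierarchical regression model is omitted entirely. The names `pearson`, `most_similar`, and `local_search`, the `(cluster_size, instance_type_index)` configuration encoding, and the `run_cost` callback (a placeholder for actually running the workload on a configuration) are all hypothetical.

```python
import math
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical configuration encoding: (cluster_size, instance_type_index).
Config = Tuple[int, int]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length characteristic vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def most_similar(profile: List[float], known: Dict[str, List[float]]) -> str:
    """Name of the known workload whose profile correlates best with `profile`."""
    return max(known, key=lambda name: pearson(profile, known[name]))


def local_search(start: Config, candidates: Set[Config],
                 run_cost: Callable[[Config], float],
                 max_tests: int) -> Tuple[Config, float, int]:
    """First-improvement hill climbing restricted to `candidates`.

    Each call to `run_cost` stands in for one (expensive) test run; results
    are cached so no configuration is tested twice. Returns the best
    configuration found, its cost, and the number of test runs performed.
    """
    cache: Dict[Config, float] = {start: run_cost(start)}
    current = start
    improved = True
    while improved and len(cache) < max_tests:
        improved = False
        n, s = current
        # Neighbours differ by one step in exactly one dimension.
        for dn, ds in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nb = (n + dn, s + ds)
            if nb not in candidates:
                continue
            if nb not in cache:
                if len(cache) >= max_tests:
                    break
                cache[nb] = run_cost(nb)
            if cache[nb] < cache[current]:
                current = nb
                improved = True
                break
    return current, cache[current], len(cache)


if __name__ == "__main__":
    known = {"sort": [1.0, 2.0, 3.0, 4.0], "kmeans": [4.0, 3.0, 2.0, 1.0]}
    print(most_similar([2.0, 4.0, 6.0, 8.0], known))  # sort
    # Toy cost surface with its minimum at (4, 2); a real run_cost would
    # execute the workload and report time or price.
    grid = {(n, s) for n in range(1, 9) for s in range(5)}
    print(local_search((1, 0), grid,
                       lambda c: (c[0] - 4) ** 2 + (c[1] - 2) ** 2,
                       max_tests=20))  # ((4, 2), 0, 12)
```

Caching every measured configuration mirrors the abstract's goal of needing only a few extra tests, and passing a restricted `candidates` set plays the role of the narrowed search space; both are illustrative choices rather than details taken from the paper.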