1. Autonomic Workload Change Classification and Prediction for Big Data Workloads
- Author
- Frank Dehne and Mikhail Genkin
- Subjects
0209 industrial biotechnology, Computer science, Mission critical, Big data, Workload, 02 engineering and technology, Machine learning, 020901 industrial engineering & automation, 0202 electrical engineering, electronic engineering, information engineering, Resource allocation, 020201 artificial intelligence & image processing, Artificial intelligence
- Abstract
The big data software stack based on Apache Spark and Hadoop has become mission critical in many enterprises. The performance of Spark and Hadoop jobs depends on a large number of configuration settings, and tuning them manually is expensive and brittle. There have been efforts to develop online and offline automatic tuning approaches to make the big data stack more autonomic, but many researchers have noted that it is important to tune only when truly necessary, because frequent parameter searches can reduce rather than enhance performance. Autonomic systems need to be able to accurately detect important changes in workload characteristics, predict future workload characteristics, and use this information to proactively optimise resource allocation and the frequency of parameter searches. This paper presents the first study focusing on workload change detection, change classification, and workload forecasting in big data workloads. We demonstrate 99% accuracy for workload change detection, 90% accuracy for workload and workload transition classification, and up to 96% accuracy for future workload type prediction on Spark and Hadoop job flows simulated using popular big data benchmarks. Our method does not rely on past workload history for workload type prediction.
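The abstract reports results but no implementation details. Purely as an illustration of what workload change detection over a stream of job-level metrics can look like (this is not the authors' method), here is a minimal sliding-window sketch; the function name, metric choice, window size, and threshold are illustrative assumptions:

```python
import statistics
from collections import deque

def detect_workload_change(metric_stream, window=20, threshold=3.0):
    # Flag a workload change when the mean of the most recent window of a
    # job-level metric (e.g. average task runtime per job) drifts more than
    # `threshold` standard deviations away from a reference window.
    reference = deque(maxlen=window)   # metrics from the current workload regime
    recent = deque(maxlen=window)      # sliding window of the newest metrics
    change_points = []
    for i, value in enumerate(metric_stream):
        if len(reference) < window:
            reference.append(value)    # still collecting the initial reference
            continue
        recent.append(value)
        if len(recent) == window:
            ref_mean = statistics.mean(reference)
            ref_std = statistics.stdev(reference) or 1e-9  # guard against a flat reference
            if abs(statistics.mean(recent) - ref_mean) > threshold * ref_std:
                change_points.append(i)                    # change flagged at index i
                reference = deque(recent, maxlen=window)   # rebase onto the new regime
                recent.clear()
    return change_points


# Synthetic example: a metric that shifts from ~11 to ~31 halfway through the stream.
stream = [10 + (i % 3) for i in range(100)] + [30 + (i % 3) for i in range(100)]
print(detect_workload_change(stream))   # prints a change index shortly after 100
```

A detector like this only says that the workload changed; the paper's contribution goes further, classifying the workload and transition types and predicting the future workload type without relying on past workload history.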
- Published
- 2019