1. Prediction of Cavity Length Using an Interpretable Ensemble Learning Approach
- Author
- Ganggui Guo, Shanshan Li, Yakun Liu, Ze Cao, and Yangyu Deng
- Subjects
Health, Toxicology and Mutagenesis; Public Health, Environmental and Occupational Health; cavity length; optimization algorithm; interpretable model; ensemble learning model
- Abstract
The cavity length, a vital index in aeration and corrosion reduction engineering, is affected by many factors and is challenging to calculate. In this study, 10-fold cross-validation was performed to select the optimal input configuration. The hyperparameters of three ensemble learning models, namely random forest (RF), gradient boosting decision tree (GBDT), and extreme gradient boosting tree (XGBOOST), were tuned with the Bayesian optimization (BO) algorithm to improve prediction accuracy, and the tuned models were compared with five empirical methods. XGBOOST achieved the highest prediction accuracy. Further interpretability analysis using the Sobol method showed that the XGBOOST model reasonably captures the varying relative significance of different input features under different flow conditions. The Sobol sensitivity analysis also revealed two patterns by which ML models extract information from the input features: (1) the main effect of individual features in the ensemble learning models and (2) the interaction effects between features in support vector regression (SVR). The results show that the models relying on the main effects of individual features predict the cavity length more accurately than the model relying on interaction effects. Moreover, XGBOOST extracts information from the features more correctly, so its Sobol indices vary in accordance with the observed phenomena and its predictions fit the experimental points best.
- Published
- 2022