
Predicting uncertainty of machine learning models for modelling nitrate pollution of groundwater using quantile regression and UNEEC methods.

Authors :
Rahmati, Omid
Choubin, Bahram
Fathabadi, Abolhasan
Coulon, Frederic
Soltani, Elinaz
Shahabi, Himan
Mollaefar, Eisa
Tiefenbacher, John
Cipullo, Sabrina
Ahmad, Baharin Bin
Tien Bui, Dieu
Source :
Science of the Total Environment. Oct2019, Vol. 688, p855-866. 12p.
Publication Year :
2019

Abstract

Although estimating the uncertainty of models used to predict nitrate contamination of groundwater is essential for groundwater management, it has generally been ignored. This gap motivates the present study, which explores the predictive uncertainty of machine-learning (ML) models in this field using two residual-based uncertainty methods: quantile regression (QR) and uncertainty estimation based on local errors and clustering (UNEEC). Prediction-interval coverage probability (PICP), the most important of the statistical measures of uncertainty, was used to evaluate uncertainty. Three state-of-the-art ML models, support vector machine (SVM), random forest (RF), and k-nearest neighbor (kNN), were selected to spatially model groundwater nitrate concentrations. The models were calibrated with nitrate concentrations from 80 wells (70% of the data) and validated with nitrate concentrations from 34 wells (30% of the data). Both uncertainty and predictive performance criteria should be considered when comparing models and selecting the best one. The results highlight kNN as the best model: it had the lowest uncertainty based on the PICP statistic under both the QR (0.94) and the UNEEC (0.85–0.91 across all clusters) methods, and its predictive performance (RMSE = 10.63, R² = 0.71) was similar to that of RF (RMSE = 10.41, R² = 0.72) and better than that of SVM (RMSE = 13.28, R² = 0.58). Determining the uncertainty of ML models used for spatially modelling groundwater-nitrate pollution enables managers to make better risk-based decisions and consequently increases the reliability and credibility of groundwater-nitrate predictions.

• Predictive uncertainty of models was estimated using the QR and UNEEC methods.
• Random Forest model had the lower uncertainty band width based on both methods.
• Groundwater nitrate (NO₃) concentrations were predicted using RF, SVM, and kNN.
• Random Forest model outperformed other models in terms of predictive performance.
• Hydraulic conductivity and elevation had the highest contribution to the modelling.

[ABSTRACT FROM AUTHOR]
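
The abstract reports PICP values but does not state how the statistic is computed. As a minimal sketch, assuming the common definition (the fraction of validation observations that fall inside their prediction intervals), it could be calculated as follows. The function name `picp` and the sample values are hypothetical and are not taken from the study.

```python
def picp(observed, lower, upper):
    """Prediction-interval coverage probability.

    Assumed definition: the share of observations y_i that satisfy
    lower_i <= y_i <= upper_i.
    """
    if not (len(observed) == len(lower) == len(upper)):
        raise ValueError("observed, lower and upper must have equal length")
    if not observed:
        raise ValueError("inputs must be non-empty")
    inside = sum(
        1 for y, lo, hi in zip(observed, lower, upper) if lo <= y <= hi
    )
    return inside / len(observed)


# Hypothetical nitrate values and interval bounds, for illustration only.
observed = [12.0, 30.5, 8.2, 45.0, 22.1]
lower = [10.0, 25.0, 9.0, 30.0, 15.0]
upper = [20.0, 35.0, 15.0, 50.0, 25.0]

coverage = picp(observed, lower, upper)
print(f"PICP = {coverage:.2f}")
```

Under this definition, a PICP close to the nominal confidence level of the interval indicates that the interval covers roughly the share of observations it claims to cover.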

Details

Language :
English
ISSN :
00489697
Volume :
688
Database :
Academic Search Index
Journal :
Science of the Total Environment
Publication Type :
Academic Journal
Accession number :
138104251
Full Text :
https://doi.org/10.1016/j.scitotenv.2019.06.320