Appraisal of machine learning techniques for predicting emerging disinfection byproducts in small water distribution networks.

Authors :
Hu, Guangji
Mian, Haroon R.
Mohammadiun, Saeed
Rodriguez, Manuel J.
Hewage, Kasun
Sadiq, Rehan
Source :
Journal of Hazardous Materials. Mar. 2023, Vol. 446.
Publication Year :
2023

Abstract

Monitoring emerging disinfection byproducts (DBPs) is challenging for many small water distribution networks (SWDNs), and machine learning-based predictive modeling could offer an alternative. In this study, eleven machine learning techniques (three based on multivariate linear regression, three on regression trees, three on neural networks, and two advanced non-parametric regression techniques) are used to develop models for predicting three emerging DBPs (dichloroacetonitrile, chloropicrin, and trichloropropanone) in SWDNs. The model predictors include commonly measured water quality parameters and two conventional DBP groups. Sampling data for 141 cases were collected from eleven SWDNs in Canada; 70% of the cases were randomly selected for model training and the remaining 30% were used for validation. The modeling process was repeated 1000 times for each model. The results show that models developed using advanced regression techniques, namely support vector regression and Gaussian process regression, exhibited the best prediction performance. Support vector regression models showed the highest prediction accuracy (R² = 0.94) and stability for predicting dichloroacetonitrile and trichloropropanone, whereas Gaussian process regression models performed best for predicting chloropicrin (R² = 0.92). The difference is likely due to the much lower concentrations of chloropicrin compared with dichloroacetonitrile and trichloropropanone. Advanced non-parametric regression techniques, characterized by a probabilistic nature, were identified as the most suitable for developing the predictive models, followed by neural network-based (e.g., generalized regression neural network), regression tree-based (e.g., random forest), and multivariate linear regression-based techniques. This study identifies promising machine learning techniques among many commonly used alternatives for monitoring emerging DBPs in SWDNs under data constraints.
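The validation protocol described above (a random 70/30 train/validation split, repeated 1000 times and scored with R²) can be sketched roughly as follows. This is a minimal, standard-library-only sketch under stated assumptions: `fit_simple_linear` is a hypothetical one-predictor least-squares placeholder standing in for the SVR, GPR, and other models the study compared, and the 141 data points are synthetic, not the Canadian sampling data.

```python
import random
import statistics


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_simple_linear(xs, ys):
    """Hypothetical placeholder model: one-predictor ordinary least squares.

    Stands in for the SVR/GPR/etc. models compared in the study.
    Returns a predict(x) callable.
    """
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def repeated_holdout(xs, ys, fit, train_frac=0.7, repeats=1000, seed=0):
    """Repeat a random train/validation split; return the validation R² of each run."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    n_train = round(train_frac * len(xs))
    scores = []
    for _ in range(repeats):
        rng.shuffle(idx)  # new random partition on every run
        train, val = idx[:n_train], idx[n_train:]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        preds = [model(xs[i]) for i in val]
        scores.append(r_squared([ys[i] for i in val], preds))
    return scores


# Synthetic data only: 141 cases with one predictor and a noisy linear response.
data_rng = random.Random(42)
xs = [data_rng.uniform(0.0, 10.0) for _ in range(141)]
ys = [2.0 * x + 1.0 + data_rng.gauss(0.0, 1.0) for x in xs]

scores = repeated_holdout(xs, ys, fit_simple_linear)
print(f"mean R² = {statistics.fmean(scores):.3f}, SD = {statistics.stdev(scores):.3f}")
```

Because `repeated_holdout` takes the fitting function as a parameter, any model exposing the same fit-then-predict shape could be swapped in. The spread of the returned scores is one possible way to gauge the kind of prediction stability the abstract refers to.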
• Eleven machine learning techniques were compared for predicting emerging DBPs.
• Support vector regression and Gaussian process regression performed best.
• Support vector regression reduced prediction error by 90% compared with linear regression.
• Gaussian process regression reduced prediction error by 67% compared with linear regression.
• The two advanced regression techniques also showed the lowest prediction variance.
[ABSTRACT FROM AUTHOR]
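The percentage error reductions in the highlights are relative comparisons against a linear-regression baseline. The abstract does not say which error metric was used, so the sketch below assumes root-mean-square error (RMSE); the per-run error values are hypothetical, chosen only to illustrate a 90% reduction and a lower run-to-run variance.

```python
import math
import statistics


def rmse(y_true, y_pred):
    """Root-mean-square error (an assumed metric; not specified in the abstract)."""
    return math.sqrt(statistics.fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]))


def relative_error_reduction(err_model, err_baseline):
    """Fraction by which err_model is below err_baseline; 0.9 means 90% lower."""
    return 1.0 - err_model / err_baseline


# Hypothetical per-run validation RMSEs, e.g. collected from repeated random splits.
baseline_runs = [1.00, 1.10, 0.90, 1.05, 0.95]  # placeholder: linear regression
model_runs = [0.10, 0.11, 0.09, 0.105, 0.095]   # placeholder: advanced regression model

reduction = relative_error_reduction(
    statistics.fmean(model_runs), statistics.fmean(baseline_runs)
)
print(f"error reduction: {reduction:.0%}")
print(
    f"variance of run errors: baseline={statistics.variance(baseline_runs):.5f}, "
    f"model={statistics.variance(model_runs):.7f}"
)
```

With the same arithmetic, a model whose mean error is roughly one third of the baseline's would correspond to a reduction of about 67%.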

Details

Language :
English
ISSN :
03043894
Volume :
446
Database :
Academic Search Index
Journal :
Journal of Hazardous Materials
Publication Type :
Academic Journal
Accession number :
161440587
Full Text :
https://doi.org/10.1016/j.jhazmat.2022.130633