Hichem Tahraoui, Abd-Elmouneïm Belhadj, Abdeltif Amrane, Essam H. Houssein, Université Yahia Fares de Médéa, Institut des Sciences Chimiques de Rennes (ISCR), Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Ecole Nationale Supérieure de Chimie de Rennes (ENSCR)-Institut de Chimie du CNRS (INC)-Centre National de la Recherche Scientifique (CNRS), and Minia University
International audience; Continuous water monitoring is expensive and time-consuming because it requires sampling throughout all 12 months of the year, which restricts water management studies as well as the calibration and validation of high-quality water models. To overcome this obstacle to better water quality management, improving water quality models is a necessary step. Various modelling strategies have been developed in recent years to improve the accuracy of predictions of major water parameters. In this work, five machine learning models were considered for the prediction of sulfate in raw water: artificial neural network (ANN), support vector machine (SVM), Gaussian process regression (GPR), decision tree (DT), and ensemble tree (ET). Moreover, the DT model was used to determine the influence of the other physicochemical parameters (inputs) on sulfate, and the ET model was used to improve the DT result and confirm that influence. The experimental results indicate that all models were effective in predicting sulfate levels, with very high correlation coefficients (close to 1) and very low statistical errors (close to 0); however, the most suitable water quality models were GPR and ANN, whose coefficients and statistical indicators differ only slightly. Indeed, for the GPR model, R = 0.9991, R² = 0.9982, adjusted R² = 0.9978, RMSE = 0.0182, MSE = 0.0003, MAE = 0.0073 and EPM = 1.5386, while for the ANN model, R = 0.9989, R² = 0.9978, adjusted R² = 0.9972, RMSE = 0.0124, MSE = 0.0001, MAE = 0.0083 and EPM = 2.0639. The main difference favoring the GPR model over the ANN was the number of parameters: the GPR model used 70 parameters and reached a very low loss of 3.3404e-04, whereas the ANN model required 190 parameters.
The model tests (interpolation) confirmed this result, with a correlation coefficient R = 0.99834 and a coefficient of determination R² = 0.9966, together with the statistical indicators RMSE = 0.0309, MSE = 9.5219e-04, EPM = 3.0267 and MAE = 0.0122. In light of these results, it can be concluded that the GPR model is the most efficient for predicting sulfate in raw water. Additionally, its ability to handle missing values and outliers, and to be updated, shows its relevance, which should be retained in future work. This efficiency seems to be due to the fact that the sulfate concentration in raw water is linked to the physicochemical characteristics of the environment by non-linear relationships. This is confirmed by the decision tree and ensemble tree models, which provided information on how sulfate relates to the other physicochemical characteristics.
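As an illustration of the indicators quoted above, the following is a minimal sketch of how such statistics could be computed from observed and predicted sulfate values. It is not the authors' code. The function name `regression_metrics`, the argument `n_inputs` and the toy data are hypothetical. Two definitions are assumptions: R² is taken as the square of the Pearson coefficient R, and EPM is taken to be the mean absolute percentage error, since the abstract does not define it.

```python
import math


def regression_metrics(y_true, y_pred, n_inputs):
    """Return R, R2, adjusted R2, MSE, RMSE, MAE and EPM for one model.

    n_inputs is the number of input variables (hypothetical argument),
    used only for the adjusted R2.
    """
    n = len(y_true)
    if n != len(y_pred) or n <= n_inputs + 1:
        raise ValueError("need equal lengths and n > n_inputs + 1")

    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n

    # Pearson correlation coefficient between observed and predicted values
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    r = cov / math.sqrt(var_t * var_p)

    # Assumption: R2 is the square of R
    r2 = r ** 2
    # Standard adjustment for sample size and number of inputs
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - n_inputs - 1)

    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(e) for e in errors) / n
    # Assumption: EPM is the mean absolute percentage error, in percent
    epm = 100 / n * sum(abs(e / t) for e, t in zip(errors, y_true))

    return {"R": r, "R2": r2, "R2_adj": r2_adj,
            "MSE": mse, "RMSE": rmse, "MAE": mae, "EPM": epm}


# Toy data (hypothetical, not taken from the study)
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.9, 5.1]
metrics = regression_metrics(observed, predicted, n_inputs=1)

# RMSE is the square root of MSE, so the reported test values can be
# cross-checked against each other
rmse_from_reported_mse = math.sqrt(9.5219e-04)
```

Under these definitions the test figures quoted in the abstract are mutually consistent: the square root of MSE = 9.5219e-04 rounds to the reported RMSE of 0.0309.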