Selection of statistical technique for imputation of single site-univariate and multisite–multivariate methods for particulate pollutants time series data with long gaps and high missing percentage.
- Source :
- Environmental Science & Pollution Research; Jun2023, Vol. 30 Issue 30, p75469-75488, 20p
- Publication Year :
- 2023
Abstract
- Monitoring air contaminants has become essential to exposure science, toxicology, and public health research. However, missing values are common in air-contaminant monitoring, especially in resource-constrained settings, because of power cuts, calibration, and sensor failure. Evaluations of existing imputation techniques for recurrent periods of missing and unobserved data in contaminant monitoring remain limited. The proposed study performs a statistical evaluation of six univariate and four multivariate time series imputation methods. The univariate methods rely on inter-time (temporal) correlation, whereas the multivariate approach uses data from multiple sites to impute missing values. The study retrieved 4 years of particulate pollutant data from 38 ground-based monitoring stations in Delhi. For the univariate methods, missing values with long gaps were simulated at low levels of 5%, 10%, 15%, and 20% (the 0–20% range) and at high levels of 40%, 60%, and 80%. Before the multivariate methods were evaluated, the input data underwent pre-processing: selecting the target station to be imputed, choosing covariates based on the spatial correlation between sites, and framing combinations of the target and neighbouring (covariate) stations at 20%, 40%, 60%, and 80% missingness. Next, 1480 days of particulate pollutant data were provided as input to the four multivariate techniques. Finally, the performance of each algorithm was evaluated using error metrics. The results show that long-interval time series data and the spatial correlation of multiple stations significantly improved outcomes for the univariate and multivariate time series methods. The univariate Kalman_arima method performs well for long missing gaps at all missing levels except 60–80%, yielding low errors and high R² and d values. In contrast, the multivariate MIPCA method performed better than Kalman_arima for all target stations at the highest missing percentage. [ABSTRACT FROM AUTHOR]
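The abstract simulates missingness with long gaps at fixed percentages, but it does not say how the gaps were placed. The sketch below is one plausible way to do this, assuming that gaps are contiguous, non-overlapping blocks of equal length placed at random. The helper `mask_long_gaps`, the 30-day gap length, and the placeholder series are all hypothetical and are not taken from the paper.

```python
import random


def mask_long_gaps(series, missing_frac, gap_len, seed=0):
    """Hypothetical helper: return a copy of `series` with roughly
    `missing_frac` of its values set to None, removed in contiguous
    blocks of `gap_len` observations (an assumed gap scheme)."""
    n = len(series)
    n_gaps = round(n * missing_frac / gap_len)  # number of gaps to cut
    n_blocks = n // gap_len                     # candidate aligned block slots
    if n_gaps > n_blocks:
        raise ValueError("missing_frac too high for this gap length")
    rng = random.Random(seed)
    # Distinct block indices, so the gaps never overlap.
    chosen = rng.sample(range(n_blocks), n_gaps)
    masked = list(series)
    for b in chosen:
        for i in range(b * gap_len, (b + 1) * gap_len):
            masked[i] = None
    return masked


# Placeholder daily series of the same length as the 1480-day record (not real data).
daily_pm = [float(i % 50) for i in range(1480)]
for frac in (0.05, 0.10, 0.15, 0.20, 0.40, 0.60, 0.80):
    m = mask_long_gaps(daily_pm, frac, gap_len=30)
    share = sum(v is None for v in m) / len(m)
    print(f"target {frac:.0%} -> masked {share:.1%}")
```

Because whole gaps are removed, the realised missing share can differ from the target by up to half a gap length. Keeping the true values at the masked positions lets the imputed values be scored against them afterwards.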
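The abstract reports low errors together with high R² and d values. The sketch below computes RMSE, MAE, R², and Willmott's index of agreement d on the held-out values at masked positions. It assumes that R² is the squared Pearson correlation between observed and imputed values, which the abstract does not specify. The function name `evaluate` is hypothetical.

```python
import math


def evaluate(observed, imputed):
    """Hypothetical scorer: compare imputed values with the true values
    that were hidden at the masked positions."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, imputed)]
    sse = sum(e * e for e in errors)
    rmse = math.sqrt(sse / n)
    mae = sum(abs(e) for e in errors) / n
    o_mean = sum(observed) / n
    p_mean = sum(imputed) / n
    # R^2 taken here as the squared Pearson correlation (an assumption).
    cov = sum((o - o_mean) * (p - p_mean) for o, p in zip(observed, imputed))
    var_o = sum((o - o_mean) ** 2 for o in observed)
    var_p = sum((p - p_mean) ** 2 for p in imputed)
    r2 = cov * cov / (var_o * var_p)  # undefined if either series is constant
    # Willmott's index of agreement: 1 - SSE / potential error, in [0, 1].
    potential = sum(
        (abs(p - o_mean) + abs(o - o_mean)) ** 2 for o, p in zip(observed, imputed)
    )
    d = 1 - sse / potential
    return {"RMSE": rmse, "MAE": mae, "R2": r2, "d": d}


# Made-up values for illustration only.
print(evaluate([80.0, 95.0, 110.0, 120.0, 100.0], [85.0, 90.0, 115.0, 118.0, 104.0]))
```

Unlike R², d also penalises a constant offset between the imputed and observed values. A method can therefore score R² = 1 yet d < 1 if it tracks the pattern but is biased.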
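In the multivariate pre-processing, covariates are chosen based on the spatial correlation between sites. The sketch below ranks candidate stations by their Pearson correlation with the target station and keeps the strongest ones, computing each correlation only over time steps where both stations have data. The threshold `min_r=0.7`, the cap `k=3`, the helper names, and the toy station data are hypothetical. The paper's actual selection rule may differ, and this sketch does not implement MIPCA or any other imputation method.

```python
import math


def pairwise_corr(x, y):
    """Pearson correlation over time steps where both series are observed
    (None marks a missing value). Returns NaN if it cannot be computed."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 3:
        return float("nan")
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def select_covariates(target, stations, min_r=0.7, k=3):
    """Hypothetical rule: keep up to k stations whose correlation with the
    target is at least min_r, strongest first."""
    scored = []
    for name, series in stations.items():
        r = pairwise_corr(target, series)
        if not math.isnan(r) and r >= min_r:
            scored.append((r, name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Toy data (made up): A and C co-vary with the target, B does not.
target = [10, 12, None, 15, 18, 20]
stations = {
    "A": [11, 13, 14, 16, 19, 21],
    "B": [30, 10, 25, 5, 40, 1],
    "C": [5, 6, 7, None, 9, 10],
}
print(select_covariates(target, stations))
```

Using only pairwise-complete time steps avoids discarding a whole station because of scattered gaps. However, correlations computed from very few overlapping days can be unreliable, which is why a minimum overlap is enforced here.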
Details
- Language :
- English
- ISSN :
- 0944-1344
- Volume :
- 30
- Issue :
- 30
- Database :
- Complementary Index
- Journal :
- Environmental Science & Pollution Research
- Publication Type :
- Academic Journal
- Accession number :
- 164551095
- Full Text :
- https://doi.org/10.1007/s11356-023-27659-x