
Selection of statistical technique for imputation of single site-univariate and multisite–multivariate methods for particulate pollutants time series data with long gaps and high missing percentage.

Authors :
K, Priti
Shakya, Kaushlesh Singh
Kumar, Prashant
Source :
Environmental Science & Pollution Research; Jun2023, Vol. 30 Issue 30, p75469-75488, 20p
Publication Year :
2023

Abstract

Monitoring air contaminants has become essential to exposure science, toxicology, and public health research. However, missing values are common in air-contaminant monitoring, especially in resource-constrained settings, owing to power cuts, calibration, and sensor failures. In contaminant monitoring, evaluations of existing imputation techniques for handling recurrent periods of missing and unobserved data are limited. This study performs a statistical evaluation of six univariate and four multivariate time series imputation methods. The univariate methods rely on temporal (inter-time) correlation, whereas the multivariate methods use data from multiple sites to impute missing values. Particulate pollutant data were retrieved from 38 ground-based monitoring stations in Delhi covering 4 years. For the univariate methods, missing values were simulated at low levels (5%, 10%, 15%, and 20%) and at high levels (40%, 60%, and 80%) with long gaps. Before the multivariate methods were evaluated, the input data underwent pre-processing: selecting the target station to be imputed, choosing covariates based on the spatial correlation between multiple sites, and framing combinations of the target and neighbouring (covariate) stations at 20%, 40%, 60%, and 80% missing levels. Next, 1480 days of particulate pollutant data were provided as input to the four multivariate techniques. Finally, the performance of each algorithm was evaluated using error metrics. The results show that long time series and the spatial correlation among multiple stations significantly improved the outcomes of the univariate and multivariate time series methods. The univariate Kalman_arima method performs well for long missing gaps at all missing levels except 60–80%, yielding low error and high R² and d values. In contrast, the multivariate MIPCA method performed better than Kalman_arima for all target stations at the highest missing percentage. [ABSTRACT FROM AUTHOR]
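The evaluation workflow summarised in the abstract (hide observed values in long gaps at a chosen missing level, impute them, then score only the hidden points) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `simulate_long_gaps`, `linear_interpolate`, and `evaluate` are hypothetical; plain linear interpolation stands in for Kalman_arima and MIPCA; the series is synthetic rather than the Delhi data; R² is taken as the squared Pearson correlation; and d is assumed to be Willmott's index of agreement.

```python
import bisect
import math
import random


def simulate_long_gaps(series, missing_frac, gap_len, seed=0):
    """Hypothetical masking scheme: blank out contiguous runs of `gap_len` points
    until `missing_frac` of the series is missing. Returns (masked, sorted_indices)."""
    n = len(series)
    gap_len = max(1, min(gap_len, n))
    target = int(round(missing_frac * n))
    rng = random.Random(seed)
    missing = set()
    while len(missing) < target:
        start = rng.randrange(0, n - gap_len + 1)
        for i in range(start, start + gap_len):
            if len(missing) == target:
                break
            missing.add(i)
    masked = [None if i in missing else v for i, v in enumerate(series)]
    return masked, sorted(missing)


def linear_interpolate(values):
    """Stand-in imputer (NOT Kalman_arima or MIPCA): linear interpolation between
    the nearest observed neighbours; leading/trailing gaps take the nearest value."""
    observed = [i for i, v in enumerate(values) if v is not None]
    if not observed:
        raise ValueError("no observed values to interpolate from")
    out = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        k = bisect.bisect_left(observed, i)
        if k == 0:  # gap at the start: carry the first observation backward
            out[i] = values[observed[0]]
        elif k == len(observed):  # gap at the end: carry the last observation forward
            out[i] = values[observed[-1]]
        else:
            left, right = observed[k - 1], observed[k]
            w = (i - left) / (right - left)
            out[i] = values[left] + w * (values[right] - values[left])
    return out


def evaluate(obs, pred):
    """RMSE, MAE, squared Pearson correlation (R2), and Willmott's index of agreement (d)."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    sse = sum((p - o) ** 2 for o, p in zip(obs, pred))
    rmse = math.sqrt(sse / n)
    mae = sum(abs(p - o) for o, p in zip(obs, pred)) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    r2 = (cov / (so * sp)) ** 2 if so and sp else float("nan")
    denom = sum((abs(p - mo) + abs(o - mo)) ** 2 for o, p in zip(obs, pred))
    d = 1 - sse / denom if denom else float("nan")
    return {"RMSE": rmse, "MAE": mae, "R2": r2, "d": d}


if __name__ == "__main__":
    # Synthetic daily series (NOT the Delhi data): 1480 days, annual cycle plus noise.
    rng = random.Random(42)
    truth = [100 + 50 * math.sin(2 * math.pi * t / 365) + rng.gauss(0, 10)
             for t in range(1480)]
    for frac in (0.2, 0.4, 0.6, 0.8):
        masked, idx = simulate_long_gaps(truth, frac, gap_len=30, seed=1)
        filled = linear_interpolate(masked)
        scores = evaluate([truth[i] for i in idx], [filled[i] for i in idx])
        print(f"{frac:.0%} missing:", {k: round(v, 3) for k, v in scores.items()})
```

Because the masking and scoring are separate from the imputer, a Kalman-smoothing or PCA-based imputer could replace `linear_interpolate` without changing the rest, so every method is scored on the same hidden points at each missing level.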

Details

Language :
English
ISSN :
0944-1344
Volume :
30
Issue :
30
Database :
Complementary Index
Journal :
Environmental Science & Pollution Research
Publication Type :
Academic Journal
Accession number :
164551095
Full Text :
https://doi.org/10.1007/s11356-023-27659-x