1. Using Diverse Data Sources to Impute Missing Air Quality Data Collected in a Resource-Limited Setting
- Author
- Moses Mogakolodi Kebalepile, Loveness Nyaradzo Dzikiti, and Kuku Voyi
- Subjects
- MICE imputation, air quality, missing data, classification and regression trees, Meteorology. Climatology, QC851-999
- Abstract
- The sustainable operation of ambient air quality monitoring stations in developing countries is not always possible. Intermittent failures and breakdowns at these stations often interrupt the continuous measurement of data and result in missing data. This study aimed to impute NO2, SO2, O3, and PM10 to produce complete data sets of daily average exposures from 2010 to 2017. Models were built for (a) an individual pollutant at a single monitoring station, (b) the same pollutant combined across different stations, and (c) all the pollutants from all the monitoring stations. The study sought to evaluate the efficacy of the Multiple Imputation by Chained Equations (MICE) algorithm in imputing air quality data that are missing at random. Classification and regression trees (CART) analysis, applied using the MICE package in the R statistical programming language, was compared with the predictive mean matching (PMM) method. The CART method performed better, with pooled R-squared statistics of the imputed data ranging from 0.3 to 0.7, compared to a range of 0.02 to 0.25 for PMM. The MICE algorithm successfully resolved the incompleteness of the data. It was concluded that the CART method produced more reliable data than the PMM method. However, in this study, the pooled R-squared values were accurate for NO2 but less so for the other pollutants.
- Published
- 2024