Is replacing missing values of PM2.5 constituents with estimates using machine learning better for source apportionment than exclusion or median replacement?
- Source :
- Environmental pollution (Barking, Essex : 1987) [Environ Pollut] 2024 Aug 01; Vol. 354, pp. 124165. Date of Electronic Publication: 2024 May 16.
- Publication Year :
- 2024
Abstract
- East Asian countries have been conducting source apportionment of fine particulate matter (PM2.5) by applying positive matrix factorization (PMF) to hourly constituent concentrations. However, some of the constituent data from the supersites in South Korea were missing due to instrument maintenance and calibration. Conventional preprocessing of missing values, such as exclusion or median replacement, biases the estimated source contributions by changing the PMF input. Machine learning (ML) can estimate the missing values by training on constituent data, meteorological data, and gaseous pollutants. Complete data from the Seoul Supersite in 2018 were taken, and a random 20% was set as missing. Missing values were estimated using a random forest analysis, and PMF was performed after replacing the missing values with the estimates. Percent errors of the source contributions were calculated relative to those estimated from the complete data. Estimation accuracy (r²) was as high as 0.874 for missing carbon species and as low as 0.631 when ionic species and trace elements were missing. For the seven highest contributing sources, replacing the missing values of carbon species with estimates minimized the percent errors to 2.0% on average. However, replacing the missing values of the other chemical species with estimates increased the percent errors to more than 9.7% on average. Percent errors were maximal at 37% on average when missing values of ionic species and trace elements were replaced with estimates. Missing values, except for those of carbon species, need to be excluded. This approach reduced the percent errors to 7.4% on average, which was lower than those due to median replacement. Our results show that reducing the biases in source apportionment is possible by replacing the missing values of carbon species with estimates. To reduce the biases due to missing values of the other chemical species, the estimation accuracy of the ML needs to be improved.
- Competing Interests: Declaration of competing interest. The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
- (Copyright © 2024 The Authors. Published by Elsevier Ltd. All rights reserved.)
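The evaluation protocol described in the abstract (hide a random 20% of a complete series, apply a missing-value treatment, then compute a percent error against the complete-data result) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names are hypothetical, the toy concentrations are synthetic, and the series mean stands in for a PMF source contribution. Neither PMF nor the random forest estimation used in the study is reproduced; only exclusion and median replacement are shown.

```python
import random
import statistics


def mask_random(values, fraction=0.2, seed=0):
    """Return a copy of `values` with a random `fraction` of entries set to None."""
    rng = random.Random(seed)
    n_missing = round(len(values) * fraction)
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(values)]


def exclude_missing(values):
    """Exclusion: drop the missing entries."""
    return [v for v in values if v is not None]


def median_replace(values):
    """Median replacement: fill missing entries with the median of observed ones."""
    median = statistics.median(exclude_missing(values))
    return [median if v is None else v for v in values]


def percent_error(estimate, reference):
    """Absolute percent error of `estimate` relative to `reference`."""
    return 100.0 * abs(estimate - reference) / abs(reference)


# Synthetic hourly concentrations (toy values, not data from the study).
rng = random.Random(42)
complete = [rng.lognormvariate(1.0, 0.5) for _ in range(1000)]
masked = mask_random(complete, fraction=0.2, seed=1)

# Placeholder "contribution": the series mean (assumption; the study uses PMF).
reference = statistics.mean(complete)
for name, treat in [("exclusion", exclude_missing), ("median", median_replace)]:
    error = percent_error(statistics.mean(treat(masked)), reference)
    print(f"{name}: percent error = {error:.2f}%")
```

To compare an ML-based treatment in the same way, a third function that fills the `None` entries with model estimates could be added to the loop; the percent-error step would stay unchanged.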
Details
- Language :
- English
- ISSN :
- 1873-6424
- Volume :
- 354
- Database :
- MEDLINE
- Journal :
- Environmental pollution (Barking, Essex : 1987)
- Publication Type :
- Academic Journal
- Accession number :
- 38759749
- Full Text :
- https://doi.org/10.1016/j.envpol.2024.124165