Gao, Ziqi; Wang, Yifeng; Vasilakos, Petros; Ivey, Cesunica E.; Do, Khanh; and Russell, Armistead Goode
The growing abundance of data is conducive to using numerical methods to relate air quality, meteorology, and emissions, and thereby to identify which factors affect pollutant concentrations. Often, it is the extreme values that are of interest for health and regulatory purposes (e.g., the National Ambient Air Quality Standard for ozone uses the annual 4th highest maximum daily 8-hour average (MDA8) ozone), though such values are the most challenging to predict using empirical models. We developed four computational models, namely the Generalized Additive Model (GAM), Multivariate Adaptive Regression Splines, Random Forest, and Support Vector Regression, to derive observation-based relationships between the 4th highest MDA8 ozone in the South Coast Air Basin and precursor emissions, meteorological factors, and large-scale climate patterns. All models had similar predictive performance, though the GAM showed a somewhat higher R² value (0.96) along with a lower root mean square error and mean bias.
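As a rough illustration of the three statistics used to compare the models, the sketch below computes R², root mean square error, and mean bias for paired observed and predicted values. The abstract does not state the exact definitions used, so this assumes R² is the coefficient of determination (1 minus the ratio of residual to total sum of squares) and mean bias is the average of predicted minus observed. The function name `evaluation_metrics` and the sample values are hypothetical and are not taken from the study.

```python
import math


def evaluation_metrics(observed, predicted):
    """Return (r2, rmse, mean_bias) for paired observed and predicted values.

    Assumed definitions (not confirmed by the source):
      r2        = 1 - SS_res / SS_tot  (coefficient of determination)
      rmse      = sqrt(mean((predicted - observed)^2))
      mean_bias = mean(predicted - observed)
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    n = len(observed)
    mean_obs = sum(observed) / n

    # Residuals are defined as predicted minus observed.
    residuals = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")

    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mean_bias = sum(residuals) / n
    return r2, rmse, mean_bias


# Hypothetical 4th highest MDA8 ozone values in ppb, for illustration only.
observed = [100.0, 110.0, 120.0, 130.0]
predicted = [102.0, 108.0, 121.0, 129.0]

r2, rmse, mean_bias = evaluation_metrics(observed, predicted)
print(f"R2 = {r2:.2f}, RMSE = {rmse:.2f} ppb, mean bias = {mean_bias:.2f} ppb")
```

Applying the same function to each model's predictions on a common set of observations would give directly comparable numbers. If R² were instead defined as the squared Pearson correlation, the values could differ from those computed here.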