An offline-coupled model (WRF/Polyphemus) and an online-coupled model (WRF/Chem-MADRID) are applied to simulate air quality in July 2001 at horizontal grid resolutions of 0.5° and 0.125° over Western Europe. Model performance is evaluated against available surface and satellite observations. The two models simulate different concentrations in terms of domainwide performance statistics, spatial distribution, temporal variations, and column abundance. WRF/Chem-MADRID at 0.5° gives higher values than WRF/Polyphemus for the domainwide mean and over polluted regions in central and southern Europe for all surface concentrations and column variables except the tropospheric ozone residual (TOR). Compared with observations, WRF/Polyphemus gives better statistical performance for daily HNO3, SO2, and NO2 at the European Monitoring and Evaluation Programme (EMEP) sites, maximum 1 h O3 at the AirBase sites, PM2.5 at the AirBase sites, maximum 8 h O3 and PM10 composition at all sites, and column abundance of CO, NO2, TOR, and aerosol optical depth (AOD), whereas WRF/Chem-MADRID gives better statistical performance for NH3, hourly SO2, NO2, and O3 at the AirBase and BDQA (Base de données de la qualité de l'air) sites, maximum 1 h O3 at the BDQA and EMEP sites, and PM10 at all sites. WRF/Chem-MADRID generally reproduces the observed high hourly concentrations of SO2 and NO2 well at most sites, except for extremely high episodes at a few sites. WRF/Polyphemus performs well for hourly SO2 concentrations at most rural or background sites where pollutant levels are relatively low, but it underpredicts the observed hourly NO2 concentrations at most sites.
Both models generally capture the daytime maximum 8 h O3 concentrations and the diurnal variations of O3 well, with more accurate peak daytime and minimum nighttime values from WRF/Chem-MADRID, but neither model reproduces the extremely low nighttime O3 concentrations at several urban and suburban sites because NOx is underpredicted and the titration of O3 at night is therefore insufficient. WRF/Polyphemus gives more accurate concentrations of PM2.5, and WRF/Chem-MADRID better reproduces the observed PM10 concentrations at all sites. The differences between model predictions and observations are mostly caused by inaccurate representations of emissions of gaseous precursors and primary PM species, as well as biases in the meteorological predictions. The differences between the two models' predictions are caused by differences in the height of the first model layer and the thickness of each layer, which affect the vertical distribution of emissions; in model treatments such as dry/wet deposition, heterogeneous chemistry, and aerosol and cloud processes; and in model inputs such as emissions of soil dust and sea salt and the chemical boundary conditions of CO and O3 used in the two models. WRF/Chem-MADRID shows a higher sensitivity to grid resolution than WRF/Polyphemus at all sites. For both models, the use of a finer grid resolution generally leads to overall better statistical performance for most variables, with greater spatial detail and overall better agreement in temporal variations and magnitudes at most sites. The use of online biogenic volatile organic compound (BVOC) emissions gives better statistical performance for hourly and maximum 8 h O3 and PM2.5 and generally better agreement with their observed temporal variations at most sites. Because it is an online model, WRF/Chem-MADRID offers the advantage of accounting for various feedbacks between meteorology and chemical species.
However, this model comparison suggests that, in state-of-the-science air quality models, atmospheric pollutant concentrations are most sensitive to vertical structure, inputs, and parameterizations for dry/wet removal of gases and particles.
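
For readers unfamiliar with the O3 metrics evaluated above, the following minimal Python sketch shows one common way to derive the daily maximum 1 h and maximum 8 h values from a day of hourly concentrations. It is an illustration only: the function names are hypothetical, and the windowing convention (only 8 h windows that lie fully within the supplied series, no missing-data handling) is an assumption rather than the procedure used in this study.

```python
from typing import List, Sequence


def running_mean(values: Sequence[float], window: int) -> List[float]:
    """Return the mean of every full run of `window` consecutive values."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def daily_max_1h(hourly: Sequence[float]) -> float:
    """Largest hourly concentration in the series."""
    if not hourly:
        raise ValueError("need at least one hourly value")
    return max(hourly)


def daily_max_8h(hourly: Sequence[float]) -> float:
    """Largest 8 h running-mean concentration in the series.

    Assumption: only windows contained entirely in `hourly` are used.
    """
    means = running_mean(hourly, 8)
    if not means:
        raise ValueError("need at least 8 hourly values")
    return max(means)
```

Applying both functions to the observed and the simulated hourly series at a site gives paired daily values that can then be compared statistically.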