
A Statistical Model for Automated Quality Assessment of the TOAR-II

Authors :
Martin G. Schultz
Sabine Schröder
Najmeh Kaffashzadeh
Kai-Lan Chang
Source :
EGU2020: Sharing Geoscience Online, #shareEGU20, Vienna, Austria, 2020-05-04 to 2020-05-08
Publication Year :
2020

Abstract

The Tropospheric Ozone Assessment Report, phase 2 (TOAR-II), database is a collection of global in-situ measurements of ground-level ozone from many locations. It also holds data on selected ozone precursors and meteorological variables. Because TOAR-II assembles air quality data from many different sources, it requires a common data quality assessment (QA) to ensure that the data meet the quality needed for globally consistent analyses. The large volume of this database (more than 100,000 data series) makes automated, data-driven QA procedures necessary.

Accordingly, we have developed a statistical model for automated QA. The model consists of several statistical tests, organized into several sub-groups. It assigns a QA score (an indicator ranging from 0 to 1) to each individual data point to estimate that value's plausibility. The concept is founded on statistical hypothesis testing and probability theory. The model is implemented in a Python package called AutoQA4Env.

One application of AutoQA4Env is the data ingestion workflow of TOAR-II. The tool generates a data quality report, which is then sent back to the data provider for inspection. Because AutoQA4Env is easily configurable, users can set quality thresholds and thus filter data according to their use case. While we primarily developed AutoQA4Env for air quality data, the same concept and model may be applicable to other databases, and the software framework is flexible enough to allow for other use cases.
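To make the idea of a per-point QA score concrete, here is a minimal sketch in plain Python. It is not the AutoQA4Env API, and the abstract does not specify which tests are used or how they are combined. Everything here is a hypothetical illustration: the two tests (`range_test`, a plausible-range check with assumed bounds of 0 to 500 ppb, and `robust_outlier_test`, a two-sided p-value from a median/MAD z-score under an assumed normal distribution), the rule of combining tests by multiplying their scores, the example data, and the threshold of 0.05.

```python
import math
import statistics
from typing import Callable, Sequence

# Hypothetical test interface: maps a series to one plausibility value per point, in [0, 1].
Test = Callable[[Sequence[float]], list[float]]


def range_test(lo: float, hi: float) -> Test:
    """Hypothetical test: 1.0 inside the assumed plausible range [lo, hi], else 0.0."""
    def run(values: Sequence[float]) -> list[float]:
        return [1.0 if lo <= v <= hi else 0.0 for v in values]
    return run


def robust_outlier_test(values: Sequence[float]) -> list[float]:
    """Hypothetical test: two-sided p-value of a robust z-score (median/MAD).

    Assumes the bulk of the data is approximately normally distributed.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    scale = 1.4826 * mad  # converts MAD to a standard-deviation estimate under normality
    if scale == 0:
        # No spread estimate available: the test is uninformative and penalizes nothing.
        return [1.0] * len(values)
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return [math.erfc(abs(v - med) / scale / math.sqrt(2)) for v in values]


def qa_scores(values: Sequence[float], tests: Sequence[Test]) -> list[float]:
    """Combine test results per point by multiplication (an assumed rule that treats
    tests as independent); the result stays in [0, 1]."""
    scores = [1.0] * len(values)
    for test in tests:
        for i, s in enumerate(test(values)):
            scores[i] *= s
    return scores


def filter_by_threshold(values: Sequence[float], scores: Sequence[float],
                        threshold: float) -> list[float]:
    """Keep only points whose QA score reaches the user-chosen threshold."""
    return [v for v, s in zip(values, scores) if s >= threshold]


# Made-up illustrative hourly ozone values in ppb (one spike, one negative value).
hourly_ozone_ppb = [30.0, 31.0, 29.0, 32.0, 30.0, 95.0, 31.0, -5.0]
scores = qa_scores(hourly_ozone_ppb, [range_test(0.0, 500.0), robust_outlier_test])
kept = filter_by_threshold(hourly_ozone_ppb, scores, threshold=0.05)
```

Multiplying scores is only one choice. Taking the minimum over tests is a more conservative alternative that does not assume independence. Raising or lowering `threshold` mirrors the abstract's point that users can filter data according to their own use case.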

Details

Language :
English
Database :
OpenAIRE
Accession number :
edsair.doi.dedup.....2234d4e3de2471f1d8183b63910be868
Full Text :
https://doi.org/10.5194/egusphere-egu2020-13357