Mutual Information Input Selector and Probabilistic Machine Learning Utilisation for Air Pollution Proxies

Authors :: H. Al-Jeelani
Antti Hyvärinen
Lubna Dada
Heikki Lihavainen
Tareq Hussein
Mansour A. Alghamdi
Martha A. Zaidan
Global Atmosphere-Earth surface feedbacks
INAR Physics
Air quality research group
Department of Physics
Source :: Applied Sciences, Vol 9, Iss 20, p 4475 (2019)
Publication Year :: 2019
Abstract: An air pollutant proxy is a mathematical model that estimates an unobserved air pollutant using other measured variables. The proxy is advantageous to fill missing data in a research campaign or to substitute a real measurement for minimising the cost as well as the operators involved (i.e., virtual sensor). In this paper, we present a generic concept of pollutant proxy development based on an optimised data-driven approach. We propose a mutual information concept to determine the interdependence of different variables and thus select the most correlated inputs. The most relevant variables are selected to be the best proxy inputs, where several metrics and data loss are also involved for guidance. The input selection method determines the used data for training pollutant proxies based on a probabilistic machine learning method. In particular, we use a Bayesian neural network that naturally prevents overfitting and provides confidence intervals around its output prediction. In this way, the prediction uncertainty could be assessed and evaluated. In order to demonstrate the effectiveness of our approach, we test it on an extensive air pollution database to estimate ozone concentration.
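
Illustration (not from the paper): the abstract describes ranking candidate inputs by their mutual information with the target pollutant. The following is a minimal sketch of that idea, assuming a simple histogram (plug-in) estimator; the record does not say which estimator the authors use. The function names, the variable names temperature and wind_speed, and the synthetic data are all hypothetical and stand in for real measurements.

```python
import math
import random
from collections import Counter


def mutual_information(x, y, bins=8):
    """Histogram (plug-in) estimate of I(X;Y) in nats for two equal-length sequences."""
    def discretise(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # fall back to 1.0 for constant input
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = discretise(x), discretise(y)
    n = len(bx)
    joint = Counter(zip(bx, by))      # counts of joint bin pairs
    px, py = Counter(bx), Counter(by)  # marginal bin counts
    mi = 0.0
    for (i, j), c in joint.items():
        # p_xy * log(p_xy / (p_x * p_y)), written with raw counts
        mi += (c / n) * math.log(c * n / (px[i] * py[j]))
    return mi


def rank_inputs(candidates, target, bins=8):
    """Return (name, score) pairs sorted from most to least informative."""
    scores = {name: mutual_information(vals, target, bins)
              for name, vals in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Synthetic data (assumption): the target depends on temperature only.
random.seed(0)
n = 2000
temperature = [random.uniform(0, 30) for _ in range(n)]
wind_speed = [random.uniform(0, 10) for _ in range(n)]
ozone = [0.05 * t * t + random.gauss(0, 2) for t in temperature]

ranking = rank_inputs({"temperature": temperature, "wind_speed": wind_speed}, ozone)
print(ranking)
```

In this toy setting the dependent variable should rank first, and the top-ranked names would then be passed on as inputs to the proxy model. The bin count is a tuning choice, and plug-in estimates are biased upward for small samples, so scores are best compared across candidates measured on the same records.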

Subjects :: 010504 meteorology & atmospheric sciences
Computer science
air pollution
Air pollution
Data loss
010501 environmental sciences
Overfitting
medicine.disease_cause
Machine learning
computer.software_genre
lcsh:Technology
01 natural sciences
114 Physical sciences
probabilistic machine learning
lcsh:Chemistry
medicine
General Materials Science
mutual information
Proxy (statistics)
ozone proxy
lcsh:QH301-705.5
Instrumentation
1172 Environmental sciences
0105 earth and related environmental sciences
Fluid Flow and Transfer Processes
Pollutant
lcsh:T
business.industry
Process Chemistry and Technology
213 Electronic, automation and communications engineering, electronics
General Engineering
Probabilistic logic
Mutual information
Missing data
lcsh:QC1-999
Computer Science Applications
lcsh:Biology (General)
lcsh:QD1-999
lcsh:TA1-2040
Artificial intelligence
lcsh:Engineering (General). Civil engineering (General)
business
computer
lcsh:Physics

Details

ISSN :: 20763417
Database :: OpenAIRE
Journal :: Applied Sciences
Accession number :: edsair.doi.dedup.....2b071247943990af1c440a76038f2f2f
Full Text :: https://doi.org/10.3390/app9204475
