
A feature selection and multi-model fusion-based approach of predicting air quality.

Authors :: Zhang, Ying
Zhang, Rongrong
Ma, Qunfei
Wang, Yanhao
Wang, Qingqing
Huang, Zihao
Huang, Linyan
Source :: ISA Transactions; May2020, Vol. 100, p210-220, 11p
Publication Year :: 2020
Abstract: With the rapid development of China's industrialization, air pollution is becoming increasingly serious. Predicting air quality is vital for deciding on further preventive measures against the harm it causes. In this paper, we propose an approach to predicting air quality that draws on multiple data features and fuses multiple machine learning models. The approach takes the meteorological data and air quality data for the past six days as one batch of input (the whole data set covers 46 days) and employs multi-model fusion to provide an improved 24-hour prediction of PM2.5 pollutant concentration across Beijing. In this process, two focal feature groups are composed. The first contains the historical meteorological data, while the second includes the statistical information, the date information and the polynomial variations. Besides the two groups, we supplement the data with one million more data items by means of time sliding. Among the supplementary data, we select the 500 most critical features with the Light Gradient Boosting Machine (LightGBM) model and send them as input to the Gradient Boosting Decision Tree (GBDT) and LightGBM models. Meanwhile, we screen the 300 most critical features with the eXtreme Gradient Boosting (XGBoost) model and send them as input to the three prediction models. For each model, we obtain the optimal parameters through grid search and then fuse the models' contributions by linear weighting. The experiments indicate that the proposed approach based on weighted fusion performs better than a single modeling scheme, and the loss value is 0.4158 under the SMAPE index.
• An approach to predicting air quality by fusing multiple machine learning models.
• Machine learning solutions to model the PM2.5 pollutant concentration distribution feature.
• The time sliding means for developing one million more data items.
• The fusion of multiple models under a linear weighting strategy.
[ABSTRACT FROM AUTHOR]
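The abstract reports its result as a SMAPE loss and describes fusing the models by linear weighting, so a small sketch may help show how the two fit together. It assumes the common SMAPE definition with the mean of the absolute actual and predicted values in the denominator (range 0 to 2); the abstract does not say which variant the authors used, nor how the fusion weights were chosen. The function names, the weight grid and the sample numbers are all hypothetical.

```python
from itertools import product


def smape(actual, predicted):
    """Symmetric mean absolute percentage error, in the range [0, 2]."""
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2.0
        # Convention assumed here: a term with a zero denominator counts as 0.
        total += abs(p - a) / denom if denom else 0.0
    return total / len(actual)


def fuse(predictions, weights):
    """Linearly weighted combination of several models' predictions."""
    return [sum(w * p for w, p in zip(weights, row)) for row in zip(*predictions)]


def search_weights(actual, predictions, step=0.1):
    """Grid-search non-negative weights summing to 1 that minimise SMAPE.

    Searching the weights on a grid is an assumption of this sketch,
    not a procedure stated in the abstract.
    """
    n = round(1 / step)
    best = None
    for combo in product(range(n + 1), repeat=len(predictions) - 1):
        if sum(combo) > n:
            continue  # the remaining weight would be negative
        weights = [c / n for c in combo] + [(n - sum(combo)) / n]
        score = smape(actual, fuse(predictions, weights))
        if best is None or score < best[0]:
            best = (score, weights)
    return best


# Made-up numbers for illustration only (not data from the paper).
actual = [50.0, 80.0, 120.0]
model_a = [40.0, 70.0, 110.0]  # consistently too low
model_b = [60.0, 90.0, 130.0]  # consistently too high
score, weights = search_weights(actual, [model_a, model_b])
print(weights, score)
```

In this toy case the two models err in opposite directions, so the fused prediction scores lower than either model alone, which is the effect the abstract attributes to weighted fusion.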

Subjects :: AIR quality
STATISTICS
AIR pollution
FEATURE selection
MACHINE learning
DECISION trees
AIR bases

Details

Language :: English
ISSN :: 00190578
Volume :: 100
Database :: Supplemental Index
Journal :: ISA Transactions
Publication Type :: Academic Journal
Accession number :: 143158985
Full Text :: https://doi.org/10.1016/j.isatra.2019.11.023
