Imputation by feature importance (IBFI): A methodology to envelop machine learning method for imputing missing patterns in time series data
- Source :
- PLoS ONE, Vol 17, Iss 1, p e0262131 (2022)
- Publication Year :
- 2022
- Publisher :
- Public Library of Science, 2022.
Abstract
- A new methodology, imputation by feature importance (IBFI), is studied that can be applied to any machine learning method to efficiently fill in any missing or irregularly sampled data. It applies to data missing completely at random (MCAR), missing not at random (MNAR), and missing at random (MAR). IBFI utilizes the feature importance and iteratively imputes missing values using any base learning algorithm. For this work, IBFI is tested on soil radon gas concentration (SRGC) data. XGBoost is used as the learning algorithm and missing data are simulated using R for different missingness scenarios. IBFI is based on the physically meaningful assumption that SRGC depends upon environmental parameters such as temperature and relative humidity. This assumption leads to a model obtained from the complete multivariate series where the controls are available by taking the attribute of interest as a response variable. IBFI is tested against other frequently used imputation methods, namely mean, median, mode, predictive mean matching (PMM), and hot-deck procedures. The performance of the different imputation methods was assessed using root mean squared error (RMSE), mean squared log error (MSLE), mean absolute percentage error (MAPE), percent bias (PB), and mean squared error (MSE) statistics. The imputation process requires more attention when multiple variables are missing in different samples, resulting in challenges to machine learning methods because some controls are missing. IBFI appears to have an advantage in such circumstances. For testing IBFI, Radon Time Series Data (RTS) has been used and data was collected from 1st March 2017 to the 11th of May 2018, including 4 seismic activities that have taken place during the data collection time.
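The five evaluation statistics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the sample values, and the exact definitions are assumptions. In particular, MSLE is written here with log(1 + x), and percent bias is taken as 100 × Σ(imputed − actual) / Σ(actual); both conventions vary across the literature and the abstract does not state which were used.

```python
import math

# Hypothetical helper names; each compares held-out true values ("actual")
# with the values an imputation method filled in ("imputed").

def mse(actual, imputed):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, imputed)) / len(actual)

def rmse(actual, imputed):
    """Root mean squared error."""
    return math.sqrt(mse(actual, imputed))

def msle(actual, imputed):
    """Mean squared log error, assuming the log(1 + x) convention."""
    return sum(
        (math.log1p(a) - math.log1p(p)) ** 2 for a, p in zip(actual, imputed)
    ) / len(actual)

def mape(actual, imputed):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, imputed)) / len(actual)

def percent_bias(actual, imputed):
    """Percent bias, assuming 100 * sum(imputed - actual) / sum(actual)."""
    return 100.0 * sum(p - a for a, p in zip(actual, imputed)) / sum(actual)

# Made-up illustrative values, not data from the study.
actual = [10.0, 20.0, 40.0]
imputed = [12.0, 18.0, 40.0]
scores = {
    "MSE": mse(actual, imputed),
    "RMSE": rmse(actual, imputed),
    "MSLE": msle(actual, imputed),
    "MAPE": mape(actual, imputed),
    "PB": percent_bias(actual, imputed),
}
```

In a comparison such as the one the abstract describes, these functions would be evaluated only on the positions that were artificially masked, once per imputation method and missingness scenario. Lower values indicate a closer match for every statistic except PB, where values near zero are better.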
- Subjects :
- Computer and Information Sciences
Atmospheric Science
Time Factors
Science
Research and Analysis Methods
Machine Learning
Machine Learning Algorithms
Soil
Meteorology
Mathematical and Statistical Techniques
Artificial Intelligence
Pakistan
Statistical Methods
Statistical Data
Multidisciplinary
Applied Mathematics
Simulation and Modeling
Statistics
Humidity
Chemistry
Radon
Research Design
Physical Sciences
Earth Sciences
Medicine
Mathematical Functions
Mathematics
Algorithms
Research Article
Chemical Elements
Forecasting
Details
- Language :
- English
- ISSN :
- 19326203
- Volume :
- 17
- Issue :
- 1
- Database :
- OpenAIRE
- Journal :
- PLoS ONE
- Accession number :
- edsair.doi.dedup.....cfe60c54a01fc8c3366ed526bd540bbe