Machine learning-based imputation soft computing approach for large missing scale and non-reference data imputation

Authors :
A. M. Aleesa
A.H. Alamoodi
B. B. Zaidan
Juliana Chen
M. A. Chyad
Osamah Shihab Albahri
A. A. Zaidan
Salem Garfan
Source :
Chaos, Solitons & Fractals. 151:111236
Publication Year :
2021
Publisher :
Elsevier BV, 2021.

Abstract

Missing data is a common problem in real-world data sets and is amongst the most complex topics in computer science and many other research domains. The common ways to cope with missing values are elimination or imputation, depending on the volume of the missing data and the nature of its distribution. It becomes imperative to come up with new imputation approaches along with efficient algorithms. Though most existing imputation methods focus on a moderate amount of missing data, imputation for high missing rates over 80% is still important but challenging. Although some works address the high missing volume issue, they mostly rely on a reference dataset (a complete dataset used for evaluation): they create artificial missing values in it and impute them to measure the accuracy of their proposed techniques. So far, the option of imputing high proportions of missing values with no reference comparison dataset (an original dataset with highly missing values) has often been ignored or overlooked. Therefore, we propose a missing data imputation approach for high volumes of missing values with no reference comparison dataset. The approach applies pre-processing measures and breaks the dataset into small continuous non-missing portions, then uses Multi-Criteria Decision-Making analysis to select a portion of data that is representative of the entire broken dataset. This portion helps to create reference comparisons and expands the missing dataset through artificial missing-making procedures with different percentages, followed by imputation using different machine learning techniques. This study conducted two experiments using BMI datasets with more than 80% missing values, derived from the National Child Development Centre (NCDRC) at Sultan Idris Education University (UPSI), Malaysia. The results show the capability of our approach in reconstructing datasets with huge missing values.
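
The workflow outlined in the abstract (split into complete portions, pick a reference portion, hide values artificially at several rates, impute, and score against the reference) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and not the authors' implementation: the function names are hypothetical, the longest complete run stands in for the Multi-Criteria Decision-Making selection, mean imputation stands in for the machine learning techniques, and the toy values are not taken from the study.

```python
import math
import random

def complete_runs(values):
    """Return lists of consecutive non-missing values (None marks a gap)."""
    runs, current = [], []
    for v in values:
        if v is None:
            if current:
                runs.append(current)
            current = []
        else:
            current.append(v)
    if current:
        runs.append(current)
    return runs

def mask_at_random(reference, rate, seed=0):
    """Hide round(rate * n) entries of a complete list, chosen at random."""
    rng = random.Random(seed)
    n_hidden = round(rate * len(reference))
    hidden = set(rng.sample(range(len(reference)), n_hidden))
    return [None if i in hidden else v for i, v in enumerate(reference)]

def mean_impute(values):
    """Fill gaps with the mean of the observed entries (a simple baseline)."""
    observed = [v for v in values if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]

def rmse(reference, imputed, masked):
    """Root-mean-square error over the artificially hidden positions only."""
    errors = [(r - i) ** 2
              for r, i, m in zip(reference, imputed, masked) if m is None]
    return math.sqrt(sum(errors) / len(errors)) if errors else 0.0

# Toy series with gaps (illustrative values, not data from the study).
series = [None, 17.2, None, None, 16.8, 17.0, 17.4, 17.1, 16.9, None, 18.0, None]

# Assumption: the longest complete run stands in for the MCDM-selected portion.
reference = max(complete_runs(series), key=len)

for rate in (0.2, 0.4, 0.6):
    masked = mask_at_random(reference, rate)
    score = rmse(reference, mean_impute(masked), masked)
    print(f"rate={rate:.0%} hidden={masked.count(None)} rmse={score:.3f}")
```

Because the reference portion is complete, the error at each artificial missing rate can be measured directly, which is what makes it possible to compare imputation techniques even though the original dataset has no complete counterpart.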

Details

ISSN :
09600779
Volume :
151
Database :
OpenAIRE
Journal :
Chaos, Solitons & Fractals
Accession number :
edsair.doi...........34cd81f07d836be6f69c2721a8af64a4
Full Text :
https://doi.org/10.1016/j.chaos.2021.111236