1. Missing Values Treatment and Feature Reduction Analysis to Enhance Classification
- Author
- D. Muralidharan, K. Renuka, Mulagala Jaswant, J. Karthikeyan, and G.R. Brindha
- Abstract
Datasets may have a large number of features, which makes classification difficult and time-consuming. They may also contain irrelevant and noisy features, as well as missing values. Missing values should be treated properly so that classifier accuracy can be improved. There is also a need to reduce the feature set and keep only the features the classifier needs. Principal Component Analysis (PCA) is commonly used to reduce the number of features in a dataset, and the resulting components can be supplied as input to the classifiers. In this study, standard datasets are checked for missing values and classified using Support Vector Machines (SVM) and Naive Bayes, with and without feature reduction by PCA. The proposed algorithm for missing value imputation is then applied to the datasets and the same analysis is carried out. Accuracy is evaluated using a confusion matrix. The results are discussed with an analysis based on the nature of the features and missing values, and on how different datasets behave when used with machine learning algorithms.
- Published
- 2020
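
The workflow summarized in the abstract (impute missing values, optionally reduce features with PCA, classify with SVM and Naive Bayes, and score with a confusion matrix) might be sketched as follows. This is only an illustration under stated assumptions: it uses scikit-learn, the bundled Iris dataset with artificially removed values, and plain mean imputation as a stand-in, because the record does not describe the proposed imputation algorithm or name the datasets. The function name `evaluate`, the 10% missing rate, and the choice of two principal components are all hypothetical.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def evaluate(use_pca, seed=0):
    """Return {classifier name: (confusion matrix, accuracy)}.

    Assumptions (not from the paper): Iris data, 10% of values removed
    at random, mean imputation, and 2 principal components.
    """
    X, y = load_iris(return_X_y=True)

    # Simulate missing values by blanking roughly 10% of the entries.
    rng = np.random.default_rng(seed)
    X = X.copy()
    X[rng.random(X.shape) < 0.1] = np.nan

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y
    )

    classifiers = {"svm": SVC(kernel="rbf"), "naive_bayes": GaussianNB()}
    results = {}
    for name, clf in classifiers.items():
        # Mean imputation is a placeholder for the paper's own method.
        steps = [SimpleImputer(strategy="mean"), StandardScaler()]
        if use_pca:
            steps.append(PCA(n_components=2))
        steps.append(clf)

        model = make_pipeline(*steps)
        model.fit(X_train, y_train)

        # Accuracy is the diagonal of the confusion matrix over its total.
        cm = confusion_matrix(y_test, model.predict(X_test))
        results[name] = (cm, np.trace(cm) / cm.sum())
    return results


if __name__ == "__main__":
    for use_pca in (False, True):
        for name, (cm, acc) in evaluate(use_pca).items():
            print(f"pca={use_pca} {name}: accuracy={acc:.3f}")
            print(cm)
```

Keeping imputation, scaling, and PCA inside a single pipeline means each step is fitted on the training split only, so no information from the test split leaks into the reported accuracy.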