KNN weighted reduced universum twin SVM for class imbalance learning.

Authors :: Ganaie, M.A.
Tanveer, M.
Source :: Knowledge-Based Systems. Jun2022, Vol. 245, pN.PAG-N.PAG. 1p.
Publication Year :: 2022
Abstract: In real-world problems, class imbalance poses a major challenge for classification because the samples of one class dominate. Problems such as fault detection and disease detection involve imbalanced data and hence need attention to avoid bias towards a particular class. Classification models such as support vector machines (SVM) become biased towards the majority class and hence misclassify minority class samples. SVM suffers because no prior information about the data is used in generating the hyperplanes. SVM also ignores local neighbourhood information and thus treats every sample equally when generating the hyperplanes. However, the data points may be contaminated and may mislead the generation of hyperplanes. Inspired by the idea of prior data information and local neighbourhood information, we propose a K-nearest neighbour based weighted reduced universum twin SVM for class imbalance learning (KWRUTSVM-CIL). The proposed KWRUTSVM-CIL embodies local neighbourhood information and uses universum data to balance the classes in class imbalance problems. Local neighbourhood information is incorporated via a weight matrix in the objective function. In the proposed KWRUTSVM-CIL model, weight vectors are used in the corresponding constraints of the objective functions to exploit the interclass information. Oversampling and undersampling approaches are followed to balance the data in class imbalance problems. Universum data gives prior information about the data. Twin SVM, universum twin SVM, and reduced universum twin SVM for class imbalance implement the empirical risk minimization principle and thus may lead to overfitting. However, the proposed KWRUTSVM-CIL model includes a regularization term to maximize the margin and implements the structural risk minimization principle, which is the core of statistical learning, and thereby overcomes the issue of overfitting.
Experimental results and statistical analysis indicate that the generalization ability of the proposed KWRUTSVM-CIL model is superior to that of other twin SVM based models. As an application, we use the proposed KWRUTSVM-CIL model for the diagnosis of Alzheimer's disease and breast cancer. The proposed KWRUTSVM-CIL model showed better generalization performance than other twin SVM based models on biomedical datasets.
• To incorporate local neighbourhood information, K-nearest neighbour based weights are used in the proposed KWRUTSVM-CIL.
• Unlike the RUTSVM-CIL, UTSVM, TSVM and FTWSVM models, which implement the empirical risk minimization principle, the proposed KWRUTSVM-CIL model implements the structural risk minimization principle.
• Similar to RUTSVM-CIL, the proposed KWRUTSVM-CIL model incorporates prior information about the data (universum data) to handle the class imbalance problem.
• The matrices appearing in the Wolfe dual of the proposed KWRUTSVM-CIL are positive definite, whereas the matrices in the Wolfe dual of RUTSVM-CIL, UTSVM, TSVM and FTWSVM are positive semi-definite.
• Experimental results and statistical analysis show the efficacy of the proposed KWRUTSVM-CIL model. As an application, we use the proposed KWRUTSVM-CIL model for the classification of Alzheimer's disease and breast cancer subjects.
[ABSTRACT FROM AUTHOR]
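The abstract does not state how the K-nearest neighbour based weights are computed. As a rough illustration only, the sketch below assumes one common scheme: within a single class, a sample's weight is the number of other samples that count it among their k nearest neighbours, so isolated (possibly contaminated) points receive little or no weight. The function name `knn_intraclass_weights`, the Euclidean metric, and the counting rule are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def knn_intraclass_weights(X, k):
    """Hypothetical weighting scheme (an assumption, not the paper's formula):
    weight of sample j = number of same-class samples that have j among
    their k nearest neighbours."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances between all samples of the class.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    k = min(k, n - 1)
    # Row i holds the indices of the k nearest neighbours of sample i.
    nn_idx = np.argsort(dist, axis=1)[:, :k]
    adjacency = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    adjacency[rows, nn_idx.ravel()] = 1.0
    # Column sums count how often each sample is chosen as a neighbour.
    return adjacency.sum(axis=0)

# Four clustered points and one isolated point from the same class.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [10.0, 10.0]])
weights = knn_intraclass_weights(X, k=2)
print(weights)
```

In this toy example the isolated point at (10, 10) is never selected as a neighbour and so gets weight 0, while the clustered points get positive weights. In a weighted model such values would typically be placed on the diagonal of a weight matrix in the objective function.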