
SVM Learning from Imbalanced Data by GA Sampling for Protein Domain Prediction

Authors :
Jianxin Wang
Yanxin Huang
Yan Wang
Shuxue Zou
Chunguang Zhou
Source :
ICYCS
Publication Year :
2008
Publisher :
IEEE, 2008.

Abstract

Although support vector machines (SVM) have been extensively studied and have shown remarkable success in many applications, their performance drops significantly on imbalanced datasets. Some researchers have pointed out that this decrease is difficult to avoid when one tries to improve the effectiveness of SVM on imbalanced datasets by modifying the algorithm alone. Sampling, applied as a data preprocessing step, is therefore a popular strategy for handling the class imbalance problem, since it rebalances the dataset directly. In this paper, we propose a novel sampling method based on genetic algorithms (GA) to rebalance the imbalanced training dataset for SVM. To evaluate the final classifiers more impartially, AUC (area under the ROC curve) is employed as the fitness function in the GA. The experimental results show that the GA-based sampling strategy outperforms the random sampling method, and that our method is superior to an individual SVM for imbalanced protein domain boundary prediction. The prediction accuracy is about 70%, with an AUC of 0.905.
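The abstract does not give the chromosome encoding, the genetic operators, or any parameter values, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a bit-mask chromosome over the majority-class samples, tournament selection, one-point crossover, bit-flip mutation, and elitism; all of these, along with the names `ga_undersample`, `centroid_scorer`, and every default value, are illustrative assumptions. To stay dependency-free, a simple centroid-difference scorer stands in for SVM training; any function that takes training data and returns a scoring function could be passed as `train` instead.

```python
import random

def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def centroid_scorer(X, y):
    """Stand-in for SVM training (assumption): score a point by its
    projection onto (positive-class mean - negative-class mean)."""
    dim = len(X[0])
    def mean(cls):
        rows = [x for x, t in zip(X, y) if t == cls]
        return [sum(r[j] for r in rows) / len(rows) for j in range(dim)]
    w = [a - b for a, b in zip(mean(1), mean(0))]
    return lambda x: sum(wi * xi for wi, xi in zip(w, x))

def ga_undersample(X_tr, y_tr, X_val, y_val, train=centroid_scorer,
                   pop_size=20, generations=15, mut_rate=0.05, seed=0):
    """Search for a majority-class subset (label 0) that maximizes
    validation AUC. Returns (kept majority indices, best AUC)."""
    rng = random.Random(seed)
    maj = [i for i, t in enumerate(y_tr) if t == 0]
    mino = [i for i, t in enumerate(y_tr) if t == 1]
    # Start near balance: keep each majority sample with prob |min|/|maj|.
    p_keep = min(1.0, len(mino) / len(maj))

    def fitness(mask):
        if not any(mask):          # no majority samples left: invalid
            return 0.0
        keep = mino + [i for i, bit in zip(maj, mask) if bit]
        model = train([X_tr[i] for i in keep], [y_tr[i] for i in keep])
        return auc([model(x) for x in X_val], y_val)

    def by_fitness(pair):
        return pair[0]

    pop = [[rng.random() < p_keep for _ in maj] for _ in range(pop_size)]
    scored = [(fitness(m), m) for m in pop]
    for _ in range(generations):
        children = [max(scored, key=by_fitness)[1]]   # elitism
        while len(children) < pop_size:
            # Binary tournament selection for each parent.
            a = max(rng.sample(scored, 2), key=by_fitness)[1]
            b = max(rng.sample(scored, 2), key=by_fitness)[1]
            cut = rng.randrange(1, len(maj))          # one-point crossover
            child = a[:cut] + b[cut:]
            child = [(not bit) if rng.random() < mut_rate else bit
                     for bit in child]                # bit-flip mutation
            children.append(child)
        scored = [(fitness(m), m) for m in children]

    best_fit, best_mask = max(scored, key=by_fitness)
    return [i for i, bit in zip(maj, best_mask) if bit], best_fit
```

The returned indices, together with all minority-class samples, would form the rebalanced training set for the final classifier. Scoring fitness on a held-out validation split, rather than on the training data itself, is a design choice of this sketch; the abstract does not say which data the AUC fitness was computed on.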

Details

Database :
OpenAIRE
Journal :
2008 The 9th International Conference for Young Computer Scientists
Accession number :
edsair.doi...........37136451ad083495a68dbb0d0d1733cb
Full Text :
https://doi.org/10.1109/icycs.2008.72