SVM Learning from Imbalanced Data by GA Sampling for Protein Domain Prediction
- Source :
- ICYCS
- Publication Year :
- 2008
- Publisher :
- IEEE, 2008.
Abstract
- The performance of support vector machines (SVM) drops significantly on imbalanced datasets, even though SVMs have been extensively studied and have shown remarkable success in many applications. Some researchers have pointed out that this decrease is difficult to avoid when the effectiveness of SVM on imbalanced datasets is improved only by modifying the algorithm itself. Sampling, applied as a data preprocessing step, is therefore a popular strategy for handling the class imbalance problem, since it rebalances the dataset directly. In this paper, we propose a novel sampling method based on genetic algorithms (GA) to rebalance the imbalanced training dataset for SVM. To evaluate the final classifiers more impartially, AUC (area under the ROC curve) is employed as the fitness function of the GA. The experimental results show that the GA-based sampling strategy outperforms random sampling, and that our method is superior to an individual SVM for imbalanced protein domain boundary prediction. The prediction accuracy is about 70%, with an AUC value of 0.905.
- Subjects :
- Fitness function
Computer science
business.industry
Boundary (topology)
Sampling (statistics)
Pattern recognition
Machine learning
computer.software_genre
Imbalanced data
Support vector machine
Class imbalance
ComputingMethodologies_PATTERNRECOGNITION
Genetic algorithm
Artificial intelligence
business
computer
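The abstract describes the method only at a high level: a GA searches for a rebalanced training subset and scores each candidate by the AUC of the SVM trained on it. The Python sketch below illustrates that idea under assumptions of its own, not the authors' published design. Each chromosome is assumed to be a keep/drop bitmask over majority-class examples (undersampling, with all minority examples always kept); fitness is AUC on a separate validation set; and the GA uses tournament selection, uniform crossover, bit-flip mutation and elitism. To stay dependency-free, a small Pegasos-style linear SVM stands in for the paper's SVM, whose kernel and parameters are not given here. The function names (`auc`, `train_linear_svm`, `decision`, `ga_sample`) and all parameter values are hypothetical.

```python
import random


def auc(scores, labels):
    """Rank-based AUC (Mann-Whitney U): P(score of a positive > score of a negative),
    counting ties as 0.5. Labels are +1 (minority/positive) and -1."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y != 1]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(pos) * len(neg))


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Stand-in linear SVM (an assumption of this sketch): Pegasos-style stochastic
    subgradient descent on the regularized hinge loss. A constant 1 feature is
    appended so that a bias term is learned."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], t) for x, t in zip(X, y)]
    w = [0.0] * len(data[0][0])
    step = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, t in data:
            step += 1
            eta = 1.0 / (lam * step)
            margin = t * sum(wi * xi for wi, xi in zip(w, x))  # uses w before update
            w = [(1.0 - eta * lam) * wi for wi in w]            # regularization shrink
            if margin < 1.0:                                    # hinge loss is active
                w = [wi + eta * t * xi for wi, xi in zip(w, x)]
    return w


def decision(w, x):
    """Signed SVM score for one example (higher means more likely positive)."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


def ga_sample(X_min, X_maj, X_val, y_val, pop_size=12, generations=15,
              p_mut=0.05, seed=0):
    """Evolve a keep/drop bitmask over the majority-class examples; all minority
    examples are always kept (encoding and operators are assumptions of this
    sketch). Fitness = validation AUC of an SVM trained on the selected subset.
    Returns (best_mask, best_auc)."""
    rng = random.Random(seed)
    n = len(X_maj)
    cache = {}  # training dominates the cost, so memoize fitness per mask

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:
            chosen = [x for x, keep in zip(X_maj, mask) if keep]
            if not chosen:
                cache[key] = 0.0  # an empty majority subset cannot train an SVM
            else:
                X = list(X_min) + chosen
                y = [1] * len(X_min) + [-1] * len(chosen)
                w = train_linear_svm(X, y)
                cache[key] = auc([decision(w, x) for x in X_val], y_val)
        return cache[key]

    # Start near a 1:1 class ratio: keep each majority example with prob |min|/|maj|.
    keep_p = len(X_min) / n
    pop = [[rng.random() < keep_p for _ in range(n)] for _ in range(pop_size)]

    def tournament():
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [max(pop, key=fitness)[:]]  # elitism: best mask survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]  # uniform crossover
            child = [(not g) if rng.random() < p_mut else g for g in child]       # bit-flip mutation
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    rng = random.Random(42)

    def blob(cx, cy, k):  # k synthetic 2-D points around (cx, cy)
        return [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)) for _ in range(k)]

    X_min, X_maj = blob(1.5, 1.5, 10), blob(0.0, 0.0, 90)  # 1:9 imbalance
    X_val = blob(1.5, 1.5, 10) + blob(0.0, 0.0, 40)
    y_val = [1] * 10 + [-1] * 40
    mask, score = ga_sample(X_min, X_maj, X_val, y_val)
    print(f"kept {sum(mask)} of {len(X_maj)} majority examples, "
          f"validation AUC = {score:.3f}")
```

Because only the fitness function touches the classifier, swapping `train_linear_svm` for a kernel SVM (for example scikit-learn's `SVC`, scoring the validation set with its `decision_function`) would leave the GA loop unchanged. Using AUC rather than accuracy as fitness follows the abstract's rationale: on an imbalanced validation set, accuracy can stay high even when the minority class is mostly misclassified.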
Details
- Database :
- OpenAIRE
- Journal :
- 2008 The 9th International Conference for Young Computer Scientists
- Accession number :
- edsair.doi...........37136451ad083495a68dbb0d0d1733cb
- Full Text :
- https://doi.org/10.1109/icycs.2008.72