
Prediction of protein subcellular localization with oversampling approach and Chou's general PseAAC.

Authors :
Zhang, Shengli
Duan, Xin
Source :
Journal of Theoretical Biology. Jan 2018, Vol. 437, pp. 239-250. 12p.
Publication Year :
2018

Abstract

Predicting protein subcellular location with support vector machines (SVMs) has recently become a popular research area because of the rapid growth of biological information. Although substantial progress has been made, few researchers have addressed data imbalance before classification, which leads to low accuracy for some categories. In this work, we combine an oversampling method with SVM to handle protein subcellular localization on unbalanced data sets. To capture informative features of a protein, a PseAAC (Pseudo Amino Acid Composition) feature vector is extracted from the PSSM (Position-Specific Scoring Matrix) and then selected by principal component analysis (PCA). Next, the samples are treated with the oversampling method to eliminate the imbalance in sample numbers among classes and are fed into a support vector machine to predict the protein subcellular location. To evaluate the proposed method, jackknife tests are performed on three benchmark datasets (ZD98, CL317 and ZW225). Results of the SVM experiments with and without oversampling, obtained by jackknife tests, show that the oversampling method successfully reduces the imbalance of the data sets, and the prediction accuracy of each class in each dataset is higher than 88.9%. Compared with other protein subcellular localization methods, the proposed method achieves the best performance. The overall accuracies on ZD98, CL317 and ZW225 are 93.2%, 96.00% and 92.15%, respectively, which are 2.4%, 8.0% and 8.2% higher than the best methods in the comparison. The high overall accuracy indicates that the feature representation and selection capture useful information from the protein sequence and that the oversampling method addresses the imbalance of sample numbers in SVM classification. [ABSTRACT FROM AUTHOR]
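
The abstract's balancing and evaluation steps (oversampling followed by a jackknife, i.e. leave-one-out, test) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy feature vectors, the class labels `cyto` and `nuc`, and all function names are hypothetical, plain random duplication stands in for whichever oversampling method the paper uses, and a nearest-centroid classifier stands in for the SVM so that the example needs only the Python standard library. The sketch also assumes that balancing is applied to the training samples inside each jackknife round, which may differ from the paper's protocol.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate samples at random until every class matches the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for features in rows + extra:
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out


def nearest_centroid_predict(X_train, y_train, x):
    """Stand-in classifier (NOT an SVM): pick the class whose mean vector is closest to x."""
    counts = Counter(y_train)
    sums = {}
    for features, label in zip(X_train, y_train):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value

    def squared_distance(label):
        return sum((s / counts[label] - v) ** 2 for s, v in zip(sums[label], x))

    return min(sums, key=squared_distance)


def jackknife_accuracy(X, y, seed=0):
    """Leave-one-out test: hold out each sample once, balance the rest, then predict it."""
    correct = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        X_bal, y_bal = random_oversample(X_train, y_train, seed)
        if nearest_centroid_predict(X_bal, y_bal, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


# Hypothetical toy data: six samples of one class, two of another (imbalanced).
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [0.0, 0.0],
     [0.1, 0.1], [0.2, 0.2], [1.0, 1.0], [0.9, 1.1]]
y = ["cyto"] * 6 + ["nuc"] * 2

X_balanced, y_balanced = random_oversample(X, y)
balanced_counts = Counter(y_balanced)
accuracy = jackknife_accuracy(X, y)
print(balanced_counts, accuracy)
```

In practice the toy vectors would be replaced by PCA-reduced PseAAC features and the stand-in classifier by an SVM; the structure of the loop stays the same.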

Details

Language :
English
ISSN :
0022-5193
Volume :
437
Database :
Academic Search Index
Journal :
Journal of Theoretical Biology
Publication Type :
Academic Journal
Accession number :
126253779
Full Text :
https://doi.org/10.1016/j.jtbi.2017.10.030