Semi-supervised prediction of protein interaction sites from unlabeled sample information

Authors :
Ye Wang
Changqing Mei
Yuming Zhou
Yan Wang
Chunhou Zheng
Xiao Zhen
Yan Xiong
Peng Chen
Jun Zhang
Bing Wang
Source :
BMC Bioinformatics, Vol 20, Iss S25, Pp 1-10 (2019)
Publication Year :
2019
Publisher :
BMC, 2019.

Abstract

Background: The recognition of protein interaction sites is of great significance for understanding many biological processes and signaling pathways, and for drug design. However, most sites on protein sequences cannot be labeled as interface or non-interface sites, because only a small fraction of protein interactions have been identified. This limits the prediction accuracy and generalization ability of predictors of protein interaction sites. It is therefore necessary to improve prediction performance by using large amounts of unlabeled data together with small amounts of labeled data and background knowledge.

Results: In this work, three semi-supervised support vector machine-based methods are proposed to improve the prediction of protein interaction sites by incorporating information from unlabeled protein sites. Five features related to the evolutionary conservation of amino acids are extracted from the HSSP database and the ConSurf Server to represent the residues within a protein sequence: residue spatial sequence spectrum, residue sequence information entropy, residue sequence relative entropy, residue sequence conserved weight, and residue evolution rate. Three predictors are then built to identify interface residues on the protein surface, one for each of three types of semi-supervised support vector machine algorithm.

Conclusion: The experimental results demonstrate that the semi-supervised approaches can effectively improve the prediction of protein interaction sites when unlabeled information is incorporated into the predictors. The best of the three achieves an accuracy of 70.7%, a sensitivity of 62.67%, and a specificity of 78.72%. Compared with existing studies, the semi-supervised models show improved prediction performance.
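The abstract names information entropy and relative entropy among its features and reports accuracy, sensitivity and specificity. As a rough illustration only, the Python sketch below computes these quantities from their standard textbook definitions. The function names, the use of base-2 logarithms, and the toy inputs are assumptions made for this example; the paper's exact feature definitions (as derived from HSSP and ConSurf) may differ.

```python
import math


def shannon_entropy(freqs):
    """Shannon entropy (in bits) of one alignment column.

    `freqs` is a list of amino-acid frequencies that sums to 1
    (hypothetical input format, assumed for this sketch).
    """
    return -sum(p * math.log2(p) for p in freqs if p > 0)


def relative_entropy(freqs, background):
    """Kullback-Leibler divergence (in bits) of a column from background
    amino-acid frequencies. Assumes background[i] > 0 wherever freqs[i] > 0."""
    return sum(p * math.log2(p / q) for p, q in zip(freqs, background) if p > 0)


def binary_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for 0/1 labels,
    where 1 marks an interface residue and 0 a non-interface residue."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Guard against classes that are absent from y_true.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return accuracy, sensitivity, specificity


if __name__ == "__main__":
    # Toy two-letter alphabet, purely for illustration.
    print(shannon_entropy([0.5, 0.5]))
    print(relative_entropy([1.0, 0.0], [0.5, 0.5]))
    print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
```

Sensitivity here is the fraction of true interface residues that are recovered, and specificity is the fraction of non-interface residues correctly rejected, which is how the reported 62.67% and 78.72% figures should be read.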

Details

Language :
English
ISSN :
1471-2105
Volume :
20
Issue :
S25
Database :
Directory of Open Access Journals
Journal :
BMC Bioinformatics
Publication Type :
Academic Journal
Accession number :
edsdoj.76633a40be432fab34aacb42c94f30
Document Type :
article
Full Text :
https://doi.org/10.1186/s12859-019-3274-7