Protein Crystallization Identification via Fuzzy Model on Linear Neighborhood Representation
- Source :
- IEEE/ACM Transactions on Computational Biology and Bioinformatics. 18:1986-1995
- Publication Year :
- 2021
- Publisher :
- Institute of Electrical and Electronics Engineers (IEEE), 2021.
Abstract
- X-ray crystallography is the most popular approach for analyzing protein 3D structure. However, the success rate of protein crystallization is very low (2-10 percent). To reduce the cost in time and resources, many computational methods have been developed to predict protein crystallization. Improving the accuracy of these predictions is important for determining protein structure by X-ray crystallography, and many machine learning methods are now used for this task. In this article, we propose a Fuzzy Support Vector Machine based on Linear Neighborhood Representation (FSVM-LNR) to predict the crystallization propensity of proteins. Proteins are represented by three types of features (PsePSSM, PSSM-DWT, MMI-PS), which are serially combined and fed into FSVM-LNR. FSVM-LNR filters outliers by a membership score, which is calculated from the reconstruction residuals of the $k$ nearest samples. To evaluate our predictive model, we test FSVM-LNR on the TRAIN3587, TEST3585 and TEST500 datasets. Our method achieves a better Matthews correlation coefficient (MCC) on TRAIN3587 (MCC: 0.56) and TEST3585 (MCC: 0.58). Although its independent-test performance on TEST500 is not the best, FSVM-LNR still shows predictive ability there (MCC: 0.70). The good performance on these datasets demonstrates the effectiveness of our method, and the better performance on the large datasets further demonstrates its stability and superiority.
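The membership idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formula, so the details here are assumptions, namely that neighbours are taken from the sample's own class, that reconstruction weights sum to one (as in locally linear embedding), and that a residual `r` is mapped to a membership `1 - r / (r_max + eps)`. The names `lnr_membership`, `mcc`, `reg` and `eps` are hypothetical. A small helper for the Matthews correlation coefficient, the metric reported above, is included.

```python
import math

import numpy as np


def lnr_membership(X, y, k=5, reg=1e-3, eps=1e-6):
    """Membership scores from k-nearest-neighbour reconstruction residuals.

    Assumptions (not taken from the paper): neighbours come from the
    sample's own class, weights sum to one, and a large residual maps
    to a low membership.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    residuals = np.zeros(len(X))
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        kk = min(k, len(idx) - 1)
        if kk < 1:
            continue  # a lone sample has no neighbours; residual stays 0
        Xc = X[idx]
        for pos, i in enumerate(idx):
            dist = np.linalg.norm(Xc - X[i], axis=1)
            dist[pos] = np.inf  # exclude the sample itself
            nbrs = Xc[np.argsort(dist)[:kk]]
            Z = nbrs - X[i]  # neighbours centred on the sample
            G = Z @ Z.T  # local Gram matrix, shape (kk, kk)
            G = G + reg * (np.trace(G) + 1e-12) * np.eye(kk)  # ridge term
            w = np.linalg.solve(G, np.ones(kk))
            w = w / w.sum()  # enforce sum(w) == 1
            residuals[i] = np.linalg.norm(X[i] - w @ nbrs)
    return 1.0 - residuals / (residuals.max() + eps)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary 0/1 labels."""
    t = np.asarray(y_true).astype(bool)
    p = np.asarray(y_pred).astype(bool)
    tp = int(np.sum(t & p))
    tn = int(np.sum(~t & ~p))
    fp = int(np.sum(~t & p))
    fn = int(np.sum(t & ~p))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy usage: five collinear points plus one point far from the line.
X_demo = [[0, 0], [1, 0], [2, 0], [3, 0], [4, 0], [2, 5]]
y_demo = [0, 0, 0, 0, 0, 0]
print(np.round(lnr_membership(X_demo, y_demo, k=2), 3))
```

One plausible way to use such scores is as per-sample weights when training an SVM, for example through the `sample_weight` argument of scikit-learn's `SVC.fit`, so that poorly reconstructed samples contribute less to the decision boundary. Whether this matches the weighting used in FSVM-LNR is not stated in the abstract.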
- Subjects :
- Support Vector Machine
Computer science
0206 medical engineering
Feature extraction
Stability (learning theory)
02 engineering and technology
law.invention
Fuzzy Logic
law
Genetics
Crystallization
Representation (mathematics)
business.industry
Applied Mathematics
Computational Biology
Proteins
Pattern recognition
Filter (signal processing)
Support vector machine
Outlier
Artificial intelligence
Protein crystallization
business
Algorithms
020602 bioinformatics
Biotechnology
Details
- ISSN :
- 2374-0043 and 1545-5963
- Volume :
- 18
- Database :
- OpenAIRE
- Journal :
- IEEE/ACM Transactions on Computational Biology and Bioinformatics
- Accession number :
- edsair.doi.dedup.....febaf4dd69751595b151b65903e33807