1. Predicting protein β-sheet contacts using a maximum entropy-based correlated mutation measure.
- Author
-
Burkoff, Nikolas S., Várnai, Csilla, and Wild, David L.
- Subjects
-
*GENETIC mutation, *ENTROPY, *PROTEIN folding, *COMPUTATIONAL biology, *NUCLEOTIDE sequence, *MACHINE learning
- Abstract
Motivation: The problem of ab initio protein folding is one of the most difficult in modern computational biology. The prediction of residue contacts within a protein provides a more tractable immediate step. Recently introduced maximum entropy-based correlated mutation measures (CMMs), such as direct information, have been successful in predicting residue contacts. However, most correlated mutation studies focus on proteins that have large good-quality multiple sequence alignments (MSA) because the power of correlated mutation analysis falls as the size of the MSA decreases. However, even with small autogenerated MSAs, maximum entropy-based CMMs contain information. To make use of this information, in this article, we focus not on general residue contacts but contacts between residues in β-sheets. The strong constraints and prior knowledge associated with β-contacts are ideally suited for prediction using a method that incorporates an often noisy CMM.
Results: Using contrastive divergence, a statistical machine learning technique, we have calculated a maximum entropy-based CMM. We have integrated this measure with a new probabilistic model for β-contact prediction, which is used to predict both residue- and strand-level contacts. Using our model on a standard non-redundant dataset, we significantly outperform a 2D recurrent neural network architecture, achieving a 5% improvement in true positives at the 5% false-positive rate at the residue level. At the strand level, our approach is competitive with the state-of-the-art single methods achieving precision of 61.0% and recall of 55.4%, while not requiring residue solvent accessibility as an input.
Availability: http://www2.warwick.ac.uk/fac/sci/systemsbiology/research/software/
Contact: D.L.Wild@warwick.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online. [ABSTRACT FROM AUTHOR]
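As a rough illustration of what a correlated mutation measure computes, the sketch below scores a pair of MSA columns by their mutual information. This is a deliberately simpler measure than the maximum entropy-based CMM described in the abstract, and it is not the authors' method; the function name `column_mutual_information` and the four-sequence `toy_msa` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def column_mutual_information(msa, i, j):
    """Mutual information (in nats) between columns i and j of an MSA.

    `msa` is a list of equal-length aligned sequences (strings).
    Illustrative only: plain mutual information, not a maximum
    entropy-based measure such as direct information.
    """
    n = len(msa)
    col_i = [seq[i] for seq in msa]
    col_j = [seq[j] for seq in msa]

    # Empirical single-column and pairwise residue counts.
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))

    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        p_a = count_i[a] / n
        p_b = count_j[b] / n
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi


# Hypothetical toy alignment: columns 0 and 1 co-vary perfectly,
# columns 0 and 3 vary independently, column 2 is fully conserved.
toy_msa = ["AKLV", "AKLI", "GELV", "GELI"]

covarying = column_mutual_information(toy_msa, 0, 1)
independent = column_mutual_information(toy_msa, 0, 3)
```

With so few sequences the estimate is very noisy, which mirrors the abstract's point that the power of correlated mutation analysis falls as the MSA shrinks.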
- Published
- 2013