The intrinsic dimension of protein sequence evolution
- Source :
- PLoS Computational Biology, Vol 15, Iss 4, p e1006767 (2019)
- Publication Year :
- 2019
- Publisher :
- Public Library of Science (PLoS), 2019.
Abstract
- It is well known that, in order to preserve its structure and function, a protein cannot change its sequence at random, but only by mutations occurring preferentially at specific locations. We here investigate quantitatively the amount of variability that is allowed in protein sequence evolution, by computing the intrinsic dimension (ID) of the sequences belonging to a selection of protein families. The ID is a measure of the number of independent directions that evolution can take starting from a given sequence. We find that the ID is practically constant for sequences belonging to the same family, and moreover it is very similar in different families, with values ranging between 6 and 12. These values are significantly smaller than the raw number of amino acids, confirming the importance of correlations between mutations in different sites. However, we demonstrate that correlations are not sufficient to explain the small value of the ID we observe in protein families. Indeed, we show that the ID of a set of protein sequences generated by maximum entropy models, an approach in which correlations are accounted for, is typically significantly larger than the value observed in natural protein families. We further prove that a critical factor to reproduce the natural ID is to take into consideration the phylogeny of sequences.
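As an illustration of what estimating an intrinsic dimension involves, the following is a minimal sketch of a two-nearest-neighbour (TwoNN-style) maximum-likelihood estimator applied to synthetic points. It is not taken from this record and should not be read as the authors' procedure: the function names `two_nn_id` and `pairwise_euclidean` and the synthetic data are assumptions made for the example. Protein sequences would additionally need a sequence distance (for instance a Hamming distance on aligned sequences) and a treatment of tied distances, neither of which is handled here.

```python
import numpy as np


def two_nn_id(dist):
    """Estimate the intrinsic dimension from a square pairwise distance matrix.

    Illustrative assumption: uses the ratio mu = r2 / r1 of each point's second
    and first nearest-neighbour distances and the maximum-likelihood estimate
    ID = N / sum(log(mu)).
    """
    d = np.array(dist, dtype=float)      # work on a copy
    np.fill_diagonal(d, np.inf)          # a point is not its own neighbour
    # After partitioning at index 1, columns 0 and 1 hold the two smallest
    # distances of each row in increasing order.
    two_smallest = np.partition(d, 1, axis=1)[:, :2]
    r1, r2 = two_smallest[:, 0], two_smallest[:, 1]
    keep = r1 > 0                        # drop exact duplicates (r1 == 0)
    log_mu = np.log(r2[keep] / r1[keep])
    return keep.sum() / log_mu.sum()


def pairwise_euclidean(x):
    """Euclidean distance matrix between the rows of x."""
    sq = (x ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    return np.sqrt(np.clip(d2, 0.0, None))


# Synthetic check: points with 3 degrees of freedom embedded linearly
# in a 10-dimensional space.
rng = np.random.default_rng(0)
latent = rng.uniform(size=(1000, 3))
points = latent @ rng.normal(size=(3, 10))
estimate = two_nn_id(pairwise_euclidean(points))
print(f"estimated ID: {estimate:.2f}")
```

The synthetic points have ten coordinates but only three independent directions of variation, so the estimate should land near 3 rather than 10. This mirrors, in a toy setting, the abstract's observation that the ID of a protein family is much smaller than the raw number of amino acids.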
- Subjects :
- 0301 basic medicine
Models, Molecular
Protein Folding
Protein Conformation
Sequence Homology
Amino Acid Sequence
Computational Biology
Databases, Protein
Mutation
Phylogeny
Proteins
Sequence Homology, Amino Acid
Structural Homology, Protein
Evolution, Molecular
co-evolution
0302 clinical medicine
Protein structure
Protein sequencing
Models
lcsh:QH301-705.5
Mathematics
Sequence
Ecology
Amino Acid
Computational Theory and Mathematics
Modeling and Simulation
statistical inference
Protein family
Sequence analysis
Evolution
Protein domain
Sequence alignment
Settore FIS/03 - Fisica della Materia
03 medical and health sciences
Cellular and Molecular Neuroscience
Databases
Molecular evolution
Genetics
protein evolution
Molecular Biology
Ecology, Evolution, Behavior and Systematics
Structural Homology
intrinsic dimension
Protein
Molecular
030104 developmental biology
lcsh:Biology (General)
Evolutionary biology
030217 neurology & neurosurgery
Details
- Language :
- English
- ISSN :
- 1553-7358
- Volume :
- 15
- Issue :
- 4
- Database :
- OpenAIRE
- Journal :
- PLoS Computational Biology
- Accession number :
- edsair.doi.dedup.....64b88c3e6137c2f6136bb947db86b62e