1. Random, de novo, and conserved proteins: How structure and disorder predictors perform differently.
- Author
- Middendorf L and Eicholt LA
- Subjects
- Animals, Conserved Sequence, Drosophila Proteins chemistry, Drosophila Proteins metabolism, Databases, Protein, Models, Molecular, Computational Biology methods, Proteins chemistry, Proteins metabolism, Intrinsically Disordered Proteins chemistry, Intrinsically Disordered Proteins metabolism, Protein Conformation, Amino Acid Sequence, Algorithms, Drosophila chemistry, Protein Folding
- Abstract
Understanding the emergence and structural characteristics of de novo and random proteins is crucial for unraveling protein evolution and designing novel enzymes. However, experimental determination of their structures remains challenging. Recent advancements in protein structure prediction, particularly with AlphaFold2 (AF2), have expanded our knowledge of protein structures, but their applicability to de novo and random proteins is unclear. In this study, we investigate the structural predictions and confidence scores of AF2 and protein language model-based predictor ESMFold for de novo and conserved proteins from Drosophila and a dataset of comparable random proteins. We find that the structural predictions for de novo and random proteins differ significantly from conserved proteins. Interestingly, a positive correlation between disorder and confidence scores (pLDDT) is observed for de novo and random proteins, in contrast to the negative correlation observed for conserved proteins. Furthermore, the performance of structure predictors for de novo and random proteins is hampered by the lack of sequence identity. We also observe fluctuating median predicted disorder among different sequence length quartiles for random proteins, suggesting an influence of sequence length on disorder predictions. In conclusion, while structure predictors provide initial insights into the structural composition of de novo and random proteins, their accuracy and applicability to such proteins remain limited. Experimental determination of their structures is necessary for a comprehensive understanding. The positive correlation between disorder and pLDDT could imply a potential for conditional folding and transient binding interactions of de novo and random proteins. (© 2024 The Authors. Proteins: Structure, Function, and Bioinformatics published by Wiley Periodicals LLC.)
- Published
- 2024