
Single-sequence protein structure prediction using supervised transformer protein language models.

Authors :
Wang W
Peng Z
Yang J
Source :
Nature computational science [Nat Comput Sci] 2022 Dec; Vol. 2 (12), pp. 804-814. Date of Electronic Publication: 2022 Dec 19.
Publication Year :
2022

Abstract

Significant progress has been made in protein structure prediction in recent years. However, it remains challenging for AlphaFold2 and other deep learning-based methods to predict protein structure with single-sequence input. Here we introduce trRosettaX-Single, an automated algorithm for single-sequence protein structure prediction. It incorporates the sequence embedding from a supervised transformer protein language model into a multi-scale network enhanced by knowledge distillation to predict inter-residue two-dimensional geometry, which is then used to reconstruct three-dimensional structures via energy minimization. Benchmark tests show that trRosettaX-Single outperforms AlphaFold2 and RoseTTAFold on orphan proteins and works well on human-designed proteins (with an average template modeling score (TM-score) of 0.79). An experimental test shows that the full trRosettaX-Single pipeline is two times faster than AlphaFold2, using much fewer computing resources (<10%). On 2,000 designed proteins from network hallucination, trRosettaX-Single generates structure models with high confidence. As a demonstration, trRosettaX-Single is applied to missense mutation analysis. These data suggest that trRosettaX-Single may find potential applications in protein design and related studies.
(© 2022. The Author(s), under exclusive licence to Springer Nature America, Inc.)
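
Note on the TM-score cited above: the abstract reports the metric but does not define it. The sketch below follows the standard TM-score definition and is not code from the paper. It evaluates the score for two sets of C-alpha coordinates that are assumed to be already residue-paired and superposed, so it skips the search over superpositions that the full metric requires and gives only a lower bound on the true value. The function names `tm_d0` and `tm_score_fixed`, and the 0.5 Å floor on the distance scale for short chains, are assumptions of this sketch.

```python
import math


def tm_d0(length: int) -> float:
    """Length-dependent distance scale d0 (angstroms) of the TM-score."""
    # Assumption of this sketch: floor d0 at 0.5 for short chains, where
    # the cube-root formula becomes very small or undefined.
    if length <= 15:
        return 0.5
    return max(0.5, 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score_fixed(pred, native):
    """TM-score at one fixed superposition, normalized by the native length.

    pred and native are equal-length sequences of (x, y, z) C-alpha
    coordinates, paired residue by residue and already superposed.
    """
    if len(pred) != len(native) or len(native) == 0:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    d0 = tm_d0(len(native))
    total = 0.0
    for p, q in zip(pred, native):
        d = math.dist(p, q)  # Euclidean distance between paired residues
        total += 1.0 / (1.0 + (d / d0) ** 2)
    return total / len(native)
```

Identical coordinates give a score of 1.0, and a model whose every residue sits exactly d0 away from its native position gives 0.5, which is a quick sanity check for the scale of the 0.79 average quoted in the abstract.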

Details

Language :
English
ISSN :
2662-8457
Volume :
2
Issue :
12
Database :
MEDLINE
Journal :
Nature computational science
Publication Type :
Academic Journal
Accession number :
38177395
Full Text :
https://doi.org/10.1038/s43588-022-00373-3