Large language models generate functional protein sequences across diverse families.
- Source :
- Nature Biotechnology. Aug 2023, Vol. 41 Issue 8, p1099-1106. 8p.
- Publication Year :
- 2023
Abstract
- Deep-learning language models have shown promise in various biotechnological applications, including protein design and engineering. Here we describe ProGen, a language model that can generate protein sequences with a predictable function across large protein families, akin to generating grammatically and semantically correct natural language sentences on diverse topics. The model was trained on 280 million protein sequences from >19,000 families and is augmented with control tags specifying protein properties. ProGen can be further fine-tuned to curated sequences and tags to improve controllable generation performance of proteins from families with sufficient homologous samples. Artificial proteins fine-tuned to five distinct lysozyme families showed similar catalytic efficiencies as natural lysozymes, with sequence identity to natural proteins as low as 31.4%. ProGen is readily adapted to diverse protein families, as we demonstrate with chorismate mutase and malate dehydrogenase. A generative deep-learning model designs artificial proteins with desired enzymatic activities. [ABSTRACT FROM AUTHOR]
Details
- Language :
- English
- ISSN :
- 1087-0156
- Volume :
- 41
- Issue :
- 8
- Database :
- Academic Search Index
- Journal :
- Nature Biotechnology
- Publication Type :
- Academic Journal
- Accession number :
- 169912217
- Full Text :
- https://doi.org/10.1038/s41587-022-01618-2