Protein language models learn evolutionary statistics of interacting sequence motifs.
- Source :
- Proceedings of the National Academy of Sciences of the United States of America, 11/5/2024, Vol. 121, Issue 45, p1-9. 31p.
- Publication Year :
- 2024
Abstract
- Protein language models (pLMs) have emerged as potent tools for predicting and designing protein structure and function, but the degree to which these models fundamentally understand the inherent biophysics of protein structure remains an open question. Motivated by the finding that pLM-based structure predictors erroneously predict nonphysical structures for protein isoforms, we investigated the nature of the sequence context needed for contact predictions in the pLM Evolutionary Scale Modeling (ESM-2). We demonstrate by use of a "categorical Jacobian" calculation that ESM-2 stores statistics of coevolving residues, analogously to simpler modeling approaches like Markov Random Fields and Multivariate Gaussian models. We further investigated how ESM-2 "stores" the information needed to predict contacts by comparing sequence masking strategies, and found that providing local windows of sequence information allowed ESM-2 to best recover predicted contacts. This suggests that pLMs predict contacts by storing motifs of pairwise contacts. Our investigation highlights the limitations of current pLMs and underscores the importance of understanding the underlying mechanisms of these models.
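The "categorical Jacobian" named in the abstract can be pictured as follows: mutate each position of a query sequence to each amino acid, record how the model's output logits at every other position shift, and collapse the resulting 4D tensor into a residue-residue coupling map. Below is a minimal sketch of that idea using the public fair-esm package; the model size, example sequence, and the average-product correction step are illustrative assumptions, not the authors' exact pipeline.

```python
# Sketch of a categorical-Jacobian contact map with fair-esm
# (pip install fair-esm). Assumptions: small ESM-2 checkpoint for
# speed, a made-up query sequence, and a standard APC step borrowed
# from coevolution analysis.
import torch
import esm

model, alphabet = esm.pretrained.esm2_t6_8M_UR50D()
model.eval()
batch_converter = alphabet.get_batch_converter()

seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # hypothetical example
_, _, tokens = batch_converter([("query", seq)])

L = len(seq)
aa = "ACDEFGHIKLMNPQRSTVWY"
aa_idx = torch.tensor([alphabet.get_idx(a) for a in aa])

with torch.no_grad():
    # Logits for the unperturbed sequence; index 1:L+1 drops the BOS token.
    base = model(tokens)["logits"][0, 1:L + 1, :]
    # J[i, a, j, b] = change in logit (j, b) when residue i is mutated to a.
    J = torch.zeros(L, 20, L, 20)
    for i in range(L):
        for a in range(20):
            mut = tokens.clone()
            mut[0, i + 1] = aa_idx[a]  # +1 offset for the BOS token
            out = model(mut)["logits"][0, 1:L + 1, :]
            J[i, a] = (out - base)[:, aa_idx]

# Collapse to an L x L coupling map with a Frobenius norm over the two
# amino-acid dimensions, symmetrize, and apply the average-product
# correction (APC) commonly used to sharpen coevolution-based contacts.
C = J.norm(dim=(1, 3))
C = (C + C.T) / 2
apc = C.mean(0, keepdim=True) * C.mean(1, keepdim=True) / C.mean()
contacts = C - apc
```

Note that this naive version runs 20×L forward passes, so it is only practical for short sequences or small checkpoints; the point of the sketch is the structure of the computation, not its efficiency.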
Details
- Language : English
- ISSN : 0027-8424
- Volume : 121
- Issue : 45
- Database : Academic Search Index
- Journal : Proceedings of the National Academy of Sciences of the United States of America
- Publication Type : Academic Journal
- Accession number : 180666371
- Full Text : https://doi.org/10.1073/pnas.2406285121