TemStaPro: protein thermostability prediction using sequence representations from protein language models.
- Source :
- Bioinformatics (Oxford, England) [Bioinformatics] 2024 Mar 29; Vol. 40 (4).
- Publication Year :
- 2024
Abstract
- Motivation: Reliable prediction of protein thermostability from its sequence is valuable for both academic and industrial research. This prediction problem can be tackled using machine learning and by taking advantage of the recent blossoming of deep learning methods for sequence analysis. These methods can facilitate training on more data and, possibly, enable the development of more versatile thermostability predictors for multiple ranges of temperatures.
Results: We applied the principle of transfer learning to predict protein thermostability using embeddings generated by protein language models (pLMs) from an input protein sequence. We used large pLMs that were pre-trained on hundreds of millions of known sequences. The embeddings from such models allowed us to efficiently train and validate a high-performing prediction method using over one million sequences that we collected from organisms with annotated growth temperatures. Our method, TemStaPro (Temperatures of Stability for Proteins), was used to predict thermostability of CRISPR-Cas Class II effector proteins (C2EPs). Predictions indicated sharp differences among groups of C2EPs in terms of thermostability and were largely in tune with previously published and our newly obtained experimental data.
Availability and Implementation: TemStaPro software and the related data are freely available from https://github.com/ievapudz/TemStaPro and https://doi.org/10.5281/zenodo.7743637.
(© The Author(s) 2024. Published by Oxford University Press.)
- Subjects :
- Software
Amino Acid Sequence
Language
Proteins metabolism
Machine Learning
- Language :
- English
- ISSN :
- 1367-4811
- Volume :
- 40
- Issue :
- 4
- Database :
- MEDLINE
- Journal :
- Bioinformatics (Oxford, England)
- Publication Type :
- Academic Journal
- Accession number :
- 38507682
- Full Text :
- https://doi.org/10.1093/bioinformatics/btae157