
TemStaPro: protein thermostability prediction using sequence representations from protein language models.

Authors :
Pudžiuvelytė I
Olechnovič K
Godliauskaite E
Sermokas K
Urbaitis T
Gasiunas G
Kazlauskas D
Source :
Bioinformatics (Oxford, England) [Bioinformatics] 2024 Mar 29; Vol. 40 (4).
Publication Year :
2024

Abstract

Motivation: Reliable prediction of protein thermostability from its sequence is valuable for both academic and industrial research. This prediction problem can be tackled using machine learning and by taking advantage of the recent blossoming of deep learning methods for sequence analysis. These methods can facilitate training on more data and, possibly, enable the development of more versatile thermostability predictors for multiple ranges of temperatures.

Results: We applied the principle of transfer learning to predict protein thermostability using embeddings generated by protein language models (pLMs) from an input protein sequence. We used large pLMs that were pre-trained on hundreds of millions of known sequences. The embeddings from such models allowed us to efficiently train and validate a high-performing prediction method using over one million sequences that we collected from organisms with annotated growth temperatures. Our method, TemStaPro (Temperatures of Stability for Proteins), was used to predict thermostability of CRISPR-Cas Class II effector proteins (C2EPs). Predictions indicated sharp differences among groups of C2EPs in terms of thermostability and were largely in tune with previously published and our newly obtained experimental data.

Availability and Implementation: TemStaPro software and the related data are freely available from https://github.com/ievapudz/TemStaPro and https://doi.org/10.5281/zenodo.7743637.

(© The Author(s) 2024. Published by Oxford University Press.)
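The transfer-learning idea summarized in the abstract (frozen pLM embeddings feeding a small trained predictor, with labels derived from organism growth temperatures) can be illustrated with a minimal sketch. It assumes that per-residue embeddings are mean-pooled into one vector per protein, that one binary classifier is trained per temperature threshold, and that a protein counts as "stable" at a threshold when its source organism's growth temperature is at or above it. The thresholds, the logistic-regression head, and the two-dimensional toy "embeddings" are all hypothetical stand-ins, not TemStaPro's actual model, features, or data.

```python
import math
import random

# Hypothetical temperature thresholds in degrees Celsius (illustrative assumption).
THRESHOLDS = [40, 45, 50, 55, 60, 65]


def mean_pool(per_residue):
    """Average per-residue embedding vectors into one fixed-length protein vector."""
    n = len(per_residue)
    dim = len(per_residue[0])
    return [sum(vec[i] for vec in per_residue) / n for i in range(dim)]


def threshold_labels(growth_temp, thresholds=THRESHOLDS):
    """Assumed labeling rule: 1 at each threshold the organism's growth temperature reaches."""
    return [1 if growth_temp >= t else 0 for t in thresholds]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, epochs=500, lr=0.1):
    """Fit a logistic-regression head on frozen embeddings with batch gradient descent."""
    dim = len(xs[0])
    w = [0.0] * dim
    b = 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = [0.0] * dim
        gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(dim):
                gw[i] += err * x[i]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(model, x):
    """Probability that a protein with pooled embedding x is stable at the model's threshold."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


rng = random.Random(0)
growth_temps = [20, 30, 37, 42, 50, 58, 65, 75, 85]


def toy_per_residue(temp, length=5):
    # Hypothetical stand-in for pLM output: dim 0 tracks growth temperature, dim 1 is noise.
    return [[(temp - 50) / 20 + rng.uniform(-0.05, 0.05), rng.uniform(-0.05, 0.05)]
            for _ in range(length)]


# The "pLM" is frozen: only the per-threshold heads are trained on its pooled outputs.
xs = [mean_pool(toy_per_residue(t)) for t in growth_temps]
models = {}
for k, t in enumerate(THRESHOLDS):
    ys = [threshold_labels(g)[k] for g in growth_temps]
    models[t] = train_logistic(xs, ys)

hot, cold = [1.5, 0.0], [-1.5, 0.0]
for t in THRESHOLDS:
    print(t, round(predict(models[t], hot), 3), round(predict(models[t], cold), 3))
```

Training one lightweight head per threshold on fixed embeddings is the kind of setup transfer learning makes cheap: the expensive pLM is run once per sequence, and a whole profile of per-threshold predictions is then read off the small heads.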

Details

Language :
English
ISSN :
1367-4811
Volume :
40
Issue :
4
Database :
MEDLINE
Journal :
Bioinformatics (Oxford, England)
Publication Type :
Academic Journal
Accession number :
38507682
Full Text :
https://doi.org/10.1093/bioinformatics/btae157