Machine learning reveals sequence-function relationships in family 7 glycoside hydrolases

Authors :
Gregg T. Beckham
Brent Harrison
Japheth E. Gado
Mats Sandgren
Christina M. Payne
Jerry Ståhlberg
Source :
The Journal of Biological Chemistry
Publication Year :
2021
Publisher :
Elsevier BV, 2021.

Abstract

Family 7 glycoside hydrolases (GH7) are among the principal enzymes for cellulose degradation, both in nature and in industry. These important enzymes are often bimodular, composed of a catalytic domain attached to a carbohydrate binding module (CBM) via a flexible linker, and exhibit a long active site that binds cello-oligomers of up to ten glucosyl moieties. GH7 cellulases consist of two major subtypes: cellobiohydrolases (CBH) and endoglucanases (EG). Despite the critical biological and industrial importance of GH7 enzymes, there remain gaps in our understanding of how GH7 sequence and structure relate to function. Here, we employed machine learning to gain insights into relationships between sequence, structure, and function across the GH7 family. Machine-learning models, using the number of residues in the active-site loops as features, were able to discriminate GH7 CBHs and EGs with up to 99% accuracy. The lengths of the A4, B2, B3, and B4 loops were strongly correlated with functional subtype across the GH7 family. Position-specific classification rules were derived such that specific amino acids at 42 different sequence positions predicted the functional subtype with accuracies greater than 87%. A random forest model trained on residues at 19 positions in the catalytic domain predicted the presence of a CBM with 89.5% accuracy. We propose that these positions play vital roles in the functional variation of GH7 cellulases. Taken together, our results complement numerous experimental findings and present functional relationships that can be applied when prospecting GH7 cellulases from nature, for sequence annotation, and to understand or manipulate function.
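
The idea of using active-site loop lengths as classification features can be illustrated with a minimal sketch. This is not the authors' model or data: the loop lengths below are invented toy values, the assumption that CBHs carry the longer loops is built into the toy data, and the single-threshold rule (`fit_stump`, a hypothetical helper) is a much simpler stand-in for the machine-learning models the abstract describes.

```python
# Toy illustration only: every loop length here is invented, not taken from the paper.
from typing import Dict, List, Tuple

# The four loops the abstract reports as strongly correlated with subtype.
LOOPS = ["A4", "B2", "B3", "B4"]

# (loop lengths in residues, functional subtype) for six synthetic sequences.
TOY_DATA: List[Tuple[Dict[str, int], str]] = [
    ({"A4": 12, "B2": 14, "B3": 13, "B4": 10}, "CBH"),
    ({"A4": 11, "B2": 15, "B3": 12, "B4": 9}, "CBH"),
    ({"A4": 12, "B2": 13, "B3": 13, "B4": 10}, "CBH"),
    ({"A4": 5, "B2": 6, "B3": 4, "B4": 3}, "EG"),
    ({"A4": 6, "B2": 5, "B3": 5, "B4": 4}, "EG"),
    ({"A4": 4, "B2": 7, "B3": 4, "B4": 3}, "EG"),
]


def predict(features: Dict[str, int], loop: str, threshold: int) -> str:
    """Apply the rule: call CBH if the loop has at least `threshold` residues."""
    return "CBH" if features[loop] >= threshold else "EG"


def fit_stump(data: List[Tuple[Dict[str, int], str]], loop: str) -> Tuple[int, float]:
    """Find the length threshold on one loop that best separates CBH from EG.

    Returns (threshold, training accuracy). Candidate thresholds are the
    observed lengths; the first one reaching the best accuracy is kept.
    """
    candidates = sorted({features[loop] for features, _ in data})
    best_threshold, best_accuracy = candidates[0], 0.0
    for threshold in candidates:
        correct = sum(predict(f, loop, threshold) == label for f, label in data)
        accuracy = correct / len(data)
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = threshold, accuracy
    return best_threshold, best_accuracy


# One single-feature rule per loop, fitted on the toy data.
stumps = {loop: fit_stump(TOY_DATA, loop) for loop in LOOPS}

if __name__ == "__main__":
    for loop, (threshold, accuracy) in stumps.items():
        print(f"{loop}: CBH if length >= {threshold} (training accuracy {accuracy:.2f})")
```

Because the toy classes are cleanly separated, each rule fits the training data perfectly; real sequence data would call for held-out evaluation and a model that combines several loops.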

Details

ISSN :
0021-9258
Volume :
297
Database :
OpenAIRE
Journal :
Journal of Biological Chemistry
Accession number :
edsair.doi.dedup.....78bd51e315c2a523d356bdbb0e8d6356