Machine learning reveals sequence-function relationships in family 7 glycoside hydrolases
- Source :
- The Journal of Biological Chemistry
- Publication Year :
- 2021
- Publisher :
- Elsevier BV, 2021.
- Abstract :
- Family 7 glycoside hydrolases (GH7) are among the principal enzymes for cellulose degradation in nature and industrially. These important enzymes are often bimodular, comprised of a catalytic domain attached to a carbohydrate binding module (CBM) via a flexible linker, and exhibit a long active site that binds cello-oligomers of up to ten glucosyl moieties. GH7 cellulases consist of two major subtypes: cellobiohydrolases (CBH) and endoglucanases (EG). Despite the critical biological and industrial importance of GH7 enzymes, there remain gaps in our understanding of how GH7 sequence and structure relate to function. Here, we employed machine learning to gain insights into relationships between sequence, structure, and function across the GH7 family. Machine-learning models, using the number of residues in the active-site loops as features, were able to discriminate GH7 CBHs and EGs with up to 99% accuracy. The lengths of the A4, B2, B3, and B4 loops were strongly correlated with functional subtype across the GH7 family. Position-specific classification rules were derived such that specific amino acids at 42 different sequence positions predicted the functional subtype with accuracies greater than 87%. A random forest model trained on residues at 19 positions in the catalytic domain predicted the presence of a CBM with 89.5% accuracy. We propose these positions play vital roles in the functional variation of GH7 cellulases. Taken together, our results complement numerous experimental findings and present functional relationships that can be applied when prospecting GH7 cellulases from nature, for sequence annotation, and to understand or manipulate function.
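The abstract does not say how the position-specific classification rules were derived. As a rough illustration of the general idea only, the sketch below scores a single-residue rule at each column of a multiple sequence alignment ("predict CBH if this residue is present, otherwise EG") and keeps the rules whose accuracy exceeds 87%, the threshold quoted above. The function name `best_rule_per_position` and the three-column toy alignment are hypothetical and invented for this example. They are not data or code from the paper.

```python
def best_rule_per_position(seqs, labels, positive="CBH"):
    """For each alignment column, find the residue whose presence best
    predicts the `positive` label. Returns (position, residue, accuracy)."""
    rules = []
    for pos in range(len(seqs[0])):
        column = [s[pos] for s in seqs]
        best = None
        # sorted() makes tie-breaking deterministic
        for aa in sorted(set(column)):
            # Rule: predict `positive` when the residue equals `aa`,
            # and the other class otherwise.
            correct = sum(
                (res == aa) == (lab == positive)
                for res, lab in zip(column, labels)
            )
            acc = correct / len(seqs)
            if best is None or acc > best[2]:
                best = (pos, aa, acc)
        rules.append(best)
    return rules


# Hypothetical toy alignment (NOT real GH7 sequences): three aligned columns.
toy_seqs = ["WAY", "WGY", "WAF", "CGY", "CAF", "CGF"]
toy_labels = ["CBH", "CBH", "CBH", "EG", "EG", "EG"]

rules = best_rule_per_position(toy_seqs, toy_labels)

# Keep only rules above the 87% accuracy threshold mentioned in the abstract.
selected = [r for r in rules if r[2] > 0.87]
for pos, aa, acc in selected:
    print(f"position {pos}: residue {aa} -> CBH (accuracy {acc:.2f})")
```

In this toy alignment only the first column separates the two labels cleanly, so it is the only rule that survives the threshold. A real analysis would also need held-out validation, since the accuracy of a rule on the sequences used to choose it is optimistic.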
- Subjects :
- Glycoside Hydrolases
Trichoderma reesei
Cellulase
Molecular Dynamics Simulation
Biology
Machine learning
CBH, cellobiohydrolase
KNN, k-nearest neighbor
Biochemistry
k-nearest neighbors algorithm
GH, glycoside hydrolase
Catalytic Domain
CBM, carbohydrate-binding module
tryptophan
glycoside hydrolase
ML, machine learning
Cellulose
Molecular Biology
Sequence (medicine)
Multiple sequence alignment
Chemistry
Biochemistry and Molecular Biology
CD, catalytic domain
Active site
bioinformatics
Cell Biology
EG, endoglucanase
Amino acid
MSA, multiple sequence alignment
Kinetics
statistics
HMM, hidden Markov model
LPMO, lytic polysaccharide monooxygenase
GH7, family 7 glycoside hydrolase
Artificial intelligence
Carbohydrate-binding module
Linker
Function (biology)
Research Article
Details
- ISSN :
- 0021-9258
- Volume :
- 297
- Database :
- OpenAIRE
- Journal :
- Journal of Biological Chemistry
- Accession number :
- edsair.doi.dedup.....78bd51e315c2a523d356bdbb0e8d6356