Unlocking protein sequences: Advances in protein structure and ligand-binding site prediction
- Author
Shenoy, Aditi
- Abstract
A protein's sequence determines how it folds into its unique three-dimensional structure. Once folded, proteins perform their functions by interacting with other proteins or with molecules called ligands within the cell. Experimentally determining protein structure and function is laborious. Computational approaches aim to predict protein properties accurately, complementing experimental efforts to understand biochemical mechanisms within the cell. This thesis introduces computational techniques that, based on sequence information, predict the structure of protein complexes and identify the protein residues involved in interactions with common biomolecules, such as metal ions and nucleic acids. AlphaFold, a method that predicts protein structure from sequence with near-experimental accuracy, was a critical breakthrough that shaped the field of protein structure prediction. Subsequently, approaches such as FoldDock adapted the AlphaFold pipeline to dimeric complexes. Paper I applies the FoldDock protocol to understand toxin-antitoxin systems. These protein complexes are highly evolutionarily conserved, and high-confidence dimer predictions were generated for them. Paper II applies the FoldDock protocol to study protein-protein interactions in the human proteome. To verify the reliability of machine-learning-based computational methods, they must be tested on independent data that differ from the data used to train them. Paper III generates a homology-reduced independent test set and uses it to benchmark protein complex structure predictors, including AlphaFold-Multimer, the recent AlphaFold release adapted for multi-chain proteins. A confidence score, pDockQ2, was proposed to estimate the quality of the interfaces within multimers. Papers I, II and III thus concern predicting and evaluating protein-protein interactions. Representation learning involves finding effective representations of input data to maxi
- Published
- 2024