1. Integrating signal peptide- and protein structure-prediction methods to better determine protein subcellular localization
- Author
Sanaboyana, Venkata Ramana
- Subjects
Statistics, FOS: Mathematics
- Abstract
This thesis presents work aimed at improving the accuracy of sequence-based signal peptide prediction methods by incorporating structural information. Although sequence-based methods have become increasingly accurate, even state-of-the-art predictors commonly produce false positive signal peptide predictions. Since the presence or absence of a signal peptide is crucial in determining a protein’s subcellular localization, accuracy in identifying these signals is essential. My work attempts to improve the accuracy of subcellular localization prediction by building structure-based approaches to refine signal peptide predictions. The thesis first describes the development of a refinement method that focuses specifically on signal peptides predicted in the Escherichia coli proteome. The refinement approach follows from the observation that the region corresponding to an incorrectly predicted signal peptide generally associates with the body of the protein in the protein structure. Predicted structural information from the trRosetta web server was incorporated in the form of inter-residue distance (contact) maps, which could be used to gauge the extent of interaction between potential signal peptides and the protein body. The classification model built on trRosetta contact maps proved to be highly accurate in classifying true and false positive predicted signal peptides.
Focusing next on extending the approach to other organisms, the thesis presents a more general approach for improving signal peptide prediction methods that takes advantage of the highly accurate protein structures predicted by AlphaFold2. It targets the most recent signal peptide prediction methods for both prokaryotes and eukaryotes. The method was then applied to probe false positive signal peptides predicted from the proteomes of forty-eight organisms whose protein structures are directly available in the AlphaFold2 structural database.
Finally, the thesis discusses the application of predicted structures of protein complexes in determining the specificity of enzymes towards predicted signal peptides. This chapter highlights several applications of the method, including the prediction of cleavage sites as well as the determination of enzyme specificity. Overall, the study demonstrates that analysis of a protein’s structure can be a powerful tool in predicting its subcellular localization.
- Published
2024