1. Multi-protease Approach for the Improved Identification and Molecular Characterization of Small Proteins and Short Open Reading Frame-Encoded Peptides
- Author
Philipp T. Kaulich, Jürgen Bartel, Liam Cassidy, Ruth A. Schmitz, and Andreas Tholey
- Subjects
0301 basic medicine, Proteases, medicine.medical_treatment, Computational biology, Proteomics, Biochemistry, Open Reading Frames, 03 medical and health sciences, Complete sequence, Tandem Mass Spectrometry, medicine, Chymotrypsin, Protease, 030102 biochemistry & molecular biology, biology, Chemistry, Proteolytic enzymes, Proteins, General Chemistry, Trypsin, Open reading frame, 030104 developmental biology, biology.protein, Peptides, Peptide Hydrolases, medicine.drug
- Abstract
The identification of proteins below approximately 70-100 amino acids in bottom-up proteomics is still a challenging task due to the limited number of peptides generated by proteolytic digestion. This includes the short open reading frame-encoded peptides (SEPs), which are a subset of the small proteins that were not previously annotated or that are alternatively encoded. Here, we systematically investigated the use of multiple proteases (trypsin, chymotrypsin, LysC, LysargiNase, and GluC) in GeLC-MS/MS analysis to improve the sequence coverage and the number of identified peptides for small proteins, with a focus on SEPs, in the archaeon Methanosarcina mazei. Combining the data of all proteases, we identified 63 small proteins and an additional 28 SEPs with at least two unique peptides, while only 55 small proteins and 22 SEPs could be identified using trypsin only. For 27 small proteins and 12 SEPs, complete sequence coverage was achieved. Moreover, for five SEPs, incorrectly predicted translation start points or potential in vivo proteolytic processing were identified, confirming the data of a previous top-down proteomics study of this organism. The results clearly show that a multi-protease approach improves the identification and molecular characterization of small proteins and SEPs. LC-MS data: ProteomeXchange PXD023921.
- Published
- 2021
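The reasoning in the abstract, that a small protein yields few usable peptides with a single protease and that combining proteases raises sequence coverage, can be illustrated with a minimal in silico digestion sketch. It is not the authors' workflow. The cleavage rules are simplified textbook approximations, the peptide length window (`min_len`, `max_len`) is an assumed stand-in for what is typically detectable by LC-MS/MS, missed cleavages are ignored, and the function names and the example sequence are hypothetical.

```python
import re

# Simplified cleavage rules (assumptions, not the settings used in the study).
# Each pattern is zero-width and matches the position where the chain is cut.
CLEAVAGE_RULES = {
    "trypsin": r"(?<=[KR])(?!P)",         # C-terminal of K/R, not before P
    "LysC": r"(?<=K)",                    # C-terminal of K
    "GluC": r"(?<=E)",                    # C-terminal of E (simplified)
    "chymotrypsin": r"(?<=[FWYL])(?!P)",  # C-terminal of F/W/Y/L, not before P
    "LysargiNase": r"(?=[KR])",           # N-terminal of K/R
}


def digest(sequence, protease, min_len=6, max_len=40):
    """Return (start, end) spans of fully cleaved peptides inside the length window."""
    cuts = [m.start() for m in re.finditer(CLEAVAGE_RULES[protease], sequence)]
    bounds = sorted({0, len(sequence), *cuts})
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if min_len <= b - a <= max_len]


def coverage(sequence, proteases, min_len=6, max_len=40):
    """Fraction of residues covered by at least one peptide from any listed protease."""
    covered = [False] * len(sequence)
    for protease in proteases:
        for a, b in digest(sequence, protease, min_len, max_len):
            for i in range(a, b):
                covered[i] = True
    return sum(covered) / len(sequence)


if __name__ == "__main__":
    # Hypothetical 40-residue sequence, used only for illustration.
    small_protein = "MSKEELLKRAFGDWVEKPTLSAYNGIRDEFLQKAMWTSEK"
    print("trypsin only:", coverage(small_protein, ["trypsin"]))
    print("all proteases:", coverage(small_protein, list(CLEAVAGE_RULES)))
```

Because coverage is computed as a union over proteases, adding a protease can only keep or increase the covered fraction. The sketch does not model ionization, fragmentation quality, or database search scoring, all of which affect which peptides are actually identified.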