1. Towards the characterization of the hidden world of small proteins in Staphylococcus aureus, a proteogenomics approach.
- Author
- Fuchs, Stephan, Kucklick, Martin, Lehmann, Erik, Beckmann, Alexander, Wilkens, Maya, Kolte, Baban, Mustafayeva, Ayten, Ludwig, Tobias, Diwo, Maurice, Wissing, Josef, Jänsch, Lothar, Ahrens, Christian H., Ignatova, Zoya, and Engelmann, Susanne
- Subjects
- *
BACTERIAL genomes , *PROTEOLYSIS , *NUCLEIC acids , *BACTERIAL genes , *PROTEINS , *OPEN reading frames (Genetics) , *PEPPERS - Abstract
Small proteins play essential roles in bacterial physiology and virulence; however, automated algorithms for genome annotation are often not yet able to accurately predict the corresponding genes. The accuracy and reliability of genome annotations, particularly for small open reading frames (sORFs), can be significantly improved by integrating protein evidence from experimental approaches. Here we present a highly optimized and flexible bioinformatics workflow for bacterial proteogenomics covering all steps from (i) generation of protein databases, (ii) database searches, and (iii) peptide-to-genome mapping to (iv) visualization of results. We used the workflow to identify high-quality peptide spectrum matches (PSMs) for small proteins (≤ 100 aa, SP100) in Staphylococcus aureus Newman. Protein extracts from S. aureus were subjected to different experimental workflows for protein digestion and prefractionation and measured with highly sensitive mass spectrometers. In total, 175 proteins with up to 100 aa (SP100) were identified. Of these, 24 (ranging from 9 to 99 aa) were novel and not contained in the genome annotation used. 144 SP100 are highly conserved and were found in at least 50% of the publicly available S. aureus genomes, while 127 are additionally conserved in other staphylococci. Almost half of the identified SP100 were basic, suggesting a role in binding to more acidic molecules such as nucleic acids or phospholipids.
Author summary: Conventional automatic genome annotation algorithms often neglect open reading frames smaller than 300 nucleotides (sORFs). Several factors hinder the automatic annotation and prediction of short genes: (i) sORFs contain insufficient sequence information for domain and homology searches, (ii) only a limited number of experimentally validated sORFs can serve as templates, and (iii) sORFs tend to be species-specific.
We thus established a proteogenomics workflow, which is executed by two open source tools, SALT and Pepper (https://gitlab.com/s.fuchs/pepper), and uses peptide data obtained by mass spectrometry to identify bacterial genes that are difficult to predict with automatic annotation algorithms. As a proof of concept, we selected Staphylococcus aureus, one of the most frequently sequenced bacteria, and identified 36 proteins not yet considered in the genome annotation of S. aureus Newman that we used. Of these, 24 are novel small proteins with up to 100 aa (SP100) in S. aureus Newman. This clearly demonstrates that our workflow is well suited to improving the gene annotation of already annotated bacterial genomes. In the future, it may also facilitate protein and ORF detection in unannotated bacterial genomes. [ABSTRACT FROM AUTHOR]
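The peptide-to-genome mapping step named in the abstract can be illustrated with a minimal Python sketch. This is not the SALT or Pepper implementation. It assumes a simple six-frame translation with the standard amino acid assignments and exact string matching, and the names `translate`, `reverse_complement` and `map_peptide` are hypothetical helpers introduced only for this illustration.

```python
# Illustrative only: maps a peptide sequence back to genomic coordinates
# by translating all six reading frames and searching for exact matches.
# This is NOT the SALT/Pepper code; all function names are hypothetical.

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO)
)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate DNA codon by codon; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def map_peptide(genome: str, peptide: str) -> list[tuple[str, int, int]]:
    """Return (strand, start, end) hits as 0-based, end-exclusive
    coordinates on the forward strand of `genome`."""
    hits = []
    n = len(genome)
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            protein = translate(seq[frame:])
            pos = protein.find(peptide)
            while pos != -1:
                start = frame + 3 * pos
                end = start + 3 * len(peptide)
                if strand == "-":
                    # Convert reverse-strand offsets to forward coordinates.
                    start, end = n - end, n - start
                hits.append((strand, start, end))
                pos = protein.find(peptide, pos + 1)
    return hits


# Toy sequence (made up for this example): a 4-codon ORF plus stop codon.
toy_genome = "CC" + "ATGAAACGTCTGTAA" + "GG"
print(map_peptide(toy_genome, "MKRL"))
```

A real workflow would additionally have to handle ambiguous residues (for example, leucine and isoleucine are indistinguishable by mass), alternative start codons, and peptides that match multiple loci. The sketch omits all of these.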
- Published
- 2021