1. Improved prediction of peptide detectability for targeted proteomics using a rank-based algorithm and organism-specific data
- Author
- Konrad Basler, Christian H. Ahrens, Bernd Wollscheid, Daniel J. Stekhoven, Juerg E. Frey, Ermir Qeli, Erich Brunner, Ulrich Omasits, Sandra Goetze, University of Zurich
- Subjects
SRM, Saccharomyces cerevisiae Proteins, 1303 Biochemistry, Systems biology, In silico, Biophysics, Saccharomyces cerevisiae, Biology, Machine learning, Proteomics, Biochemistry, Ranking (information retrieval), 03 medical and health sciences, Bacterial Proteins, PageRank, Sequence Analysis, Protein, Animals, Drosophila Proteins, Databases, Protein, Shotgun proteomics, Rank prediction algorithms, Targeted proteomics, 030304 developmental biology, 0303 health sciences, Bartonella henselae, 030302 biochemistry & molecular biology, 10124 Institute of Molecular Life Sciences, Drosophila melanogaster, Proteome, 570 Life sciences; biology, Learning to rank, Artificial intelligence, Leptospira interrogans, Peptides, Proteotypic peptides, Algorithm, Peptide detectability, Algorithms, 1304 Biophysics
- Abstract
The in silico prediction of the best-observable “proteotypic” peptides in mass spectrometry-based workflows is a challenging problem. Being able to accurately predict such peptides would enable the informed selection of proteotypic peptides for targeted quantification of previously observed and non-observed proteins for any organism, with a significant impact on clinical proteomics and systems biology studies. Current prediction algorithms rely on physicochemical parameters in combination with positive and negative training sets to identify those peptide properties that most profoundly affect their general detectability. Here we present PeptideRank, an approach that uses a learning-to-rank algorithm for peptide detectability prediction from shotgun proteomics data, and that eliminates the need to select a negative dataset for the training step. A large number of different peptide properties are used to train ranking models in order to predict a ranking of the best-observable peptides within a protein. Empirical evaluation with rank accuracy metrics showed that PeptideRank complements existing prediction algorithms. Our results indicate that the best performance is achieved when it is trained on organism-specific shotgun proteomics data, and that PeptideRank is most accurate for short to medium-sized and abundant proteins, without any loss in prediction accuracy for the important class of membrane proteins.
Biological significance
Targeted proteomics approaches have been gaining considerable momentum and hold immense potential for systems biology studies and clinical proteomics. However, since only very few complete proteomes have been reported to date, a considerable fraction of any proteome lacks experimental proteomics evidence that could guide the selection of the best-suited proteotypic peptides (PTPs), i.e. peptides that are specific to a given proteoform and that are repeatedly observed in a mass spectrometer.
We describe a novel, rank-based approach for the prediction of the best-suited PTPs for targeted proteomics applications. By building on methods developed in the field of information retrieval (e.g. web search engines like Google's PageRank), we circumvent the delicate step of selecting positive and negative training sets and, at the same time, more closely reflect the experimentalist's need to select, e.g., the 5 most promising peptides for targeting a protein of interest. This approach allows PTPs to be predicted for proteins that have not yet been observed, or for organisms without prior experimental proteomics data, such as many non-model organisms.
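The ranking idea in the abstract can be illustrated with a minimal pairwise learning-to-rank sketch. This is not the PeptideRank implementation: the abstract does not name the learner or the peptide properties used, so the four features, the perceptron-style update, and the names `features`, `train_pairwise`, and `rank_peptides` are all assumptions chosen for illustration. The sketch only shows how peptides observed more often than others from the same protein can serve as training signal, so that no negative set is needed.

```python
from itertools import combinations
import random

# Illustrative residue groupings (an assumption, not taken from the paper).
HYDROPHOBIC = set("AVILMFWY")
BASIC = set("KRH")
ACIDIC = set("DE")


def features(peptide):
    """Map a non-empty peptide sequence to a small, hypothetical feature vector."""
    n = len(peptide)
    return [
        n / 30.0,                                      # scaled length
        sum(aa in HYDROPHOBIC for aa in peptide) / n,  # hydrophobic fraction
        sum(aa in BASIC for aa in peptide) / n,        # basic fraction
        sum(aa in ACIDIC for aa in peptide) / n,       # acidic fraction
    ]


def score(weights, x):
    """Linear score; a higher value means 'predicted to be better observable'."""
    return sum(w * xi for w, xi in zip(weights, x))


def train_pairwise(proteins, epochs=200, lr=0.1, seed=0):
    """Fit a linear ranking model from within-protein peptide pairs.

    proteins: dict mapping protein id -> list of (peptide, observation_count).
    Only pairs from the same protein with different counts are compared.
    """
    pairs = []
    for peptides in proteins.values():
        for (p1, c1), (p2, c2) in combinations(peptides, 2):
            if c1 == c2:
                continue  # ties carry no ordering information
            better, worse = (p1, p2) if c1 > c2 else (p2, p1)
            pairs.append((features(better), features(worse)))

    rng = random.Random(seed)
    weights = [0.0] * 4
    for _ in range(epochs):
        rng.shuffle(pairs)
        for x_hi, x_lo in pairs:
            # Perceptron-style update whenever the pair is not separated by a margin of 1.
            if score(weights, x_hi) - score(weights, x_lo) < 1.0:
                weights = [w + lr * (a - b) for w, a, b in zip(weights, x_hi, x_lo)]
    return weights


def rank_peptides(weights, peptides, top_k=5):
    """Return the top_k candidate peptides of one protein, best first."""
    ranked = sorted(peptides, key=lambda p: score(weights, features(p)), reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    # Toy observation counts, invented purely to exercise the code.
    training = {"P1": [("AVILK", 10), ("AVDEK", 5), ("DEDEK", 1)]}
    w = train_pairwise(training)
    print(rank_peptides(w, ["DEDEK", "AVDEK", "AVILK"], top_k=2))
```

The `top_k` argument mirrors the use case mentioned above of selecting, e.g., the 5 most promising peptides per protein. A realistic model would use many more peptide properties and organism-specific shotgun data, as the abstract describes.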
- Published
- 2014