Proteome Coverage Prediction for Integrated Proteomics Datasets
- Author
Ruedi Aebersold, Joachim M. Buhmann, and Manfred Claassen
- Subjects
ComputingMethodologies_PATTERNRECOGNITION, Complex protein, GeneralLiterature_INTRODUCTORYANDSURVEY, Computer science, Prediction methods, Proteome, Redundancy (engineering), Bacterium L, Computational biology, Proteomics, Mass spectrometric
- Abstract
Comprehensive characterization of a proteome is a fundamental goal in proteomics. In order to maximize proteome coverage for a complex protein mixture, i.e. to identify as many proteins as possible, various fractionation experiments are typically performed and the individual fractions are subjected to mass spectrometric analysis. The resulting data are integrated into large and heterogeneous datasets. Proteome coverage prediction refers to the task of extrapolating the number of protein discoveries made by future measurements, conditioned on a sequence of already performed measurements. Proteome coverage prediction at an early stage enables experimentalists to design and plan efficient proteomics studies. To date, no method exists that reliably predicts proteome coverage from integrated datasets. We present a generalized hierarchical Pitman-Yor process model that explicitly captures the redundancy within integrated datasets. We assess the proteome coverage prediction accuracy of our approach on an integrated proteomics dataset for the bacterium L. interrogans, and we demonstrate that it outperforms ad hoc extrapolation methods and prediction methods designed for non-integrated datasets. Furthermore, we estimate the maximally achievable proteome coverage for the experimental setup underlying the L. interrogans dataset. We discuss the implications of our results for determining rational stop criteria and their influence on the design of efficient and reliable proteomics studies.
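To illustrate what "extrapolating the number of protein discoveries" means under a Pitman-Yor prior, here is a minimal sketch of the much simpler, single-level (non-hierarchical) Pitman-Yor case. It is not the authors' generalized hierarchical model, and it ignores the redundancy between fractionation experiments that their model captures. It assumes each peptide identification is a draw whose "species" is a protein, that the discount `sigma` (0 < sigma < 1) and concentration `theta` (restricted here to theta > 0 for simplicity) are already known rather than estimated, and that `n` identifications covering `k` distinct proteins have been observed. The function names are hypothetical. The closed form is the standard Pitman-Yor predictive result for the expected number of new species in `m` further draws; the recursive version recomputes it from the one-step predictive rule as a cross-check.

```python
import math


def expected_new_discoveries(n: int, k: int, m: int, theta: float, sigma: float) -> float:
    """Expected number of previously unseen proteins among m further draws
    under a single-level Pitman-Yor(sigma, theta) prior, given n draws so far
    covering k distinct proteins (hypothetical helper, not the paper's model).

    Closed form: (k + theta/sigma) * [ (a + sigma)^(m) / a^(m) - 1 ], with
    a = theta + n and x^(m) = Gamma(x + m) / Gamma(x) the rising factorial.
    """
    if not (0.0 < sigma < 1.0 and theta > 0.0):
        raise ValueError("this sketch requires 0 < sigma < 1 and theta > 0")
    a = theta + n
    # Log of the ratio of rising factorials, computed with lgamma to avoid overflow.
    log_ratio = (math.lgamma(a + sigma + m) - math.lgamma(a + sigma)
                 - math.lgamma(a + m) + math.lgamma(a))
    # expm1 keeps precision when the ratio is close to 1 (small m).
    return (k + theta / sigma) * math.expm1(log_ratio)


def expected_new_discoveries_recursive(n: int, k: int, m: int,
                                       theta: float, sigma: float) -> float:
    """Same quantity from the one-step predictive rule: with N draws and K
    distinct proteins, the next draw is new with probability
    (theta + sigma * K) / (theta + N). The rule is linear in K, so the
    expectation can be propagated exactly step by step."""
    u = 0.0  # expected number of new proteins found so far
    for j in range(m):
        u += (theta + sigma * (k + u)) / (theta + n + j)
    return u


if __name__ == "__main__":
    # Illustrative, made-up numbers; not taken from the L. interrogans dataset.
    n, k, theta, sigma = 5000, 1200, 10.0, 0.5
    for m in (1000, 5000, 20000):
        print(m, round(expected_new_discoveries(n, k, m, theta, sigma), 1))
```

Because the expected number of new proteins keeps growing with `m` whenever `sigma > 0`, a plain Pitman-Yor extrapolation has no natural ceiling; estimating a maximally achievable coverage therefore needs additional modeling beyond this sketch.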
- Published
- 2010