Karin Lanthaler, Chris M. Grant, Victoria M. Harman, Paul F. G. Sims, Simon J. Hubbard, Robert J. Beynon, Craig Lawless, Dean E. Hammond, Rachel H. Watkins, Rebecca L. Miller, Claire E. Eyers, Philip Brownridge, and Stephen W. Holman
Defining intracellular protein concentration is critical in molecular systems biology. Although strategies for determining relative protein changes are available, defining robust absolute values in copies per cell has proven significantly more challenging. Here we present a reference data set quantifying over 1800 Saccharomyces cerevisiae proteins by direct means using protein-specific stable-isotope labeled internal standards and selected reaction monitoring (SRM) mass spectrometry, far exceeding any previous study. This was achieved by careful design of over 100 QconCAT recombinant proteins as standards, defining 1167 proteins in terms of copies per cell and upper limits on a further 668, with robust CVs routinely less than 20%. The selected reaction monitoring-derived proteome is compared with existing quantitative data sets, highlighting the disparities between methodologies. Coupled with a quantification of the transcriptome by RNA-seq taken from the same cells, these data support revised estimates of several fundamental molecular parameters: a total protein count of ∼100 million molecules-per-cell, a median of ∼1000 proteins-per-transcript, and a linear model of protein translation explaining 70% of the variance in translation rate. This work contributes a “gold-standard” reference yeast proteome (including 532 values based on high quality, dual peptide quantification) that can be widely used in systems models and for other comparative studies. Reliable and accurate quantification of the proteins present in a cell or tissue remains a major challenge for post-genome scientists. Proteins are the primary functional molecules in biological systems and knowledge of their abundance and dynamics is an important prerequisite to a complete understanding of natural physiological processes, or dysfunction in disease. Accordingly, much effort has been spent in the development of reliable, accurate and sensitive techniques to quantify the cellular proteome, the complement of proteins expressed at a given time under defined conditions (1). Moreover, the ability to model a biological system and thus characterize it in kinetic terms, requires that protein concentrations be defined in absolute numbers (2, 3). Given the high demand for accurate quantitative proteome data sets, there has been a continual drive to develop methodology to accomplish this, typically using mass spectrometry (MS) as the analytical platform. Many recent studies have highlighted the capabilities of MS to provide good coverage of the proteome at high sensitivity often using yeast as a demonstrator system (4⇓⇓⇓⇓⇓–10), suggesting that quantitative proteomics has now “come of age” (1). However, given that MS is not inherently quantitative, most of the approaches produce relative quantitation and do not typically measure the absolute concentrations of individual molecular species by direct means. For the yeast proteome, epitope tagging studies using green fluorescent protein or tandem affinity purification tags provides an alternative to MS. Here, collections of modified strains are generated that incorporate a detectable, and therefore quantifiable, tag that supports immunoblotting or fluorescence techniques (11, 12). However, such strategies for copies per cell (cpc) quantification rely on genetic manipulation of the host organism and hence do not quantify endogenous, unmodified protein. Similarly, the tagging can alter protein levels - in some instances hindering protein expression completely (11). Even so, epitope tagging methods have been of value to the community, yielding high coverage quantitative data sets for the majority of the yeast proteome (11, 12). MS-based methods do not rely on such nonendogenous labels, and can reach genome-wide levels of coverage. Accurate estimation of absolute concentrations i.e. protein copy number per cell, also usually necessitates the use of (one or more) external or internal standards from which to derive absolute abundance (4). Examples include a comprehensive quantification of the Leptospira interrogans proteome that used a 19 protein subset quantified using selected reaction monitoring (SRM)1 to calibrate their label-free data (8, 13). It is worth noting that epitope tagging methods, although also absolute, rely on a very limited set of standards for the quantitative western blots and necessitate incorporation of a suitable immunogenic tag (11). Other recent, innovative approaches exploiting total ion signal and internal scaling to estimate protein cellular abundance (10, 14), avoid the use of internal standards, though they do rely on targeted proteomic data to validate their approach. The use of targeted SRM strategies to derive proteomic calibration standards highlights its advantages in comparison to label-free in terms of accuracy, precision, dynamic range and limit of detection and has gained currency for its reliability and sensitivity (3, 15⇓–17). Indeed, SRM is often referred to as the “gold standard proteomic quantification method,” being particularly well-suited when the proteins to be quantified are known, when appropriate surrogate peptides for protein quantification can be selected a priori, and matched with stable isotope-labeled (SIL) standards (18⇓–20). In combination with SIL peptide standards that can be generated through a variety of means (3, 15), SRM can be used to quantify low copy number proteins, reaching down to ∼50 cpc in yeast (5). However, although SRM methodology has been used extensively for S. cerevisiae protein quantification by us and others (19, 21, 22), it has not been used for large protein cohorts because of the requirement to generate the large numbers of attendant SIL peptide standards; the largest published data set is only for a few tens of proteins. It remains a challenge therefore to robustly quantify an entire eukaryotic proteome in absolute terms by direct means using targeted MS and this is the focus of our present study, the Census Of the Proteome of Yeast (CoPY). We present here direct and absolute quantification of nearly 2000 endogenous proteins from S. cerevisiae grown in steady state in a chemostat culture, using the SRM-based QconCAT approach. Although arguably not quantification of the entire proteome, this represents an accurate and rigorous collection of direct yeast protein quantifications, providing a gold-standard data set of endogenous protein levels for future reference and comparative studies. The highly reproducible SIL-SRM MS data, with robust CVs typically less than 20%, is compared with other extant data sets that were obtained via alternative analytical strategies. We also report a matched high quality transcriptome from the same cells using RNA-seq, which supports additional calculations including a refined estimate of the total protein content in yeast cells, and a simple linear model of translation explaining 70% of the variance between RNA and protein levels in yeast chemostat cultures. These analyses confirm the validity of our data and approach, which we believe represents a state-of-the-art absolute quantification compendium of a significant proportion of a model eukaryotic proteome.