1. DART-ID increases single-cell proteome coverage
- Author
- Albert Tian Chen, Nikolai Slavov, Alexander Franks, and Jürgen Cox
- Subjects
0301 basic medicine; Proteome; Proteomes; Cell; Peptide; Proteomics; Tandem mass spectrometry; 01 natural sciences; Biochemistry; Monocytes; Mathematical Sciences; Database and Informatics Methods; White Blood Cells; 0302 clinical medicine; Single-cell analysis; Animal Cells; Tandem Mass Spectrometry; Medicine and Health Sciences; Biology (General); Database Searching; computer.programming_language; chemistry.chemical_classification; 0303 health sciences; Chromatography; Liquid; Ecology; T Cells; Experimental Design; Biological Sciences; medicine.anatomical_structure; Data point; Mental Health; Computational Theory and Mathematics; Research Design; Modeling and Simulation; Information Retrieval; Physical Sciences; Cellular Types; Single-Cell Analysis; Sequence Analysis; Research Article; QH301-705.5; Bioinformatics; Immune Cells; Immunology; Genomics; Sequence alignment; Computational biology; Biology; Mass spectrometry; Research and Analysis Methods; Statistical power; 03 medical and health sciences; Cellular and Molecular Neuroscience; Information and Computing Sciences; medicine; Genetics; Molecular Biology; Ecology, Evolution, Behavior and Systematics; 030304 developmental biology; Dart; Blood Cells; 010401 analytical chemistry; Biology and Life Sciences; Proteins; Bayes Theorem; Cell Biology; Missing data; Probability Theory; Probability Distribution; 0104 chemical sciences; 030104 developmental biology; chemistry; Generic health relevance; computer; Sequence Alignment; human activities; 030217 neurology & neurosurgery; Mathematics; Chromatography, Liquid
- Abstract
Analysis by liquid chromatography and tandem mass spectrometry (LC-MS/MS) can identify and quantify thousands of proteins in microgram-level samples, such as those composed of thousands of cells. This process, however, remains challenging for smaller samples, such as the proteomes of single mammalian cells, because lower protein levels reduce the number of confidently sequenced peptides. To alleviate this reduction, we developed Data-driven Alignment of Retention Times for IDentification (DART-ID). DART-ID implements principled Bayesian frameworks for global retention time (RT) alignment and for incorporating RT estimates to improve confidence estimates of peptide-spectrum matches. When applied to bulk or to single-cell samples, DART-ID increased the number of data points by 30–50% at 1% FDR, and thus decreased missing data. Benchmarks indicate excellent quantification of peptides upgraded by DART-ID and support their utility for quantitative analysis, such as identifying cell types and cell-type-specific proteins. The additional data points provided by DART-ID boost the statistical power and double the number of proteins identified as differentially abundant in monocytes and T-cells. DART-ID can be applied to diverse experimental designs and is freely available at http://dart-id.slavovlab.net.
- Author summary
Identifying and quantifying proteins in single cells gives researchers the ability to tackle complex biological problems that involve single-cell heterogeneity, such as the treatment of solid tumors. Mass spectrometry analysis of peptides can identify their sequence from their masses and the masses of their fragment ions, but these pieces of evidence are often insufficient for a confident peptide identification. This problem is exacerbated when analyzing low-abundance samples such as single cells.
To identify even peptides with weak mass spectra, DART-ID incorporates their retention time, that is, the time at which they elute from the liquid chromatography used to physically separate them. We present both a novel method of aligning the retention times of peptides across experiments and a rigorous framework for using the estimated retention times to enhance peptide sequence identification. Incorporating the retention time as additional evidence leads to a substantial increase in the number of samples in which proteins are confidently identified and quantified.
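The idea of treating retention time as additional evidence can be illustrated with a plain application of Bayes' theorem. The sketch below is not the DART-ID implementation: the function name `update_pep`, the choice of normal densities for both correct and incorrect matches, and all parameter values are assumptions made for illustration only. It shows how a spectrum-only posterior error probability (PEP) might be revised once the observed RT is compared with the RT expected for the candidate peptide.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std dev sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def update_pep(pep, rt_observed, rt_expected, sigma_correct, null_mu, null_sigma):
    """Revise a spectrum-only PEP using retention-time evidence.

    Assumptions (illustrative, not taken from the paper):
      - a correct match has RT ~ Normal(rt_expected, sigma_correct)
      - an incorrect match has RT ~ Normal(null_mu, null_sigma),
        i.e. a broad distribution over the whole run
      - pep is the prior probability that the match is incorrect
    """
    like_correct = normal_pdf(rt_observed, rt_expected, sigma_correct)
    like_incorrect = normal_pdf(rt_observed, null_mu, null_sigma)

    numerator = like_incorrect * pep
    denominator = numerator + like_correct * (1.0 - pep)
    if denominator == 0.0:
        # Both likelihoods underflowed; RT carries no usable information.
        return pep
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical PSM with a weak spectrum (PEP = 0.2), expected RT of 30 min.
    near = update_pep(0.2, 30.1, 30.0, 0.5, 40.0, 15.0)
    far = update_pep(0.2, 35.0, 30.0, 0.5, 40.0, 15.0)
    print(f"RT close to expectation: PEP = {near:.4f}")
    print(f"RT far from expectation: PEP = {far:.4f}")
```

Under these assumptions, an observed RT close to the expected value lowers the error probability, which is how a match with a weak spectrum could pass a 1% FDR threshold, while an RT far from the expected value raises it.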
- Published
- 2019