Deep-Learning-Derived Evaluation Metrics Enable Effective Benchmarking of Computational Tools for Phosphopeptide Identification
- Source :
- Molecular & Cellular Proteomics : MCP
- Publication Year :
- 2021
- Publisher :
- Elsevier BV, 2021.
- Abstract :
- Tandem mass spectrometry (MS/MS)-based phosphoproteomics is a powerful technology for global phosphorylation analysis. However, applying four computational pipelines to a typical mass spectrometry (MS)-based phosphoproteomic dataset from a human cancer study, we observed a large discrepancy among the reported phosphopeptide identification and phosphosite localization results, underscoring a critical need for benchmarking. While efforts have been made to compare the performance of computational pipelines using data from synthetic phosphopeptides, evaluations involving real application data have been largely limited to comparing the numbers of phosphopeptide identifications because appropriate evaluation metrics are lacking. We investigated three deep-learning-derived features as potential evaluation metrics: phosphosite probability, Delta RT, and spectral similarity. Predicted phosphosite probability is computed by MusiteDeep, which provides high accuracy as previously reported; Delta RT is defined as the absolute difference between the observed retention time (RT) and the RT predicted by AutoRT; and spectral similarity is defined as the Pearson’s correlation coefficient between the observed spectrum and the spectrum predicted by pDeep2. Using a synthetic peptide dataset, we found that both Delta RT and spectral similarity provided excellent discrimination between correct and incorrect peptide-spectrum matches (PSMs), both when incorrect PSMs involved wrong peptide sequences and even when incorrect PSMs were caused only by incorrect phosphosite localization. Based on these results, we used all three deep-learning-derived features as evaluation metrics to compare different computational pipelines on a diverse set of phosphoproteomic datasets and showed their utility in benchmarking the performance of the pipelines.
The benchmark metrics demonstrated in this study will enable users to select computational pipelines and parameters for routine analysis of phosphoproteomics data and will offer guidance for developers to improve computational methods.
Highlights
• Computational method selection substantially affects phosphopeptide identification.
• Deep-learning-derived metrics effectively discriminate correct and incorrect PSMs.
• Novel metrics enable computational method comparison on real application data.
In Brief
Tandem mass spectrometry (MS/MS)-based phosphoproteomics is a powerful technology for global phosphorylation analysis. However, applying different computational pipelines to the same dataset may produce substantially different phosphopeptide identification results, underscoring a critical need for benchmarking. We present three deep-learning-derived benchmark metrics. The benchmark metrics demonstrated in this study will enable users to select computational pipelines and parameters for routine analysis of phosphoproteomics data and will offer guidance for developers to improve computational methods.
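The two metric definitions given in the abstract (Delta RT as an absolute retention time difference, spectral similarity as a Pearson's correlation coefficient) can be illustrated with a minimal sketch. This is not the authors' code: the function names `delta_rt`, `spectral_similarity`, and `auroc` are hypothetical, the predicted values are assumed to have been produced beforehand by tools such as AutoRT and pDeep2 (which are not called here), and the pairwise AUROC is only one simple way to quantify how well a score separates correct from incorrect PSMs.

```python
import math


def delta_rt(observed_rt, predicted_rt):
    """Absolute difference between observed and predicted retention time."""
    return abs(observed_rt - predicted_rt)


def spectral_similarity(observed, predicted):
    """Pearson's correlation coefficient between two intensity vectors.

    Both vectors are assumed to list intensities for the same fragment
    ions in the same order (an assumption of this sketch).
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length vectors with at least 2 values")
    mean_o = sum(observed) / len(observed)
    mean_p = sum(predicted) / len(predicted)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_o * var_p)


def auroc(correct_scores, incorrect_scores):
    """Probability that a correct PSM scores higher than an incorrect one.

    Computed by direct pairwise comparison; ties count as one half.
    For Delta RT, where smaller is better, pass negated values.
    """
    if not correct_scores or not incorrect_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for c in correct_scores:
        for i in incorrect_scores:
            if c > i:
                wins += 1.0
            elif c == i:
                wins += 0.5
    return wins / (len(correct_scores) * len(incorrect_scores))
```

For example, `auroc([-d for d in correct_delta_rts], [-d for d in incorrect_delta_rts])` would score Delta RT so that smaller differences rank higher; the two list names are placeholders for values computed on a labeled dataset such as the synthetic peptide set mentioned above.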
- Subjects :
- Phosphopeptides
Proteomics
RT, retention time
False discovery rate
MS/MS, tandem mass spectrometry
SPC, Spearman’s correlation coefficient
FDR, false discovery rate
Computer science
NL, neutral loss
CDAP, CPTAC common data analysis pipeline
Benchmark
PCC, Pearson’s correlation coefficient
Tandem mass tag
NCE, normalized collision energy
Cell Line
Set (abstract data type)
Mice
Deep Learning
COS, cosine similarity
Animals
Humans
Phosphorylation
phosphopeptide identification
SA, spectral angle
LC, liquid chromatography
Research
Cosine similarity
CPTAC, Clinical Proteomic Tumor Analysis Consortium
Phosphoproteomics
TMT, tandem mass tag
General Medicine
Benchmarking
PSM, peptide-spectrum match
ICPC, International Cancer Proteogenome Consortium
AUROC, area under the receiver operating characteristic curve
Identification (information)
UCEC, uterine corpus endometrial carcinoma
ComputingMethodologies_PATTERNRECOGNITION
MAE, median absolute error
KDT, Kendall rank correlation coefficient
Benchmark (computing)
PTM, posttranslational modification
Data mining
Details
- ISSN :
- 15359476
- Volume :
- 20
- Database :
- OpenAIRE
- Journal :
- Molecular & Cellular Proteomics
- Accession number :
- edsair.doi.dedup.....88eefca1ff678373652dbb5148a2adc1