Reinspection of a Clinical Proteomics Tumor Analysis Consortium (CPTAC) Dataset with Cloud Computing Reveals Abundant Post-Translational Modifications and Protein Sequence Variants

Authors :
Michael F. Moran
Keira E. Mahoney
Yuting Yuan
Shashwati Parihari
Jeffrey Shabanowitz
Manu Varkey
O. John Semmes
Trevor Glaros
Abhay Moghekar
Joseph J. Otto
Akhilesh Pandey
Satya Saxena
Young Ah Goo
Joel R. Steele
Yassene Mohammed
Dong Gi Mun
Amol Prakash
Benjamin C. Orsburn
Lorne Taylor
Anil K. Madugundu
Sanjeeva Srivastava
Nate Hoxie
Scott Peterman
Julius O. Nyalwidhe
Pouya Faridi
Source :
Cancers, Volume 13, Issue 20, Article 5034 (2021). MDPI
Publication Year :
2021
Publisher :
Multidisciplinary Digital Publishing Institute, 2021.

Abstract

Simple Summary: We reanalyzed a publicly available breast cancer proteomics dataset consisting of 122 human tumor samples using a scalable cloud computing workflow. By doing so, we were able to search these files against millions of known human sequence variants and hundreds of common post-translational protein modifications (PTMs), thereby demonstrating the power of cloud computing to address proteomic data in a true biological context. We identified thousands of relevant sequence variants and PTMs, indicating that the original studies may have only scratched the surface of the true value of the CPTAC studies completed to date. We present the results of this reanalysis in a searchable web interface for community analysis and validation.

The Clinical Proteomic Tumor Analysis Consortium (CPTAC) has provided some of the most in-depth analyses of the phenotypes of human tumors ever constructed. Today, most proteomic data analysis is still performed using software housed on desktop computers, which limits the number of sequence variants and post-translational modifications that can be considered. The original CPTAC studies limited the search for PTMs to samples that were chemically enriched for those modified peptides. Similarly, the only sequence variants considered were those with strong evidence at the exon or transcript level. In this multi-institutional collaborative reanalysis, we used unbiased protein databases containing millions of human sequence variants in conjunction with hundreds of common post-translational modifications. Using these tools, we identified tens of thousands of high-confidence PTMs and sequence variants. We identified 4132 phosphorylated peptides in nonenriched samples, 93% of which were confirmed in the samples that were chemically enriched for phosphopeptides. Our results also cover 90% of the high-confidence variants reported by the original proteogenomics study, without the need for sample-specific next-generation sequencing. Finally, we report fivefold more somatic and germline variants that have independent evidence at the peptide level, including mutations in ERBB2 and BCAS1. In this reanalysis of CPTAC proteomic data with cloud computing, we present an openly available and searchable web resource of the highest-coverage proteomic profiling of human tumors described to date.

Details

Language :
English
ISSN :
2072-6694
Database :
OpenAIRE
Journal :
Cancers
Accession number :
edsair.doi.dedup.....d37b34935f469fcea3d43165ef7dcf77
Full Text :
https://doi.org/10.3390/cancers13205034