1. Precision Neoantigen Discovery Using Large-Scale Immunopeptidomes and Composite Modeling of MHC Peptide Presentation
- Author
Simo V. Zhang, Gabor Bartha, Rachel Marty Pyke, Richard Chen, Sean Michael Boyle, Jason Harris, John A. West, Michael Snyder, Sejal Desai, Rena McClory, Charles Abbott, Dattatreya Mellacheruvu, Nick A. Phillips, and Steven Dea
- Subjects
NMDP, National Marrow Donor Program; False discovery rate; Proteome; Computer science; Biochemistry; Immunoproteomics; Epitope; Analytical Chemistry; immunology; Major Histocompatibility Complex; SHERPA, Systematic HLA Epitope Ranking Pan Algorithm; Immune Epitope Database and Analysis Resource; GFP, green fluorescent protein; next generation sequencing; Antigen Presentation; 0303 health sciences; biology; Antigen processing; 030302 biochemistry & molecular biology; Technological Innovation and Resources; immunopeptidomics; G, gene propensity (model feature); ELISA, enzyme-linked immunosorbent assay; B, binding pocket (model feature); machine learning; P, peptide (model feature); H, hotspot score (model feature); F, flanking regions (model feature); cancer vaccines; Algorithms; L, peptide length (model feature); FDR, false discovery rate; T, protein abundance as measured by TPM (model feature); Decision tree; Computational biology; Human leukocyte antigen; Major histocompatibility complex; Cell Line; 03 medical and health sciences; TPM, transcripts per million; Special Issue: Immunopeptidomics; Antigens, Neoplasm; cancer; MHC, major histocompatibility complex; Humans; Gene; Molecular Biology; 030304 developmental biology; ATCC, American Type Culture Collection; IEDB, Immune Epitope Database and Analysis Resource; pMHC, major histocompatibility complex-peptide; HLA, human leukocyte antigen; Research; Models, Theoretical; neoantigen prediction; IMGT, International ImMunoGeneTics Information System; LOO, leave one out model; biology.protein; Gradient boosting; MHC; LC-MS/MS, liquid chromatography with tandem mass spectrometry; Peptides; Transcriptome; P-models, primary models; neoantigens
- Abstract
Major histocompatibility complex (MHC)-bound peptides that originate from tumor-specific genetic alterations, known as neoantigens, are an important class of anticancer therapeutic targets. Accurately predicting peptide presentation by MHC complexes is a key aspect of discovering therapeutically relevant neoantigens. Technological improvements in mass-spectrometry-based immunopeptidomics and advanced modeling techniques have vastly improved MHC presentation prediction over the past two decades. However, further improvement in the sensitivity and specificity of prediction algorithms is needed for clinical applications such as the development of personalized cancer vaccines, the discovery of biomarkers for response to checkpoint blockade, and the quantification of autoimmune risk in gene therapies. Toward this end, we generated allele-specific immunopeptidomics data using 25 monoallelic cell lines and created the Systematic HLA Epitope Ranking Pan Algorithm (SHERPA), a pan-allelic algorithm for predicting MHC-peptide binding and presentation. In contrast to previously published large-scale monoallelic data, we used an HLA-null K562 parental cell line and stable transfection of HLA alleles to better emulate native presentation. Our dataset includes five previously unprofiled alleles that expand MHC-binding pocket diversity in the training data and extend allelic coverage in underprofiled populations. To improve generalizability, SHERPA systematically integrates 128 monoallelic and 384 multiallelic samples with publicly available immunoproteomics data and binding assay data. Using this dataset, we developed two features that represent antigen processing by empirically estimating the propensities of genes, and of specific regions within gene bodies, to give rise to immunopeptides.
Using a composite model constructed with gradient boosting decision trees, multiallelic deconvolution, and 2.15 million peptides encompassing 167 alleles, we achieved a 1.44-fold improvement of positive predictive value compared with existing tools when evaluated on independent monoallelic datasets and a 1.15-fold improvement when evaluated on tumor samples. With a high degree of accuracy, SHERPA has the potential to enable precision neoantigen discovery for future clinical applications.
- Highlights
• Generated 25 stably transfected monoallelic cell lines and applied immunopeptidomics.
• Harmonized 512 public immunopeptidomic samples through systematic reprocessing.
• Developed pan-allele MHC-binding algorithm (SHERPA) utilizing 167 human HLA alleles.
• SHERPA demonstrates up to 1.44-fold increased precision over competing algorithms.
- In Brief
Accurately identifying neoantigens is critical for many clinical applications. We generated immunopeptidomics data from 25 stably transfected monoallelic cell lines. Then, we systematically reprocessed a large corpus of public data to improve major histocompatibility complex (MHC) binding pocket diversity and to empirically learn the rules of antigen presentation. Using these datasets, we trained SHERPA, an MHC binding and presentation prediction algorithm. SHERPA improves performance compared with existing tools by 1.44-fold in held-out monoallelic data and 1.11-fold for immunogenic epitopes.
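The fold improvements quoted in this record are ratios of positive predictive value (PPV) between two predictors. The record does not state the authors' exact evaluation protocol, so the following is only a minimal sketch under one common convention: PPV is taken as the fraction of truly presented peptides among the k highest-scoring candidates. The function names, the choice of k, and the toy scores are all hypothetical and are not taken from SHERPA.

```python
from typing import Sequence


def ppv_at_top_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Fraction of true positives (label 1) among the k highest-scoring candidates."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of candidates")
    # Rank candidates from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k


def fold_improvement(ppv_new: float, ppv_baseline: float) -> float:
    """Ratio of the new model's PPV to the baseline's PPV."""
    if ppv_baseline <= 0:
        raise ValueError("baseline PPV must be positive")
    return ppv_new / ppv_baseline


# Toy data (illustrative only): 3 presented peptides among 8 candidates.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
model_a = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]  # ranks all three hits first
model_b = [0.9, 0.3, 0.7, 0.8, 0.5, 0.4, 0.6, 0.2]  # ranks one decoy above a hit

k = sum(labels)  # assumption: evaluate at k = number of true positives
ppv_a = ppv_at_top_k(model_a, labels, k)
ppv_b = ppv_at_top_k(model_b, labels, k)
print(f"fold improvement of A over B: {fold_improvement(ppv_a, ppv_b):.2f}")
```

Because PPV depends on the ratio of hits to decoys and on the cutoff k, fold improvements are only comparable when both predictors are scored on the same candidate set with the same cutoff.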
- Published
- 2023