1. From chemoproteomic-detected amino acids to genomic coordinates: insights into precise multi-omic data integration
- Author
- Heta S. Desai, Keriann M. Backus, Maria F. Palafox, and Valerie A. Arboleda
- Subjects
Prioritization; Models, Molecular; Proteomics; Medicine (General); QH301-705.5; Druggability; Computational biology; Biology; General Biochemistry, Genetics and Molecular Biology; Article; Cell Line; 03 medical and health sciences; Jurkat Cells; 0302 clinical medicine; R5-920; Databases, Genetic; multi‐omics; Humans; Chemoproteomics; Biology (General); Amino Acids; Genetic Association Studies; 030304 developmental biology; inter‐database mapping; 0303 health sciences; amino acid reactivity; General Immunology and Microbiology; Applied Mathematics; Genetic Variation; Genomics; Articles; Pathogenicity; Cysteine protease; Amino acid; Computational Theory and Mathematics; genetic pathogenicity prediction; General Agricultural and Biological Sciences; 030217 neurology & neurosurgery; Information Systems; Cysteine; Data integration
- Abstract
The integration of proteomic, transcriptomic, and genetic variant annotation data will improve our understanding of genotype–phenotype associations. Due, in part, to challenges associated with accurate inter‐database mapping, such multi‐omic studies have not extended to chemoproteomics, a method that measures the intrinsic reactivity and potential “druggability” of nucleophilic amino acid side chains. Here, we evaluated mapping approaches to match chemoproteomic‐detected cysteine and lysine residues with their genetic coordinates. Our analysis revealed that database update cycles and reliance on stable identifiers can lead to pervasive misidentification of labeled residues. Enabled by this examination of mapping strategies, we then integrated our chemoproteomics data with computational methods for predicting genetic variant pathogenicity, which revealed that codons of highly reactive cysteines are enriched for genetic variants that are predicted to be more deleterious and allowed us to identify and functionally characterize a new damaging residue in the cysteine protease caspase‐8. Our study provides a roadmap for more precise inter‐database mapping and points to untapped opportunities to improve the predictive power of pathogenicity scores and to advance prioritization of putative druggable sites.
Synopsis: Multi‐omic data integration maps chemoproteomic‐detected (CpD) amino acids to genomic‐level predictions of variant pathogenicity. Highly reactive cysteine and lysine residues are enriched for high pathogenicity (CADD) scores and disease‐causing pathogenic variants.
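The misidentification risk described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names and the toy sequences are hypothetical, and it only shows the general idea of confirming that a labeled position still holds the expected residue after a database release changes the sequence behind a stable identifier.

```python
def residue_matches(sequence: str, position: int, expected: str) -> bool:
    """Return True if the 1-based position in the sequence holds the expected residue."""
    return 1 <= position <= len(sequence) and sequence[position - 1] == expected


def check_labeled_sites(sequence, labeled_sites):
    """Split (position, residue) pairs into confirmed and mismatched lists."""
    confirmed, mismatched = [], []
    for position, residue in labeled_sites:
        if residue_matches(sequence, position, residue):
            confirmed.append((position, residue))
        else:
            mismatched.append((position, residue))
    return confirmed, mismatched


# Hypothetical toy data: the same identifier pointing at two sequence versions.
old_release = "MACKDE"   # sequence used when the residues were detected
new_release = "MAACKDE"  # later release with one inserted residue
sites = [(3, "C"), (4, "K")]  # labeled cysteine and lysine, 1-based positions

old_result = check_labeled_sites(old_release, sites)
new_result = check_labeled_sites(new_release, sites)
```

Against the old release both sites are confirmed, while against the new release both are flagged as mismatched, which is the kind of silent shift that a position-only lookup would miss.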
- Published
- 2020