25 results on '"Kymberleigh A. Pagel"'
Search Results
2. Calibration of computational tools for missense variant pathogenicity classification and ClinGen recommendations for PP3/BP4 criteria
- Author
-
Vikas Pejaver, Alicia B. Byrne, Bing-Jian Feng, Kymberleigh A. Pagel, Sean D. Mooney, Rachel Karchin, Anne O’Donnell-Luria, Steven M. Harrison, Sean V. Tavtigian, Marc S. Greenblatt, Leslie G. Biesecker, Predrag Radivojac, Steven E. Brenner, Ahmad A. Tayoun, Jonathan S. Berg, Garry R. Cutting, Sian Ellard, Peter Kang, Izabela Karbassi, Jessica Mester, Tina Pesaran, Sharon E. Plon, Heidi L. Rehm, Natasha T. Strande, and Scott Topper
- Subjects
Consensus; Virulence; Calibration; Genetics; Humans; Educational Status; Genetics (clinical)
- Abstract
Recommendations from the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG/AMP) for interpreting sequence variants specify the use of computational predictors as "supporting" level of evidence for pathogenicity or benignity using criteria PP3 and BP4, respectively. However, score intervals defined by tool developers, and ACMG/AMP recommendations that require the consensus of multiple predictors, lack quantitative support. Previously, we described a probabilistic framework that quantified the strengths of evidence (supporting, moderate, strong, very strong) within ACMG/AMP recommendations. We have extended this framework to computational predictors and introduce a new standard that converts a tool's scores to PP3 and BP4 evidence strengths. Our approach is based on estimating the local positive predictive value and can calibrate any computational tool or other continuous-scale evidence on any variant type. We estimate thresholds (score intervals) corresponding to each strength of evidence for pathogenicity and benignity for thirteen missense variant interpretation tools, using carefully assembled independent data sets. Most tools achieved supporting evidence level for both pathogenic and benign classification using newly established thresholds. Multiple tools reached score thresholds justifying moderate and several reached strong evidence levels. One tool reached very strong evidence level for benign classification on some variants. Based on these findings, we provide recommendations for evidence-based revisions of the PP3 and BP4 ACMG/AMP criteria using individual tools and future assessment of computational methods for clinical interpretation.
- Published
- 2022
3. Pathogenicity and functional impact of non-frameshifting insertion/deletion variation in the human genome.
- Author
-
Kymberleigh A Pagel, Danny Antaki, AoJie Lian, Matthew Mort, David N Cooper, Jonathan Sebat, Lilia M Iakoucheva, Sean D Mooney, and Predrag Radivojac
- Subjects
Biology (General); QH301-705.5
- Abstract
Differentiation between phenotypically neutral and disease-causing genetic variation remains an open and relevant problem. Among different types of variation, non-frameshifting insertions and deletions (indels) represent an understudied group with widespread phenotypic consequences. To address this challenge, we present a machine learning method, MutPred-Indel, that predicts pathogenicity and identifies types of functional residues impacted by non-frameshifting insertion/deletion variation. The model shows good predictive performance as well as the ability to identify impacted structural and functional residues including secondary structure, intrinsic disorder, metal and macromolecular binding, post-translational modifications, allosteric sites, and catalytic residues. We identify structural and functional mechanisms impacted preferentially by germline variation from the Human Gene Mutation Database, recurrent somatic variation from COSMIC in the context of different cancers, as well as de novo variants from families with autism spectrum disorder. Further, the distributions of pathogenicity prediction scores generated by MutPred-Indel are shown to differentiate highly recurrent from non-recurrent somatic variation. Collectively, we present a framework to facilitate the interrogation of both pathogenicity and the functional effects of non-frameshifting insertion/deletion variants. The MutPred-Indel webserver is available at http://mutpred.mutdb.org/.
- Published
- 2019
4. Data from High-Throughput Prediction of MHC Class I and II Neoantigens with MHCnuggets
- Author
-
Rachel Karchin, Kymberleigh A. Pagel, Valsamo Anagnostou, Victor E. Velculescu, Angelika B. Riemer, Maria Bonsack, Ashton Omdahl, Benjamin Kaminow, Dylan Hirsch, Lily Zheng, Collin Tokheim, I.K. Ashok Sivakumar, Justin Huang, Rohit Bhattacharya, and Xiaoshan M. Shao
- Abstract
Computational prediction of binding between neoantigen peptides and major histocompatibility complex (MHC) proteins can be used to predict patient response to cancer immunotherapy. Current neoantigen predictors focus on in silico estimation of MHC binding affinity and are limited by low predictive value for actual peptide presentation, inadequate support for rare MHC alleles, and poor scalability to high-throughput data sets. To address these limitations, we developed MHCnuggets, a deep neural network method that predicts peptide–MHC binding. MHCnuggets can predict binding for common or rare alleles of MHC class I or II with a single neural network architecture. Using a long short-term memory network (LSTM), MHCnuggets accepts peptides of variable length and is faster than other methods. When compared with methods that integrate binding affinity and MHC-bound peptide (HLAp) data from mass spectrometry, MHCnuggets yields a 4-fold increase in positive predictive value on independent HLAp data. We applied MHCnuggets to 26 cancer types in The Cancer Genome Atlas, processing 26.3 million allele–peptide comparisons in under 2.3 hours, yielding 101,326 unique predicted immunogenic missense mutations (IMM). Predicted IMM hotspots occurred in 38 genes, including 24 driver genes. Predicted IMM load was significantly associated with increased immune cell infiltration (P < 2 × 10⁻¹⁶), including CD8+ T cells. Only 0.16% of predicted IMMs were observed in more than 2 patients, with 61.7% of these derived from driver mutations. Thus, we describe a method for neoantigen prediction and its performance characteristics and demonstrate its utility in data sets representing multiple human cancers.
- Published
- 2023
5. Figure S1 Legend from High-Throughput Prediction of MHC Class I and II Neoantigens with MHCnuggets
- Author
-
Rachel Karchin, Kymberleigh A. Pagel, Valsamo Anagnostou, Victor E. Velculescu, Angelika B. Riemer, Maria Bonsack, Ashton Omdahl, Benjamin Kaminow, Dylan Hirsch, Lily Zheng, Collin Tokheim, I.K. Ashok Sivakumar, Justin Huang, Rohit Bhattacharya, and Xiaoshan M. Shao
- Abstract
Supplemental Legend for Figure S1
- Published
- 2023
6. Supplementary Table 1-9 from High-Throughput Prediction of MHC Class I and II Neoantigens with MHCnuggets
- Author
-
Rachel Karchin, Kymberleigh A. Pagel, Valsamo Anagnostou, Victor E. Velculescu, Angelika B. Riemer, Maria Bonsack, Ashton Omdahl, Benjamin Kaminow, Dylan Hirsch, Lily Zheng, Collin Tokheim, I.K. Ashok Sivakumar, Justin Huang, Rohit Bhattacharya, and Xiaoshan M. Shao
- Abstract
Supplementary Table 1-9
- Published
- 2023
7. Figure S1 from High-Throughput Prediction of MHC Class I and II Neoantigens with MHCnuggets
- Author
-
Rachel Karchin, Kymberleigh A. Pagel, Valsamo Anagnostou, Victor E. Velculescu, Angelika B. Riemer, Maria Bonsack, Ashton Omdahl, Benjamin Kaminow, Dylan Hirsch, Lily Zheng, Collin Tokheim, I.K. Ashok Sivakumar, Justin Huang, Rohit Bhattacharya, and Xiaoshan M. Shao
- Abstract
Figure S1
- Published
- 2023
8. Prioritizing de novo autism risk variants with calibrated gene- and variant-scoring models
- Author
-
Akula Bala Pramod, Yuxiang Jiang, Lilia M. Iakoucheva, Predrag Radivojac, Jorge Urresti, and Kymberleigh A. Pagel
- Subjects
Adenosine Triphosphatases; Prioritization; Autism Spectrum Disorder; Sodium; Computational biology; Biology; medicine.disease; Pathogenicity; Article; Human genetics; Autism spectrum disorder; ATP1A3; Mutation; Potassium; Genetics; medicine; Humans; Autism; Missense mutation; Genetic Predisposition to Disease; Autistic Disorder; Sodium-Potassium-Exchanging ATPase; Gene; Genetics (clinical)
- Abstract
Whole-exome and whole-genome sequencing studies in autism spectrum disorder (ASD) have identified hundreds of thousands of exonic variants. Only a handful of them, primarily loss-of-function variants, have been shown to increase the risk for ASD, while the contributory roles of other variants, including most missense variants, remain unknown. New approaches that combine tissue-specific molecular profiles with patients' genetic data can thus play an important role in elucidating the functional impact of exonic variation and improve understanding of ASD pathogenesis. Here, we integrate spatio-temporal gene co-expression networks from the developing human brain and protein-protein interaction networks to first reach accurate prioritization of ASD risk genes based on their connectivity patterns with previously known high-confidence ASD risk genes. We subsequently integrate these gene scores with variant pathogenicity predictions to further prioritize individual exonic variants based on the positive-unlabeled learning framework with gene- and variant-score calibration. We demonstrate that this approach discriminates among variants between cases and controls at the high end of the prediction range. Finally, we experimentally validate our top-scoring de novo mutation NP_001243143.1:p.Phe309Ser in the sodium/potassium-transporting ATPase ATP1A3 to disrupt protein binding with different partners.
- Published
- 2021
9. Association of Genetic Predisposition and Physical Activity With Risk of Gestational Diabetes in Nulliparous Women
- Author
-
Kymberleigh A. Pagel, Hoyin Chu, Rashika Ramola, Rafael F. Guerrero, Judith H. Chung, Samuel Parry, Uma M. Reddy, Robert M. Silver, Jonathan G. Steller, Lynn M. Yee, Ronald J. Wapner, Matthew W. Hahn, Sriraam Natarajan, David M. Haas, and Predrag Radivojac
- Subjects
Adult; Cohort Studies; Diabetes, Gestational; Diabetes Mellitus, Type 2; Pregnancy; Humans; Female; Genetic Predisposition to Disease; General Medicine; Exercise
- Abstract
Polygenic risk scores (PRS) for type 2 diabetes (T2D) can improve risk prediction for gestational diabetes (GD), yet the strength of the association between genetic and lifestyle risk factors has not been quantified. To assess the association of PRS and physical activity in existing GD risk models and identify patient subgroups who may receive the most benefits from a PRS or physical activity intervention. The Nulliparous Pregnancy Outcomes Study: Monitoring Mothers-to-Be cohort was established to study individuals without previous pregnancy lasting at least 20 weeks (nulliparous) and to elucidate factors associated with adverse pregnancy outcomes. A subcohort of 3533 participants with European ancestry was used for risk assessment and performance evaluation. Participants were enrolled from October 5, 2010, to December 3, 2013, and underwent genotyping between February 19, 2019, and February 28, 2020. Data were analyzed from September 15, 2020, to November 10, 2021. Self-reported total physical activity in early pregnancy was quantified as metabolic equivalents of task (METs). Polygenic risk scores were calculated for T2D using contributions of 84 single nucleotide variants, weighted by their association in the Diabetes Genetics Replication and Meta-analysis Consortium data. Estimation of the development of GD from clinical, genetic, and environmental variables collected in early pregnancy, assessed using measures of model discrimination. Odds ratios and positive likelihood ratios were used to evaluate the association of PRS and physical activity with GD risk. A total of 3533 women were included in this analysis (mean [SD] age, 28.6 [4.9] years). In high-risk population subgroups (body mass index ≥25 or aged ≥35 years), individuals with high PRS (top 25th percentile) or low activity levels (METs <450) had increased odds of a GD diagnosis of 25% to 75%. Compared with the general population, participants with both high PRS and low activity levels had higher odds of a GD diagnosis (odds ratio, 3.4 [95% CI, 2.3-5.3]), whereas participants with low PRS and high METs had significantly reduced risk of a GD diagnosis (odds ratio, 0.5 [95% CI, 0.3-0.9]; P = .01). In this cohort study, the addition of PRS was associated with the stratified risk of GD diagnosis among high-risk patient subgroups, suggesting the benefits of targeted PRS ascertainment to encourage early intervention.
- Published
- 2022
10. The influence of genetic predisposition and physical activity on risk of Gestational Diabetes Mellitus in the nuMoM2b cohort
- Author
-
Kymberleigh A. Pagel, Hoyin Chu, Rashika Ramola, Rafael F. Guerrero, Judith H. Chung, Samuel Parry, Uma M. Reddy, Robert M. Silver, Jonathan G. Steller, Lynn M. Yee, Ronald J. Wapner, Matthew W. Hahn, Sriraam Natarajan, David M. Haas, and Predrag Radivojac
- Abstract
Importance: Polygenic risk scores (PRS) for Type II Diabetes Mellitus (T2DM) can improve risk prediction for Gestational Diabetes Mellitus (GDM), yet the strength of the relationship between genetic and lifestyle risk factors has not been quantified. Objective: To assess the effects of PRS and physical activity on existing GDM risk models and identify patient subgroups who may receive the most benefits from receiving a PRS or activity intervention. Design, Settings, and Participants: The Nulliparous Pregnancy Outcomes Study: Monitoring Mothers-to-Be (nuMoM2b) study was established to study individuals without previous pregnancy lasting 20 weeks or more (nulliparous) and to elucidate factors associated with adverse pregnancy outcomes. A sub-cohort of 3,533 participants with European ancestry was used for risk assessment and performance evaluation. Exposures: Self-reported total physical activity in early pregnancy was quantified as metabolic equivalent of tasks (METs) in hours/week. Polygenic risk scores were calculated for T2DM using contributions of 85 single nucleotide variants, weighted by their association in the DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium data. Main Outcomes and Measures: Prediction of the development of GDM from clinical, genetic, and environmental variables collected in early pregnancy. The risk model is assessed using measures of model discrimination and calibration. Odds ratio and positive likelihood ratio were used for evaluating the effect of PRS and physical activity on GDM risk. Results: In high-risk population subgroups (body mass index ≥ 25 or age ≥ 35), individuals with PRS in the top 25th percentile or METs below 450 have significantly increased odds of GDM diagnosis. Participants with both high PRS and low METs have three times higher odds of GDM diagnosis than the population. Conversely, participants with high PRS and METs ≥ 450 do not exhibit increased odds of GDM diagnosis, and those with low METs and low PRS have reduced odds of GDM. The relationship between PRS and METs was found to be nonadditive. Conclusions and Relevance: In high-risk patient subgroups the addition of PRS resulted in increased risk of GDM diagnosis, suggesting the benefits of targeted PRS ascertainment to encourage early intervention. Increased physical activity is associated with decreased risk of GDM, particularly among individuals genetically predisposed to T2DM. Key Points. Question: Do genetic predisposition to diabetes and physical activity in early pregnancy cooperatively impact risk of Gestational Diabetes Mellitus (GDM) among nulliparas? Findings: Risk of GDM diagnosis increases significantly for nulliparas with high polygenic risk score (PRS) and with low physical activity. The odds ratio of developing GDM with high PRS was estimated to be 2.2, 1.6 with low physical activity, and 3.5 in combination. Meaning: Physical activity in early pregnancy is associated with reduced risk of GDM and reversal of excess risk in genetically predisposed individuals. The interaction between PRS and physical activity may identify subjects for targeted interventions.
- Published
- 2022
11. Inferring the molecular and phenotypic impact of amino acid variants with MutPred2
- Author
-
Matthew Mort, Guan Ning Lin, Hyun-Jun Nam, David Neil Cooper, Predrag Radivojac, Jorge Urresti, Sean D. Mooney, Jose Lugo-Martinez, Vikas Pejaver, Jonathan Sebat, Kymberleigh A. Pagel, and Lilia M. Iakoucheva
- Subjects
0301 basic medicine; Science; General Physics and Astronomy; Computational biology; Disease; Protein function predictions; Biology; medicine.disease_cause; Genome; Article; General Biochemistry, Genetics and Molecular Biology; 03 medical and health sciences; 0302 clinical medicine; Machine learning; medicine; Protein analysis; Humans; Genetic Predisposition to Disease; lcsh:Science; Mendelian disorders; chemistry.chemical_classification; Mutation; Models, Statistical; Multidisciplinary; Genome, Human; Computational Biology; Proteins; General Chemistry; Phenotype; Computational biology and bioinformatics; Amino acid; 030104 developmental biology; Amino Acid Substitution; chemistry; Protein structure predictions; lcsh:Q; Identification (biology); Inherited disease; Software; 030217 neurology & neurosurgery
- Abstract
Identifying pathogenic variants and underlying functional alterations is challenging. To this end, we introduce MutPred2, a tool that improves the prioritization of pathogenic amino acid substitutions over existing methods, generates molecular mechanisms potentially causative of disease, and returns interpretable pathogenicity score distributions on individual genomes. Whilst its prioritization performance is state-of-the-art, a distinguishing feature of MutPred2 is the probabilistic modeling of variant impact on specific aspects of protein structure and function that can serve to guide experimental studies of phenotype-altering variants. We demonstrate the utility of MutPred2 in the identification of the structural and functional mutational signatures relevant to Mendelian disorders and the prioritization of de novo mutations associated with complex neurodevelopmental disorders. We then experimentally validate the functional impact of several variants identified in patients with such disorders. We argue that mechanism-driven studies of human inherited disease have the potential to significantly accelerate the discovery of clinically actionable variants. Identifying variants capable of causing genetic disease is challenging. The authors use semisupervised learning to predict pathogenic missense variants and their impacts on protein structure and function, enabling a molecular mechanism-driven approach to studying different types of human disease.
- Published
- 2020
12. Integrated Informatics Analysis of Cancer-Related Variants
- Author
-
Ben Busby, Kyle Moad, Rachel Karchin, Rick Kim, Lily Zheng, Collin Tokheim, Kymberleigh A. Pagel, and Michael T. Ryan
- Subjects
MEDLINE; Computational biology; Biology; Workflow; User-Computer Interface; 03 medical and health sciences; 0302 clinical medicine; Neoplasms; Databases, Genetic; medicine; Humans; 030304 developmental biology; 0303 health sciences; Extramural; Genetic variants; Computational Biology; Cancer; ORIGINAL REPORTS; General Medicine; medicine.disease; Neoplasm Proteins; Neoplasms diagnosis; Informatics; Mutation; Mutation (genetic algorithm); Special Series: Informatics Tools For Cancer Research and Care; Software; 030217 neurology & neurosurgery
- Abstract
PURPOSE The modern researcher is confronted with hundreds of published methods to interpret genetic variants. There are databases of genes and variants, phenotype-genotype relationships, algorithms that score and rank genes, and in silico variant effect prediction tools. Because variant prioritization is a multifactorial problem, a welcome development in the field has been the emergence of decision support frameworks, which make it easier to integrate multiple resources in an interactive environment. Current decision support frameworks are typically limited by closed proprietary architectures, access to a restricted set of tools, lack of customizability, Web dependencies that expose protected data, or limited scalability. METHODS We present the Open Custom Ranked Analysis of Variants Toolkit (OpenCRAVAT), a new open-source, scalable decision support system for variant and gene prioritization. We have designed the resource catalog to be open and modular to maximize community and developer involvement, and as a result, the catalog is being actively developed and growing every month. Resources made available via the store are well suited for analysis of cancer, as well as Mendelian and complex diseases. RESULTS OpenCRAVAT offers both command-line utility and dynamic graphical user interface, allowing users to install with a single command, easily download tools from an extensive resource catalog, create customized pipelines, and explore results in a richly detailed viewing environment. We present several case studies to illustrate the design of custom workflows to prioritize genes and variants. CONCLUSION OpenCRAVAT is distinguished from similar tools by its capabilities to access and integrate an unprecedented amount of diverse data resources and computational prediction methods, which span germline, somatic, common, rare, coding, and noncoding variants.
- Published
- 2020
13. Predicting venous thromboembolism risk from exomes in the Critical Assessment of Genome Interpretation (CAGI) challenges
- Author
-
Moses Stamboulian, Rita Casadio, Rajgopal Srinivasan, Emidio Capriotti, Predrag Radivojac, Yana Bromberg, Sadhna Rana, Sean D. Mooney, Castrense Savojardo, Russ B. Altman, Yanran Wang, Panagiotis Katsonis, Steven E. Brenner, Yuxiang Jiang, Roxana Daneshjou, Kymberleigh A. Pagel, Samuele Bovo, John Moult, Gregory McInnes, Lipika R. Pal, Olivier Lichtarge, and Pier Luigi Martelli
- Subjects
Male; medicine.medical_specialty; venous thromboembolism; Disease; Biology; Genome; Article; 03 medical and health sciences; Exome Sequencing; Genetics; medicine; Cluster Analysis; Humans; Genetic Predisposition to Disease; cardiovascular diseases; Exome; Allele frequency; Genetics (clinical); Exome sequencing; 030304 developmental biology; 0303 health sciences; 030305 genetics & heredity; Confounding; Warfarin; Computational Biology; prediction challenge; Congresses as Topic; equipment and supplies; machine learning; ROC Curve; phenotype prediction; Family medicine; Female; Venous thromboembolism; exome; Unsupervised Machine Learning; medicine.drug
- Abstract
Genetics play a key role in venous thromboembolism (VTE) risk; however, established risk factors in European populations do not translate to individuals of African descent because of the differences in allele frequencies between populations. As part of the fifth iteration of the Critical Assessment of Genome Interpretation, participants were asked to predict VTE status in exome data from African American subjects. Participants were provided with 103 unlabeled exomes from patients treated with warfarin for non-VTE causes or VTE and asked to predict which disease each subject had been treated for. Given the lack of training data, many participants opted to use unsupervised machine learning methods, clustering the exomes by variation in genes known to be associated with VTE. The best performing method using only VTE-related genes achieved an area under the ROC curve of 0.65. Here, we discuss the range of methods used in the prediction of VTE from sequence data and explore some of the difficulties of conducting a challenge with known confounders. In addition, we show that an existing genetic risk score for VTE that was developed in European subjects works well in African Americans.
- Published
- 2019
14. When loss-of-function is loss of function: assessing mutational signatures and impact of loss-of-function genetic variants
- Author
-
Sean D. Mooney, Kymberleigh A. Pagel, Matthew Mort, Vikas Pejaver, Predrag Radivojac, David Neil Cooper, Jonathan Sebat, Hyun-Jun Nam, Guan Ning Lin, and Lilia M. Iakoucheva
- Subjects
0301 basic medicine; Statistics and Probability; Protein Conformation; Computational biology; Disease; Gene mutation; Biology; Biochemistry; Genome; Conserved sequence; Machine Learning; 03 medical and health sciences; Loss of Function Mutation; Sequence Analysis, Protein; Genetic variation; Humans; Molecular Biology; Exome; Loss function; Computational Biology; Proteins; Ismb/Eccb 2017: The 25th Annual Conference Intelligent Systems for Molecular Biology Held Jointly with the 16th Annual European Conference on Computational Biology, Prague, Czech Republic, July 21–25, 2017; Phenotype; Computer Science Applications; Vari; Computational Mathematics; 030104 developmental biology; Computational Theory and Mathematics; Software
- Abstract
Motivation: Loss-of-function genetic variants are frequently associated with severe clinical phenotypes, yet many are present in the genomes of healthy individuals. The available methods to assess the impact of these variants rely primarily upon evolutionary conservation with little to no consideration of the structural and functional implications for the protein. They further do not provide information to the user regarding specific molecular alterations potentially causative of disease. Results: To address this, we investigate protein features underlying loss-of-function genetic variation and develop a machine learning method, MutPred-LOF, for the discrimination of pathogenic and tolerated variants that can also generate hypotheses on specific molecular events disrupted by the variant. We investigate a large set of human variants derived from the Human Gene Mutation Database, ClinVar and the Exome Aggregation Consortium. Our prediction method shows an area under the Receiver Operating Characteristic curve of 0.85 for all loss-of-function variants and 0.75 for proteins in which both pathogenic and neutral variants have been observed. We applied MutPred-LOF to a set of 1142 de novo variants from neurodevelopmental disorders and find enrichment of pathogenic variants in affected individuals. Overall, our results highlight the potential of computational tools to elucidate causal mechanisms underlying loss of protein function in loss-of-function variants. Availability and Implementation: http://mutpred.mutdb.org
- Published
- 2017
15. OpenCRAVAT, an open source collaborative platform for the annotation of human genetic variation
- Author
-
Michael T. Ryan, Ben Busby, Rick Kim, Matthew Hynes-Grace, Lily Zheng, Rachel Karchin, Collin Tokheim, Kyle Moad, and Kymberleigh A. Pagel
- Subjects
Annotation; Decision support system; Workflow; business.industry; Computer science; Scalability; Human genetic variation; Modular design; business; Gene; Data science
- Abstract
PURPOSE The modern researcher is confronted with hundreds of published methods to interpret genetic variants. There are databases of genes and variants, phenotype-genotype relationships, algorithms that score and rank genes, and in silico variant effect prediction tools. Because variant prioritization is a multi-factorial problem, a welcome development in the field has been the emergence of decision support frameworks, which make it easier to integrate multiple resources in an interactive environment. Current decision support frameworks are typically limited by closed proprietary architectures, access to a restricted set of tools, lack of customizability, web dependencies that expose protected data, or limited scalability. METHODS We present OpenCRAVAT, a new open source, scalable decision support system for variant and gene prioritization. We have designed the resource catalog to be open and modular to maximize community and developer involvement, and as a result the catalog is being actively developed and growing every month. Resources made available via the store are well-suited for analysis of cancer, as well as Mendelian and complex diseases. RESULTS OpenCRAVAT offers both command line utility and dynamic GUI, allowing users to install with a single command, easily download tools from an extensive resource catalog, create customized pipelines, and explore results in a richly detailed viewing environment. We present several case studies to illustrate the design of custom workflows to prioritize genes and variants. CONCLUSION OpenCRAVAT is distinguished from similar tools by its capabilities to access and integrate an unprecedented amount of diverse data resources and computational prediction methods, which span germline, somatic, common, rare, coding and non-coding variants. OpenCRAVAT is freely available at https://opencravat.org
- Published
- 2019
16. High-throughput prediction of MHC Class I and Class II neoantigens with MHCnuggets
- Author
-
Lily Zheng, Benjamin Kaminow, Maria Bonsack, Dylan Hirsch, Xiaoshan M. Shao, I K Ashok Sivakumar, Angelika B. Riemer, Rohit Bhattacharya, Rachel Karchin, Victor E. Velculescu, Justin C. Huang, Valsamo Anagnostou, Kymberleigh A. Pagel, Collin Tokheim, and Ashton Omdahl
- Subjects
Cancer Research; medicine.medical_treatment; In silico; Immunology; Mutation, Missense; Plasma protein binding; Computational biology; CD8-Positive T-Lymphocytes; Major histocompatibility complex; Cancer Vaccines; Article; 03 medical and health sciences; 0302 clinical medicine; Antigen; Cancer immunotherapy; Antigens, Neoplasm; Artificial Intelligence; Predictive Value of Tests; Neoplasms; MHC class I; medicine; Data Mining; Humans; Missense mutation; Allele; Gene; 030304 developmental biology; 0303 health sciences; biology; Histocompatibility Antigens Class I; Histocompatibility Antigens Class II; Computational Biology; 3. Good health; 030220 oncology & carcinogenesis; biology.protein; Biomarker (medicine); Neural Networks, Computer; Algorithms; Software; CD8; Protein Binding; 030215 immunology
- Abstract
Computational prediction of binding between neoantigen peptides and major histocompatibility complex (MHC) proteins is an emerging biomarker for predicting patient response to cancer immunotherapy. Current neoantigen predictors focus on in silico estimation of MHC binding affinity and are limited by low positive predictive value for actual peptide presentation, inadequate support for rare MHC alleles and poor scalability to high-throughput data sets. To address these limitations, we developed MHCnuggets, a deep neural network method to predict peptide-MHC binding. MHCnuggets is the only method to handle binding prediction for common or rare alleles of MHC Class I or II, with a single neural network architecture. Using a long short-term memory network (LSTM), MHCnuggets accepts peptides of variable length and is capable of faster performance than other methods. When compared to methods that integrate binding affinity and HLAp data from mass spectrometry, MHCnuggets yields a fourfold increase in positive predictive value on independent MHC-bound peptide (HLAp) data. We applied MHCnuggets to 26 cancer types in TCGA, processing 26.3 million allele-peptide comparisons in under 2.3 hours, yielding 101,326 unique candidate immunogenic missense mutations (IMMs). Predicted-IMM hotspots occurred in 38 genes, including 24 driver genes. Predicted-IMM load was significantly associated with increased immune cell infiltration (p < 2 × 10⁻¹⁶), including CD8+ T cells. Only 0.16% of predicted IMMs were observed in more than 2 patients, with 61.7% of these derived from driver mutations. Our results provide a new method for neoantigen prediction with high performance characteristics and demonstrate its utility in large data sets across human cancers. Synopsis: We developed a new in silico predictor of Major Histocompatibility Complex (MHC) ligand binding and demonstrated its utility to assess potential neoantigens and immunogenic missense mutations (IMMs) in 6613 TCGA patients.
- Published
- 2019
17. Assessment of patient clinical descriptions and pathogenic variants from gene panel sequences in the CAGI-5 intellectual disability challenge
- Author
-
Yuxiang Jiang, Jingqi Chen, Silvio C. E. Tosatto, Mariagrazia Bellini, Zhiqiang Hu, John Moult, Olivier Lichtarge, Ivan Limongelli, Francesco Reggiani, Alexander Miguel Monzon, Stephen J. Wilson, Panagiotis Katsonis, Kymberleigh A. Pagel, Marco Carraro, Predrag Radivojac, Alessandra Murgia, Kunal Kundu, Emanuela Leonardi, Carlo Ferrari, Gaia Andreoletti, Steven E. Brenner, Yaqiong Wang, Lipika R. Pal, Maria Cristina Aspromonte, Yizhou Yin, and Luigi Chiricosta
- Subjects
Male ,Microcephaly ,Ataxia ,Autism Spectrum Disorder ,Quantitative Trait Loci ,Biology ,Bioinformatics ,Article ,genetic testing ,03 medical and health sciences ,community challenge ,critical assessment ,phenotype prediction ,variant interpretation ,Intellectual Disability ,Intellectual disability ,Genetics ,medicine ,Humans ,Genetic Predisposition to Disease ,Genetics (clinical) ,030304 developmental biology ,Genetic testing ,0303 health sciences ,medicine.diagnostic_test ,030305 genetics & heredity ,Macrocephaly ,Computational Biology ,Sequence Analysis, DNA ,medicine.disease ,Comorbidity ,Hypotonia ,Phenotype ,Autism ,Female ,medicine.symptom - Abstract
The Critical Assessment of Genome Interpretation-5 intellectual disability challenge asked to use computational methods to predict patient clinical phenotypes and the causal variant(s) based on an analysis of their gene panel sequence data. Sequence data for 74 genes associated with intellectual disability (ID) and/or autism spectrum disorders (ASD) from a cohort of 150 patients with a range of neurodevelopmental manifestations (i.e. ID, autism, epilepsy, microcephaly, macrocephaly, hypotonia, ataxia) have been made available for this challenge. For each patient, predictors had to report the causative variants and which of the seven phenotypes were present. Since neurodevelopmental disorders are characterized by strong comorbidity, tested individuals often present more than one pathological condition. Considering the overall clinical manifestation of each patient, the correct phenotype has been predicted by at least one group for 93 individuals (62%). ID and ASD were the best predicted among the seven phenotypic traits. Also, causative or potentially pathogenic variants were predicted correctly by at least one group. However, the prediction of the correct causative variant seems to be insufficient to predict the correct phenotype. In some cases, the correct prediction has been supported by rare or common variants in genes different from the causative one.
- Published
- 2019
18. The Johns Hopkins Molecular Tumor Board Precision Oncology elective for Medical Oncology fellows
- Author
-
Jessica J. Tao, Ross C. Donehower, Rachel Karchin, Emily Nizialek, Kristen A. Marrone, Kymberleigh A. Pagel, Jenna Van Liere Canzoniero, Paola Ghanem, and Valsamo Anagnostou
- Subjects
Cancer Research ,medicine.medical_specialty ,Oncology ,Clinical decision making ,business.industry ,Precision oncology ,Medicine ,Tumor board ,Cancer ,Medical physics ,business ,medicine.disease - Abstract
11035 Background: The accelerated impact of next generation sequencing (NGS) in clinical decision making requires the integration of cancer genomics and precision oncology focused training into medical oncology education. The Johns Hopkins Molecular Tumor Board (JH MTB) is a multi-disciplinary effort focused on integration of NGS findings with critical evidence interpretation to generate personalized recommendations tailored to the genetic footprint of individual patients. Methods: The JH MTB and the Medical Oncology Fellowship Program have developed a 3-month precision oncology elective for fellows in their research years. Commencing fall of 2020, the goals of this elective are to enhance the understanding of NGS platforms and findings, advance the interpretation and characterization of molecular assay outputs by use of mutation annotators and knowledgebases and ultimately master the art of matching NGS findings with available therapies. Fellow integration into the MTB focuses on mentored case-based learning in mutation characterization and ranking by levels of evidence for actionability, with culmination in the form of verbal presentations and written summary reports of final MTB recommendations. A mixed methods questionnaire was administered to evaluate progress since elective initiation. Results: Three learners who have participated as of February 2021 were included. Of the two who had completed the MTB elective, each has presented at least 10 cases, with at least 1 scholarly publication planned. All indicated strong agreement that the MTB elective had increased their comfort with interpreting clinical NGS reports as well as the use of knowledgebases and variant annotators. 
Exposure to experts in the field of molecular precision oncology, identification of resources necessary to interpret clinical NGS reports, development of ability to critically assess various NGS platforms, and gained familiarity with computational analyses relevant to clinical decision making were noted as strengths of the MTB elective. Areas of improvement included ongoing initiatives that involve streamlining variant annotation and transcription of information for written reports. Conclusions: A longitudinal elective in the JHU MTB has been found to be preliminarily effective in promoting knowledge mastery and creating academic opportunities related to the clinical application of precision medicine. Future directions will include leveraging of the MTB infrastructure for research projects, learner integration into computational laboratory meetings, and expansion of the MTB curriculum to include different levels of learners from multiple medical education programs. Continued elective participation will be key to understanding how best to facilitate adaptive expertise in assigning clinical relevance to genomic findings, ultimately improving precision medicine delivery in patient care and trial development.
- Published
- 2021
19. 39. Integrated informatics analysis of cancer-related variants with OpenCRAVAT
- Author
-
Lily Zheng, Michael T. Ryan, Ben Busby, Rachel Karchin, Rick Kim, Kyle Moad, Kymberleigh A. Pagel, and Collin Tokheim
- Subjects
Cancer Research ,Informatics ,Genetics ,medicine ,Cancer ,Computational biology ,Biology ,medicine.disease ,Molecular Biology - Published
- 2020
20. Abstract A33: High-throughput prediction of MHC Class I and Class II neoantigens with MHCnuggets
- Author
-
Angelika B. Riemer, Ashton Omdahl, Dylan Hirsch, Rachel Karchin, Xiaoshan M. Shao, Collin Tokheim, Victor E. Velculescu, Rohit Bhattacharya, Ben Kaminow, Kymberleigh A. Pagel, Lily Zheng, Justin C. Huang, Ashok Sivakumar, Maria Bonsack, and Valsamo Anagnostou
- Subjects
Cancer Research ,Class (computer programming) ,biology ,Computer science ,Immunology ,MHC class I ,biology.protein ,Computational biology ,Throughput (business) - Abstract
Computational prediction of binding between neoantigen peptides and major histocompatibility complex (MHC) proteins is an emerging biomarker for predicting patient response to cancer immunotherapy. Current neoantigen predictors focus on in silico estimation of MHC binding affinity and are limited by low positive predictive value for actual peptide presentation, inadequate support for rare MHC alleles, and poor scalability to high-throughput data sets. To address these limitations, we developed MHCnuggets, a deep neural network method to predict peptide-MHC binding. MHCnuggets is the only method to handle binding prediction for common or rare alleles of MHC Class I or II, with a single neural network architecture. Using a long short-term memory network (LSTM), MHCnuggets accepts peptides of variable length and is capable of faster performance than other methods. When compared to methods that integrate binding affinity and HLAp data from mass spectrometry, MHCnuggets yields a fourfold increase in positive predictive value on independent MHC-bound peptide (HLAp) data. We applied MHCnuggets to 26 cancer types in TCGA, processing 52.6 million allele-peptide comparisons in under 2.3 hours, yielding 103,587 candidate immunogenic missense mutations (IMMs). IMM hotspots occurred in 36 genes, including 22 driver genes. Predicted IMM load was significantly associated with increased immune cell infiltration (p2 patients, with 65% of these derived from driver mutations. Our results provide a new method for neoantigen prediction with high performance characteristics and demonstrate its utility in large data sets across human cancers. Citation Format: Xiaoshan M. Shao, Rohit Bhattacharya, Justin Huang, Ashok Sivakumar, Collin Tokheim, Lily Zheng, Dylan Hirsch, Ben Kaminow, Ashton Omdahl, Maria Bonsack, Angelika B. Riemer, Victor E. Velculescu, Valsamo Anagnostou, Kymberleigh Pagel, Rachel Karchin. 
High-throughput prediction of MHC Class I and Class II neoantigens with MHCnuggets [abstract]. In: Proceedings of the AACR Special Conference on Tumor Immunology and Immunotherapy; 2019 Nov 17-20; Boston, MA. Philadelphia (PA): AACR; Cancer Immunol Res 2020;8(3 Suppl):Abstract nr A33.
- Published
- 2020
21. Pathogenicity and functional impact of non-frameshifting insertion/deletion variation in the human genome
- Author
-
Danny Antaki, Sean D. Mooney, David Neil Cooper, Kymberleigh A. Pagel, Lilia M. Iakoucheva, Predrag Radivojac, Aojie Lian, Matthew Mort, and Jonathan Sebat
- Subjects
0301 basic medicine ,Pervasive Developmental Disorders ,Autism Spectrum Disorder ,Social Sciences ,Pathogenesis ,Protein Structure Prediction ,Gene mutation ,Pathology and Laboratory Medicine ,Genome ,Biochemistry ,Machine Learning ,Database and Informatics Methods ,0302 clinical medicine ,Mathematical and Statistical Techniques ,INDEL Mutation ,Databases, Genetic ,Medicine and Health Sciences ,Macromolecular Structure Analysis ,Psychology ,Biology (General) ,Genetics ,Ecology ,Statistics ,Computational Theory and Mathematics ,Modeling and Simulation ,Physical Sciences ,Sequence Analysis ,Research Article ,Protein Structure ,QH301-705.5 ,Bioinformatics ,Sequence Databases ,Context (language use) ,Biology ,Research and Analysis Methods ,03 medical and health sciences ,Cellular and Molecular Neuroscience ,Genetic variation ,Humans ,Genetic Predisposition to Disease ,Allele ,Statistical Methods ,Indel ,Molecular Biology ,Ecology, Evolution, Behavior and Systematics ,Alleles ,Evolutionary Biology ,Population Biology ,Genome, Human ,Computational Biology ,Biology and Life Sciences ,Proteins ,Human genetics ,030104 developmental biology ,Biological Databases ,ROC Curve ,Genetic Loci ,Developmental Psychology ,Mutation Databases ,Mutation ,Genetic Polymorphism ,Human genome ,030217 neurology & neurosurgery ,Mathematics ,Population Genetics ,Forecasting - Abstract
Differentiation between phenotypically neutral and disease-causing genetic variation remains an open and relevant problem. Among different types of variation, non-frameshifting insertions and deletions (indels) represent an understudied group with widespread phenotypic consequences. To address this challenge, we present a machine learning method, MutPred-Indel, that predicts pathogenicity and identifies types of functional residues impacted by non-frameshifting insertion/deletion variation. The model shows good predictive performance as well as the ability to identify impacted structural and functional residues including secondary structure, intrinsic disorder, metal and macromolecular binding, post-translational modifications, allosteric sites, and catalytic residues. We identify structural and functional mechanisms impacted preferentially by germline variation from the Human Gene Mutation Database, recurrent somatic variation from COSMIC in the context of different cancers, as well as de novo variants from families with autism spectrum disorder. Further, the distributions of pathogenicity prediction scores generated by MutPred-Indel are shown to differentiate highly recurrent from non-recurrent somatic variation. Collectively, we present a framework to facilitate the interrogation of both pathogenicity and the functional effects of non-frameshifting insertion/deletion variants. The MutPred-Indel webserver is available at http://mutpred.mutdb.org/. Author summary: An individual genome contains around ten thousand missense variants, hundreds of insertion/deletion variants, and dozens of protein truncating variants. Among them, non-frameshifting insertion and deletion variants exhibit diverse impact on protein sequence, encompassing alterations from a single residue to the deletion of entire functional domains. 
Although the majority of revealed insertion/deletions have unknown phenotypic consequences, computational variant effect prediction methods are less well-described for such variation. To this end, we develop MutPred-Indel, a machine learning method to predict the pathogenicity of non-frameshifting insertion/deletion variation and, in addition, highlight structural and functional mechanisms potentially impacted by a given variant. We identify several functionally important molecular mechanisms that are impacted differently among germline, de novo, and somatic variation in contrast to putatively neutral variation. MutPred-Indel is shown to have strong performance in pathogenicity prediction and potential to identify impacted molecular features, which collectively facilitates a deeper understanding of non-frameshifting insertion/deletion variation.
- Published
- 2018
22. Working towards precision medicine: predicting phenotypes from exomes in the Critical Assessment of Genome Interpretation (CAGI) challenges
- Author
-
Predrag Radivojac, Yanran Wang, Kunal Kundu, Maggie Haitian Wang, Laksshman Sundaram, Pier Luigi Martelli, Sohela Shah, Steven E. Brenner, Emanuela Leonardi, Yuxiang Jiang, Roxana Daneshjou, Mehdi Pirooznia, Marco Carraro, Rita Casadio, Biao Li, Giulia Babbi, Peter P. Zandi, John Moult, Silvio C. E. Tosatto, Andre Franke, Yanay Ofran, James B. Potash, David T. Jones, Mauno Vihinen, Billy Chang, Sean D. Mooney, Pietro Di Lena, Roger A. Hoskins, Russ B. Altman, David K. Gifford, Rajendra Rana Bhat, Kymberleigh A. Pagel, Carlo Ferrari, Yana Bromberg, Susanna Repo, Britt-Sabina Petersen, Xiaolin Li, Yizhou Yin, Alexander A. Morgan, Teri E. Klein, Lipika R. Pal, Ron Unger, Samuele Bovo, Abhishek Niroula, Richard W. McCombie, Vikas Pejaver, Eran Bachar, Matthew D. Edwards, Alessandra Gasparini, Johnathan Roy Azaria, Manuel Giollo, and Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
- Subjects
0301 basic medicine ,Bipolar Disorder ,Pharmacogenomic Variants ,Information Dissemination ,Disease ,Biology ,Bioinformatics ,Genome ,Whole Exome Sequencing ,Article ,03 medical and health sciences ,0302 clinical medicine ,Genetic ,Crohn Disease ,bipolar disorder ,Crohn's disease ,exomes ,machine learning ,phenotype prediction ,warfarin ,Genetics ,Genetics (clinical) ,Databases, Genetic ,Exome Sequencing ,Humans ,Genetic Predisposition to Disease ,Precision Medicine ,Exome ,Exome sequencing ,Interpretation (philosophy) ,Computational Biology ,Precision medicine ,Data science ,Phenotype ,030104 developmental biology ,Pharmacogenomic Variant ,Warfarin ,exome ,030217 neurology & neurosurgery ,Human - Abstract
Precision medicine aims to predict a patient's disease risk and best therapeutic options by using that individual's genetic sequencing data. The Critical Assessment of Genome Interpretation (CAGI) is a community experiment consisting of genotype-phenotype prediction challenges; participants build models, undergo assessment, and share key findings. For CAGI 4, three challenges involved using exome-sequencing data: Crohn's disease, bipolar disorder, and warfarin dosing. Previous CAGI challenges included prior versions of the Crohn's disease challenge. Here, we discuss the range of techniques used for phenotype prediction as well as the methods used for assessing predictive models. Additionally, we outline some of the difficulties associated with making predictions and evaluating them. The lessons learned from the exome challenges can be applied to both research and clinical efforts to improve phenotype prediction from genotype. In addition, these challenges serve as a vehicle for sharing clinical and research exome data in a secure manner with scientists who have a broad range of expertise, contributing to a collaborative effort to advance our understanding of genotype-phenotype relationships.
- Published
- 2017
23. MutPred2: inferring the molecular and phenotypic impact of amino acid variants
- Author
-
Matthew Mort, Jorge Urresti, Jose Lugo-Martinez, Sean D. Mooney, David Neil Cooper, Kymberleigh A. Pagel, Hyun-Jun Nam, Predrag Radivojac, Jonathan Sebat, Guan Ning Lin, Lilia M. Iakoucheva, and Vikas Pejaver
- Subjects
Genetics ,chemistry.chemical_classification ,0303 health sciences ,Disease ,Biology ,Pathogenicity ,Phenotype ,Genome ,Amino acid ,03 medical and health sciences ,0302 clinical medicine ,chemistry ,Identification (biology) ,Mendelian disorders ,030217 neurology & neurosurgery ,De novo mutations ,030304 developmental biology - Abstract
We introduce MutPred2, a tool that improves the prioritization of pathogenic amino acid substitutions, generates molecular mechanisms potentially causative of disease, and returns interpretable pathogenicity score distributions on individual genomes. While its prioritization performance is state-of-the-art, a novel and distinguishing feature of MutPred2 is the probabilistic modeling of variant impact on specific aspects of protein structure and function that can serve to guide experimental studies of phenotype-altering variants. We demonstrate the utility of MutPred2 in the identification of the structural and functional mutational signatures relevant to Mendelian disorders and the prioritization of de novo mutations associated with complex neurodevelopmental disorders. We then experimentally validate the functional impact of several variants identified in patients with such disorders. We argue that mechanism-driven studies of human inherited diseases have the potential to significantly accelerate the discovery of clinically actionable variants. Availability: http://mutpred.mutdb.org/
- Published
- 2017
24. The loss and gain of functional amino acid residues is a common mechanism causing human inherited disease
- Author
-
Jose Lugo-Martinez, Vikas Pejaver, Predrag Radivojac, Sean D. Mooney, David Neil Cooper, Kymberleigh A. Pagel, Shantanu Jain, and Matthew Mort
- Subjects
Models, Molecular ,0301 basic medicine ,Protein Structure Prediction ,Plasma protein binding ,medicine.disease_cause ,Biochemistry ,Infographics ,Database and Informatics Methods ,Protein Structure Databases ,Protein structure ,Macromolecular Structure Analysis ,Disease ,Amino Acids ,lcsh:QH301-705.5 ,Peptide sequence ,Genetics ,chemistry.chemical_classification ,Mutation ,Translational bioinformatics ,Ecology ,Organic Compounds ,Protein structure prediction ,Amino acid ,Chemistry ,Computational Theory and Mathematics ,Modeling and Simulation ,Physical Sciences ,Graphs ,Research Article ,Protein Binding ,Protein Structure ,Computer and Information Sciences ,Substitution Mutation ,Mutagenesis (molecular biology technique) ,Biology ,Research and Analysis Methods ,03 medical and health sciences ,Cellular and Molecular Neuroscience ,medicine ,Humans ,Computer Simulation ,Amino Acid Sequence ,Molecular Biology ,QH426 ,Ecology, Evolution, Behavior and Systematics ,Models, Statistical ,Data Visualization ,Organic Chemistry ,Chemical Compounds ,Biology and Life Sciences ,Proteins ,Computational Biology ,Biological Databases ,030104 developmental biology ,lcsh:Biology (General) ,Amino Acid Substitution ,chemistry ,Mutagenesis ,Mutation Databases - Abstract
Elucidating the precise molecular events altered by disease-causing genetic variants represents a major challenge in translational bioinformatics. To this end, many studies have investigated the structural and functional impact of amino acid substitutions. Most of these studies were however limited in scope to either individual molecular functions or were concerned with functional effects (e.g. deleterious vs. neutral) without specifically considering possible molecular alterations. The recent growth of structural, molecular and genetic data presents an opportunity for more comprehensive studies to consider the structural environment of a residue of interest, to hypothesize specific molecular effects of sequence variants and to statistically associate these effects with genetic disease. In this study, we analyzed data sets of disease-causing and putatively neutral human variants mapped to protein 3D structures as part of a systematic study of the loss and gain of various types of functional attribute potentially underlying pathogenic molecular alterations. We first propose a formal model to assess probabilistically function-impacting variants. We then develop an array of structure-based functional residue predictors, evaluate their performance, and use them to quantify the impact of disease-causing amino acid substitutions on catalytic activity, metal binding, macromolecular binding, ligand binding, allosteric regulation and post-translational modifications. We show that our methodology generates actionable biological hypotheses for up to 41% of disease-causing genetic variants mapped to protein structures suggesting that it can be reliably used to guide experimental validation. 
Our results suggest that a significant fraction of disease-causing human variants mapping to protein structures are function-altering both in the presence and absence of stability disruption. Author Summary: Identifying the molecular changes caused by mutations is a major challenge in understanding and treating human genetic disease. To address this problem, we have developed a wide range of profiling tools designed to predict specific types of functional site from protein 3D structures. We then apply these tools to data sets of inherited disease-associated and putatively neutral amino acid substitutions and estimate the relative contribution of the loss and gain of functional residues in disease. Our results suggest that alterations of molecular function are involved in a significant number of cases of human genetic disease and are over-represented as compared to putatively neutral variants. Additionally, we use experimental data to show that it is possible to computationally identify the loss of specific functional events in disease pathogenesis. Finally, our methodology can be used to reliably identify the potential molecular consequences of disease-causing genetic variants and hence prioritize experimental validation.
- Published
- 2016
25. The sequencing and interpretation of the genome obtained from a Serbian individual.
- Author
-
Wazim Mohammed Ismail, Kymberleigh A Pagel, Vikas Pejaver, Simo V Zhang, Sofia Casasa, Matthew Mort, David N Cooper, Matthew W Hahn, and Predrag Radivojac
- Subjects
Medicine ,Science - Abstract
Recent genetic studies and whole-genome sequencing projects have greatly improved our understanding of human variation and clinically actionable genetic information. Smaller ethnic populations, however, remain underrepresented in both individual and large-scale sequencing efforts and hence present an opportunity to discover new variants of biomedical and demographic significance. This report describes the sequencing and analysis of a genome obtained from an individual of Serbian origin, introducing tens of thousands of previously unknown variants to the currently available pool. Ancestry analysis places this individual in close proximity to Central and Eastern European populations; i.e., closest to Croatian, Bulgarian and Hungarian individuals and, in terms of other Europeans, furthest from Ashkenazi Jewish, Spanish, Sicilian and Baltic individuals. Our analysis confirmed gene flow between Neanderthal and ancestral pan-European populations, with similar contributions to the Serbian genome as those observed in other European groups. Finally, to assess the burden of potentially disease-causing/clinically relevant variation in the sequenced genome, we utilized manually curated genotype-phenotype association databases and variant-effect predictors. We identified several variants that have previously been associated with severe early-onset disease that is not evident in the proband, as well as putatively impactful variants that could yet prove to be clinically relevant to the proband over the next decades. The presence of numerous private and low-frequency variants, along with the observed and predicted disease-causing mutations in this genome, exemplify some of the global challenges of genome interpretation, especially in the context of under-studied ethnic groups.
- Published
- 2018