78 results on '"Stenson PD"'
Search Results
2. An integrated map of genetic variation from 1,092 human genomes
- Author
-
Altshuler, DM, Durbin, RM, Abecasis, GR, Bentley, DR, Chakravarti, A, Clark, AG, Donnelly, P, Eichler, EE, Flicek, P, Gabriel, SB, Gibbs, RA, Green, ED, Hurles, ME, Knoppers, BM, Korbel, JO, Lander, ES, Lee, C, Lehrach, H, Mardis, ER, Marth, GT, McVean, GA, Nickerson, DA, Schmidt, JP, Sherry, ST, Wang, J, Wilson, RK, Dinh, H, Kovar, C, Lee, S, Lewis, L, Muzny, D, Reid, J, Wang, M, Fang, X, Guo, X, Jian, M, Jiang, H, Jin, X, Li, G, Li, J, Li, Y, Li, Z, Liu, X, Lu, Y, Ma, X, Su, Z, Tai, S, Tang, M, Wang, B, Wang, G, Wu, H, Wu, R, Yin, Y, Zhang, W, Zhao, J, Zhao, M, Zheng, X, Zhou, Y, Gupta, N, Clarke, L, Leinonen, R, Smith, RE, Zheng-Bradley, X, Grocock, R, Humphray, S, James, T, Kingsbury, Z, Sudbrak, R, Albrecht, MW, Amstislavskiy, VS, Borodina, TA, Lienhard, M, Mertes, F, Sultan, M, Timmermann, B, Yaspo, M-L, Fulton, L, Fulton, R, Weinstock, GM, Balasubramaniam, S, Burton, J, Danecek, P, Keane, TM, Kolb-Kokocinski, A, McCarthy, S, Stalker, J, Quail, M, Davies, CJ, Gollub, J, Webster, T, Wong, B, Zhan, Y, Auton, A, Yu, F, Bainbridge, M, Challis, D, Evani, US, Lu, J, Nagaswamy, U, Sabo, A, Wang, Y, Yu, J, Coin, LJM, Fang, L, Li, Q, Lin, H, Liu, B, Luo, R, Qin, N, Shao, H, Xie, Y, Ye, C, Yu, C, Zhang, F, Zheng, H, Zhu, H, Garrison, EP, Kural, D, Lee, W-P, Leong, WF, Ward, AN, Wu, J, Zhang, M, Griffin, L, Hsieh, C-H, Mills, RE, Shi, X, Von Grotthuss, M, Zhang, C, Daly, MJ, DePristo, MA, Banks, E, Bhatia, G, Carneiro, MO, Del Angel, G, Genovese, G, Handsaker, RE, Hartl, C, McCarroll, SA, Nemesh, JC, Poplin, RE, Schaffner, SF, Shakir, K, Yoon, SC, Lihm, J, Makarov, V, Jin, H, Kim, W, Kim, KC, Rausch, T, Beal, K, Cunningham, F, Herrero, J, McLaren, WM, Ritchie, GRS, Gottipati, S, Keinan, A, Rodriguez-Flores, JL, Sabeti, PC, Grossman, SR, Tabrizi, S, Tariyal, R, Cooper, DN, Ball, EV, Stenson, PD, Barnes, B, Bauer, M, Cheetham, RK, Cox, T, Eberle, M, Kahn, S, Murray, L, Peden, J, Shaw, R, Ye, K, Batzer, MA, Konkel, MK, Walker, JA, MacArthur, DG, Lek, M, Herwig, R, Shriver, 
MD, Bustamante, CD, Byrnes, JK, De la Vega, FM, Gravel, S, Kenny, EE, Kidd, JM, Lacroute, P, Maples, BK, Moreno-Estrada, A, Zakharia, F, Halperin, E, Baran, Y, Craig, DW, Christoforides, A, Homer, N, Izatt, T, Kurdoglu, AA, Sinari, SA, Squire, K, Xiao, C, Sebat, J, Bafna, V, Burchard, EG, Hernandez, RD, Gignoux, CR, Haussler, D, Katzman, SJ, Kent, WJ, Howie, B, Ruiz-Linares, A, Dermitzakis, ET, Lappalainen, T, Devine, SE, Maroo, A, Tallon, LJ, Rosenfeld, JA, Michelson, LP, Kang, HM, Anderson, P, Angius, A, Bigham, A, Blackwell, T, Busonero, F, Cucca, F, Fuchsberger, C, Jones, C, Jun, G, Lyons, R, Maschio, A, Porcu, E, Reinier, F, Sanna, S, Schlessinger, D, Sidore, C, Tan, A, Trost, MK, Awadalla, P, Hodgkinson, A, Lunter, G, Marchini, JL, Myers, S, Churchhouse, C, Delaneau, O, Gupta-Hinch, A, Iqbal, Z, Mathieson, I, Rimmer, A, Xifara, DK, Oleksyk, TK, Fu, Y, Xiong, M, Jorde, L, Witherspoon, D, Xing, J, Browning, BL, Alkan, C, Hajirasouliha, I, Hormozdiari, F, Ko, A, Sudmant, PH, Chen, K, Chinwalla, A, Ding, L, Dooling, D, Koboldt, DC, McLellan, MD, Wallis, JW, Wendl, MC, Zhang, Q, Tyler-Smith, C, Albers, CA, Ayub, Q, Chen, Y, Coffey, AJ, Colonna, V, Huang, N, Jostins, L, Li, H, Scally, A, Walter, K, Xue, Y, Zhang, Y, Gerstein, MB, Abyzov, A, Balasubramanian, S, Chen, J, Clarke, D, Habegger, L, Harmanci, AO, Jin, M, Khurana, E, Mu, XJ, Sisu, C, Degenhardt, J, Stuetz, AM, Church, D, Michaelson, JJ, Ben, B, Lindsay, SJ, Ning, Z, Frankish, A, Harrow, J, Fowler, G, Hale, W, Kalra, D, Barker, J, Kelman, G, Kulesha, E, Radhakrishnan, R, Roa, A, Smirnov, D, Streeter, I, Toneva, I, Vaughan, B, Ananiev, V, Belaia, Z, Beloslyudtsev, D, Bouk, N, Chen, C, Cohen, R, Cook, C, Garner, J, Hefferon, T, Kimelman, M, Liu, C, Lopez, J, Meric, P, O'Sullivan, C, Ostapchuk, Y, Phan, L, Ponomarov, S, Schneider, V, Shekhtman, E, Sirotkin, K, Slotta, D, Zhang, H, Barnes, KC, Beiswanger, C, Cai, H, Cao, H, Gharani, N, Henn, B, Jones, D, Kaye, JS, Kent, A, Kerasidou, A, Mathias, R, Ossorio, PN, 
Parker, M, Reich, D, Rotimi, CN, Royal, CD, Sandoval, K, Su, Y, Tian, Z, Tishkoff, S, Toji, LH, Via, M, Yang, H, Yang, L, Zhu, J, Bodmer, W, Bedoya, G, Ming, CZ, Yang, G, You, CJ, Peltonen, L, Garcia-Montero, A, Orfao, A, Dutil, J, Martinez-Cruzado, JC, Brooks, LD, Felsenfeld, AL, McEwen, JE, Clemm, NC, Duncanson, A, Dunn, M, Guyer, MS, Peterson, JL, 1000 Genomes Project Consortium, Dermitzakis, Emmanouil, Universitat de Barcelona, Massachusetts Institute of Technology. Department of Biology, Altshuler, David, and Lander, Eric S.
- Subjects
Natural selection; LOCI; Genome-wide association study; Evolutionary biology; Continental Population Groups/genetics; Human genetic variation; VARIANTS; Genoma humà; Binding Sites/genetics; 0302 clinical medicine; RARE; Sequence Deletion/genetics; WIDE ASSOCIATION; ddc:576.5; Copy-number variation; MUTATION; Exome sequencing; transcription factor; Conserved Sequence; Human evolution; Sequence Deletion; Genetics; RISK; 0303 health sciences; Multidisciplinary; Continental Population Groups; 1000 Genomes Project Consortium; Genetic analysis; Genomics; Polymorphism, Single Nucleotide/genetics; Research Highlight; 3. Good health; Algorithm; Multidisciplinary Sciences; Genetic Variation/genetics; Map; Science & Technology - Other Topics; Conserved Sequence/genetics; Integrated approach; General Science & Technology; Genetics, Medical; Haplotypes/genetics; Biology; Polymorphism, Single Nucleotide; Evolution, Molecular; 03 medical and health sciences; Genetic variation; Humans; Transcription Factors/metabolism; POPULATION-STRUCTURE; 1000 Genomes Project; Polymorphism; Nucleotide Motifs; Alleles; 030304 developmental biology; COPY NUMBER VARIATION; Science & Technology; Binding Sites; Human genome; Genome, Human; Racial Groups; Genetic Variation; Genetics, Population; Haplotypes; Genome, Human/genetics; untranslated RNA; 030217 neurology & neurosurgery; Transcription Factors; Genome-Wide Association Study
- Abstract
By characterizing the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help to understand the genetic contribution to disease. Here we describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing. By developing methods to integrate information across several algorithms and diverse data sources, we provide a validated haplotype map of 38 million single nucleotide polymorphisms, 1.4 million short insertions and deletions, and more than 14,000 larger deletions. We show that individuals from different populations carry different profiles of rare and common variants, and that low-frequency variants show substantial geographic differentiation, which is further increased by the action of purifying selection. We show that evolutionary conservation and coding consequence are key determinants of the strength of purifying selection, that rare-variant load varies substantially across biological pathways, and that each individual contains hundreds of rare non-coding variants at conserved sites, such as motif-disrupting changes in transcription-factor-binding sites. This resource, which captures up to 98% of accessible single nucleotide polymorphisms at a frequency of 1% in related populations, enables analysis of common and low-frequency variants in individuals from diverse, including admixed, populations. Funding: National Institutes of Health (U.S.) (Grant RC2HL102925); National Institutes of Health (U.S.) (Grant U54HG3067).
- Published
- 2012
3. Accurate identification of genes associated with brain disorders by integrating heterogeneous genomic data into a Bayesian framework.
- Author
-
He D, Li L, Zhang H, Liu F, Li S, Xiu X, Fan C, Qi M, Meng M, Ye J, Mort M, Stenson PD, Cooper DN, and Zhao H
- Subjects
- Humans, Genomics methods, Computational Biology methods, Quantitative Trait Loci, Polymorphism, Single Nucleotide, Bayes Theorem, Genetic Predisposition to Disease, Genome-Wide Association Study, Brain Diseases genetics
- Abstract
Background: Genome-wide association studies (GWAS) have revealed many brain disorder-associated SNPs residing in the noncoding genome, rendering it a challenge to decipher the underlying pathogenic mechanisms. Methods: Here, we present an unsupervised Bayesian framework to identify disease-associated genes by integrating risk SNPs with long-range chromatin interactions (iGOAT), including SNP-SNP interactions extracted from ∼500,000 patients and controls from the UK Biobank, and enhancer-promoter interactions derived from multiple brain cell types at different developmental stages. Findings: The application of iGOAT to three psychiatric disorders and three neurodegenerative/neurological diseases predicted sets of high-risk (HRGs) and low-risk (LRGs) genes for each disorder. The HRGs were enriched in drug targets, and exhibited higher expression during prenatal brain developmental stages than postnatal stages, indicating their potential to affect brain development at an early stage. The HRGs associated with Alzheimer's disease were found to share genetic architecture with schizophrenia, bipolar disorder and major depressive disorder according to gene co-expression module analysis and rare variants analysis. Comparisons of this method to the eQTL-based method, the TWAS-based method, and the gene-level GWAS method indicated that the genes identified by our method are more enriched in known brain disorder-related genes, and exhibited higher precision. Finally, the method predicted 205 risk genes not previously reported to be associated with any brain disorder, of which one top-risk gene, MLH1, was experimentally validated as being schizophrenia-associated. Interpretation: iGOAT can successfully leverage epigenomic data, phenotype-genotype associations, and protein-protein interactions to advance our understanding of brain disorders, thereby facilitating the development of new therapeutic approaches. Funding: The work was funded by the National Key Research and Development Program of China (2024YFF1204902), the Natural Science Foundation of China (82371482), Guangzhou Science and Technology Research Plan (2023A03J0659) and Natural Science Foundation of Guangdong (2024A1515011363). Competing Interests: The authors declare no competing interests. (Copyright © 2024 The Author(s). Published by Elsevier B.V. All rights reserved.)
- Published
- 2024
4. The landscape of rare genetic variation associated with inflammatory bowel disease and Parkinson's disease comorbidity.
- Author
-
Kars ME, Wu Y, Stenson PD, Cooper DN, Burisch J, Peter I, and Itan Y
- Subjects
- Humans, Female, Male, Mutation, Missense, Genome-Wide Association Study, Genetic Variation, Middle Aged, Aged, Parkinson Disease genetics, Inflammatory Bowel Diseases genetics, Leucine-Rich Repeat Serine-Threonine Protein Kinase-2 genetics, Genetic Predisposition to Disease, Comorbidity
- Abstract
Background: Inflammatory bowel disease (IBD) and Parkinson's disease (PD) are chronic disorders that have been suggested to share common pathophysiological processes. LRRK2 has been implicated as playing a role in both diseases. Exploring the genetic basis of the IBD-PD comorbidity through studying high-impact rare genetic variants can facilitate the identification of the novel shared genetic factors underlying this comorbidity. Methods: We analyzed whole exomes from the BioMe BioBank and UK Biobank, and whole genomes from a cohort of 67 European patients diagnosed with both IBD and PD to examine the effects of LRRK2 missense variants on IBD, PD and their co-occurrence (IBD-PD). We performed optimized sequence kernel association test (SKAT-O) and network-based heterogeneity clustering (NHC) analyses using high-impact rare variants in the IBD-PD cohort to identify novel candidate genes, which we further prioritized by biological relatedness approaches. We conducted phenome-wide association studies (PheWAS) employing BioMe BioBank and UK Biobank whole exomes to estimate the genetic relevance of the 14 prioritized genes to IBD-PD. Results: The analysis of LRRK2 missense variants revealed significant associations of the G2019S and N2081D variants with IBD-PD in addition to several other variants as potential contributors to increased or decreased IBD-PD risk. SKAT-O identified two significant genes, LRRK2 and IL10RA, and NHC identified 6 significant gene clusters that are biologically relevant to IBD-PD. We observed prominent overlaps between the enriched pathways in the known IBD, PD, and candidate IBD-PD gene sets. Additionally, we detected significantly enriched pathways unique to the IBD-PD, including MAPK signaling, LPS/IL-1 mediated inhibition of RXR function, and NAD signaling. Fourteen final candidate IBD-PD genes were prioritized by biological relatedness methods. The biological importance scores estimated by protein-protein interaction networks and pathway and ontology enrichment analyses indicated the involvement of genes related to immunity, inflammation, and autophagy in IBD-PD. Additionally, PheWAS provided support for the associations of candidate genes with IBD and PD. Conclusions: Our study confirms and uncovers new LRRK2 associations in IBD-PD. The identification of novel inflammation and autophagy-related genes supports and expands previous findings related to IBD-PD pathogenesis, and underscores the significance of therapeutic interventions for reducing systemic inflammation. (© 2024. The Author(s).)
- Published
- 2024
5. Genome-wide prediction of pathogenic gain- and loss-of-function variants from ensemble learning of a diverse feature set.
- Author
-
Stein D, Kars ME, Wu Y, Bayrak ÇS, Stenson PD, Cooper DN, Schlessinger A, and Itan Y
- Subjects
- Humans, Machine Learning, Genome, Proteins
- Abstract
Gain-of-function (GOF) variants give rise to increased/novel protein functions whereas loss-of-function (LOF) variants lead to diminished protein function. Experimental approaches for identifying GOF and LOF are generally slow and costly, whilst available computational methods have not been optimized to discriminate between GOF and LOF variants. We have developed LoGoFunc, a machine learning method for predicting pathogenic GOF, pathogenic LOF, and neutral genetic variants, trained on a broad range of gene-, protein-, and variant-level features describing diverse biological characteristics. LoGoFunc outperforms other tools trained solely to predict pathogenicity for identifying pathogenic GOF and LOF variants and is available at https://itanlab.shinyapps.io/goflof/. (© 2023. The Author(s).)
- Published
- 2023
6. Genome-wide detection of human intronic AG-gain variants located between splicing branchpoints and canonical splice acceptor sites.
- Author
-
Zhang P, Chaldebas M, Ogishi M, Al Qureshah F, Ponsin K, Feng Y, Rinchai D, Milisavljevic B, Han JE, Moncada-Vélez M, Keles S, Schröder B, Stenson PD, Cooper DN, Cobat A, Boisson B, Zhang Q, Boisson-Dupuis S, Abel L, and Casanova JL
- Subjects
- Humans, Introns, Mutation, Genome, RNA Splice Sites, RNA Splicing
- Abstract
Human genetic variants that introduce an AG into the intronic region between the branchpoint (BP) and the canonical splice acceptor site (ACC) of protein-coding genes can disrupt pre-mRNA splicing. Using our genome-wide BP database, we delineated the BP-ACC segments of all human introns and found extreme depletion of AG/YAG in the [BP+8, ACC-4] high-risk region. We developed AGAIN as a genome-wide computational approach to systematically and precisely pinpoint intronic AG-gain variants within the BP-ACC regions. AGAIN identified 350 AG-gain variants from the Human Gene Mutation Database, all of which alter splicing and cause disease. Among them, 74% created new acceptor sites, whereas 31% resulted in complete exon skipping. AGAIN also predicts the protein-level products resulting from these two consequences. We performed AGAIN on our exome/genomes database of patients with severe infectious diseases but without known genetic etiology and identified a private homozygous intronic AG-gain variant in the antimycobacterial gene SPPL2A in a patient with mycobacterial disease. AGAIN also predicts a retention of six intronic nucleotides that encode an in-frame stop codon, turning AG-gain into stop-gain. This allele was then confirmed experimentally to lead to loss of function by disrupting splicing. We further showed that AG-gain variants inside the high-risk region led to misspliced products, while those outside the region did not, by two case studies in genes STAT1 and IRF7. We finally evaluated AGAIN on our 14 paired exome-RNAseq samples and found that 82% of AG-gain variants in high-risk regions showed evidence of missplicing. AGAIN is publicly available from https://hgidsoft.rockefeller.edu/AGAIN and https://github.com/casanova-lab/AGAIN.
- Published
- 2023
7. Phylogenomic analyses provide insights into primate evolution.
- Author
-
Shao Y, Zhou L, Li F, Zhao L, Zhang BL, Shao F, Chen JW, Chen CY, Bi X, Zhuang XL, Zhu HL, Hu J, Sun Z, Li X, Wang D, Rivas-González I, Wang S, Wang YM, Chen W, Li G, Lu HM, Liu Y, Kuderna LFK, Farh KK, Fan PF, Yu L, Li M, Liu ZJ, Tiley GP, Yoder AD, Roos C, Hayakawa T, Marques-Bonet T, Rogers J, Stenson PD, Cooper DN, Schierup MH, Yao YG, Zhang YP, Wang W, Qi XG, Zhang G, and Wu DD
- Subjects
- Animals, Humans, Genome, Genomics, Phylogeny, Gene Rearrangement, Brain anatomy & histology, Evolution, Molecular, Primates anatomy & histology, Primates classification, Primates genetics
- Abstract
Comparative analysis of primate genomes within a phylogenetic context is essential for understanding the evolution of human genetic architecture and primate diversity. We present such a study of 50 primate species spanning 38 genera and 14 families, including 27 genomes first reported here, with many from previously less well represented groups, the New World monkeys and the Strepsirrhini. Our analyses reveal heterogeneous rates of genomic rearrangement and gene evolution across primate lineages. Thousands of genes under positive selection in different lineages play roles in the nervous, skeletal, and digestive systems and may have contributed to primate innovations and adaptations. Our study reveals that many key genomic innovations occurred in the Simiiformes ancestral node and may have had an impact on the adaptive radiation of the Simiiformes and human evolution.
- Published
- 2023
8. Identifying high-impact variants and genes in exomes of Ashkenazi Jewish inflammatory bowel disease patients.
- Author
-
Wu Y, Gettler K, Kars ME, Giri M, Li D, Bayrak CS, Zhang P, Jain A, Maffucci P, Sabic K, Van Vleck T, Nadkarni G, Denson LA, Ostrer H, Levine AP, Schiff ER, Segal AW, Kugathasan S, Stenson PD, Cooper DN, Philip Schumm L, Snapper S, Daly MJ, Haritunians T, Duerr RH, Silverberg MS, Rioux JD, Brant SR, McGovern DPB, Cho JH, and Itan Y
- Subjects
- Adult, Humans, Exome genetics, Risk Assessment, Genetic Predisposition to Disease, Jews genetics, Inflammatory Bowel Diseases genetics
- Abstract
Inflammatory bowel disease (IBD) is a group of chronic digestive tract inflammatory conditions whose genetic etiology is still poorly understood. The incidence of IBD is particularly high among Ashkenazi Jews. Here, we identify 8 novel and plausible IBD-causing genes from the exomes of 4453 genetically identified Ashkenazi Jewish IBD cases (1734) and controls (2719). Various biological pathway analyses are performed, along with bulk and single-cell RNA sequencing, to demonstrate the likely physiological relatedness of the novel genes to IBD. Importantly, we demonstrate that the rare and high impact genetic architecture of Ashkenazi Jewish adult IBD displays significant overlap with very early onset-IBD genetics. Moreover, by performing biobank phenome-wide analyses, we find that IBD genes have pleiotropic effects that involve other immune responses. Finally, we show that polygenic risk score analyses based on genome-wide high impact variants have high power to predict IBD susceptibility. (© 2023. The Author(s).)
- Published
- 2023
9. Profiling human pathogenic repeat expansion regions by synergistic and multi-level impacts on molecular connections.
- Author
-
Fan C, Chen K, Wang Y, Ball EV, Stenson PD, Mort M, Bacolla A, Kehrer-Sawatzki H, Tainer JA, Cooper DN, and Zhao H
- Subjects
- Humans, Introns genetics, RNA, Trinucleotide Repeat Expansion, DNA Repeat Expansion, DNA
- Abstract
Whilst DNA repeat expansions cause numerous heritable human disorders, their origins and underlying pathological mechanisms are often unclear. We collated a dataset comprising 224 human repeat expansions encompassing 203 different genes, and performed a systematic analysis with respect to key topological features at the DNA, RNA and protein levels. Comparison with controls without known pathogenicity and genomic regions lacking repeats, allowed the construction of the first tool to discriminate repeat regions harboring pathogenic repeat expansions (DPREx). At the DNA level, pathogenic repeat expansions exhibited stronger signals for DNA regulatory factors (e.g. H3K4me3, transcription factor-binding sites) in exons, promoters, 5'UTRs and 5'genes but were not significantly different from controls in introns, 3'UTRs and 3'genes. Additionally, pathogenic repeat expansions were also found to be enriched in non-B DNA structures. At the RNA level, pathogenic repeat expansions were characterized by lower free energy for forming RNA secondary structure and were closer to splice sites in introns, exons, promoters and 5'genes than controls. At the protein level, pathogenic repeat expansions exhibited a preference to form coil rather than other types of secondary structure, and tended to encode surface-located protein domains. Guided by these features, DPREx (http://biomed.nscc-gz.cn/zhaolab/geneprediction/#) achieved an Area Under the Curve (AUC) value of 0.88 in a test on an independent dataset. Pathogenic repeat expansions are thus located such that they exert a synergistic influence on the gene expression pathway involving inter-molecular connections at the DNA, RNA and protein levels. (© 2022. The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.)
- Published
- 2023
10. Identifying shared genetic factors underlying epilepsy and congenital heart disease in Europeans.
- Author
-
Wu Y, Bayrak CS, Dong B, He S, Stenson PD, Cooper DN, Itan Y, and Chen L
- Subjects
- Humans, European People, Genetic Association Studies, Phenotype, Heart Defects, Congenital genetics, Epilepsy epidemiology, Epilepsy genetics
- Abstract
Epilepsy (EP) and congenital heart disease (CHD) are two apparently unrelated diseases that nevertheless display substantial mutual comorbidity. Thus, while congenital heart defects are associated with an elevated risk of developing epilepsy, the incidence of epilepsy in CHD patients correlates with CHD severity. Although genetic determinants have been postulated to underlie the comorbidity of EP and CHD, the precise genetic etiology is unknown. We performed variant and gene association analyses on EP and CHD patients separately, using whole exomes of genetically identified Europeans from the UK Biobank and Mount Sinai BioMe Biobank. We prioritized biologically plausible candidate genes and investigated the enriched pathways and other identified comorbidities by biological proximity calculation, pathway analyses, and gene-level phenome-wide association studies. Our variant- and gene-level results point to the Voltage-Gated Calcium Channels (VGCC) pathway as being a unifying framework for EP and CHD comorbidity. Additionally, pathway-level analyses indicated that the functions of disease-associated genes partially overlap between the two disease entities. Finally, phenome-wide association analyses of prioritized candidate genes revealed that cerebral blood flow and ulcerative colitis constitute the two main traits associated with both EP and CHD. (© 2022. The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.)
- Published
- 2023
11. Genome-wide detection of human variants that disrupt intronic branchpoints.
- Author
-
Zhang P, Philippot Q, Ren W, Lei WT, Li J, Stenson PD, Palacín PS, Colobran R, Boisson B, Zhang SY, Puel A, Pan-Hammarström Q, Zhang Q, Cooper DN, Abel L, and Casanova JL
- Subjects
- Humans, Introns genetics, Retrospective Studies, RNA Splicing genetics, Nucleotides, COVID-19 genetics
- Abstract
Pre-messenger RNA splicing is initiated with the recognition of a single-nucleotide intronic branchpoint (BP) within a BP motif by spliceosome elements. Forty-eight rare variants in 43 human genes have been reported to alter splicing and cause disease by disrupting BP. However, until now, no computational approach was available to efficiently detect such variants in massively parallel sequencing data. We established a comprehensive human genome-wide BP database by integrating existing BP data and generating new BP data from RNA sequencing of lariat debranching enzyme DBR1-mutated patients and from machine-learning predictions. We characterized multiple features of BP in major and minor introns and found that BP and BP-2 (two nucleotides upstream of BP) positions exhibit a lower rate of variation in human populations and higher evolutionary conservation than the intronic background, while being comparable to the exonic background. We developed BPHunter as a genome-wide computational approach to systematically and efficiently detect intronic variants that may disrupt BP recognition. BPHunter retrospectively identified 40 of the 48 known pathogenic BP variants, in which we summarized a strategy for prioritizing BP variant candidates. The remaining eight variants all create AG-dinucleotides between the BP and acceptor site, which is the likely reason for missplicing. We demonstrated the practical utility of BPHunter prospectively by using it to identify a novel germline heterozygous BP variant of STAT2 in a patient with critical COVID-19 pneumonia and a novel somatic intronic 59-nucleotide deletion of ITPKB in a lymphoma patient, both of which were validated experimentally. BPHunter is publicly available from https://hgidsoft.rockefeller.edu/BPHunter and https://github.com/casanova-lab/BPHunter.
- Published
- 2022
12. X-CAP improves pathogenicity prediction of stopgain variants.
- Author
-
Rastogi R, Stenson PD, Cooper DN, and Bejerano G
- Subjects
- Computational Biology methods, Humans, Mutation, Mutation, Missense, Virulence, Exome, Software
- Abstract
Stopgain substitutions are the third-largest class of monogenic human disease mutations and often examined first in patient exomes. Existing computational stopgain pathogenicity predictors, however, exhibit poor performance at the high sensitivity required for clinical use. Here, we introduce a new classifier, termed X-CAP, which uses a novel training methodology and unique feature set to improve the AUROC by 18% and decrease the false-positive rate 4-fold on large variant databases. In patient exomes, X-CAP prioritizes causal stopgains better than existing methods do, further illustrating its clinical utility. X-CAP is available at https://github.com/bejerano-lab/X-CAP. (© 2022. The Author(s).)
- Published
- 2022
13. Analysis of missense variants in the human genome reveals widespread gene-specific clustering and improves prediction of pathogenicity.
- Author
-
Quinodoz M, Peter VG, Cisarova K, Royer-Bertrand B, Stenson PD, Cooper DN, Unger S, Superti-Furga A, and Rivolta C
- Subjects
- Cluster Analysis, Exome genetics, Humans, Virulence, Genome, Human genetics, Mutation, Missense genetics
- Abstract
We used a machine learning approach to analyze the within-gene distribution of missense variants observed in hereditary conditions and cancer. When applied to 840 genes from the ClinVar database, this approach detected a significant non-random distribution of pathogenic and benign variants in 387 (46%) and 172 (20%) genes, respectively, revealing that variant clustering is widespread across the human exome. This clustering likely occurs as a consequence of mechanisms shaping pathogenicity at the protein level, as illustrated by the overlap of some clusters with known functional domains. We then took advantage of these findings to develop a pathogenicity predictor, MutScore, that integrates qualitative features of DNA substitutions with the new additional information derived from this positional clustering. Using a random forest approach, MutScore was able to identify pathogenic missense mutations with very high accuracy, outperforming existing predictive tools, especially for variants associated with autosomal-dominant disease and cancer. Thus, the within-gene clustering of pathogenic and benign DNA changes is an important and previously underappreciated feature of the human exome, which can be harnessed to improve the prediction of pathogenicity and disambiguation of DNA variants of uncertain significance. Competing Interests: D.N.C. and P.D.S. acknowledge QIAGEN Inc. for their financial support through a License Agreement with Cardiff University. The other authors do not declare any conflicts of interest. (Copyright © 2022 The Authors. Published by Elsevier Inc. All rights reserved.)
- Published
- 2022
14. Distinct sequence features underlie microdeletions and gross deletions in the human genome.
- Author
-
Qi M, Stenson PD, Ball EV, Tainer JA, Bacolla A, Kehrer-Sawatzki H, Cooper DN, and Zhao H
- Subjects
- Base Composition, Base Sequence, Humans, Mutation, Sequence Deletion, DNA genetics, Genome, Human genetics
- Abstract
Microdeletions and gross deletions are important causes (~20%) of human inherited disease and their genomic locations are strongly influenced by the local DNA sequence environment. This notwithstanding, no study has systematically examined their underlying generative mechanisms. Here, we obtained 42,098 pathogenic microdeletions and gross deletions from the Human Gene Mutation Database (HGMD) that together form a continuum of germline deletions ranging in size from 1 to 28,394,429 bp. We analyzed the DNA sequence within 1 kb of the breakpoint junctions and found that the frequencies of non-B DNA-forming repeats, GC-content, and the presence of seven of 78 specific sequence motifs in the vicinity of pathogenic deletions correlated with deletion length for deletions of length ≤30 bp. Further, we found that the presence of DR, GQ, and STR repeats is important for the formation of longer deletions (>30 bp) but not for the formation of shorter deletions (≤30 bp) while significantly (χ², p < 2E-16) more microhomologies were identified flanking short deletions than long deletions (length >30 bp). We provide evidence to support a functional distinction between microdeletions and gross deletions. Finally, we propose that a deletion length cut-off of 25-30 bp may serve as an objective means to functionally distinguish microdeletions from gross deletions. (© 2021 The Authors. Human Mutation published by Wiley Periodicals LLC.)
- Published
- 2022
15. Identification of discriminative gene-level and protein-level features associated with pathogenic gain-of-function and loss-of-function variants.
- Author
-
Sevim Bayrak C, Stein D, Jain A, Chaudhary K, Nadkarni GN, Van Vleck TT, Puel A, Boisson-Dupuis S, Okada S, Stenson PD, Cooper DN, Schlessinger A, and Itan Y
- Subjects
- Cloud Computing, Genetic Predisposition to Disease, Genome, Human, Germ-Line Mutation, Humans, Internet-Based Intervention, Machine Learning, Databases, Genetic, Gain of Function Mutation, Loss of Function Mutation, Proteins genetics
- Abstract
Identifying whether a given genetic mutation results in a gene product with increased (gain-of-function; GOF) or diminished (loss-of-function; LOF) activity is an important step toward understanding disease mechanisms because they may result in markedly different clinical phenotypes. Here, we generated an extensive database of documented germline GOF and LOF pathogenic variants by employing natural language processing (NLP) on the available abstracts in the Human Gene Mutation Database. We then investigated various gene- and protein-level features of GOF and LOF variants and applied machine learning and statistical analyses to identify discriminative features. We found that GOF variants were enriched in essential genes, for autosomal-dominant inheritance, and in protein binding and interaction domains, whereas LOF variants were enriched in singleton genes, for protein-truncating variants, and in protein core regions. We developed a user-friendly web-based interface that enables the extraction of selected subsets from the GOF/LOF database by a broad set of annotated features and downloading of up-to-date versions. These results improve our understanding of how variants affect gene/protein function and may ultimately guide future treatment options. Competing Interests: The authors declare no competing interests. (Copyright © 2021 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.)
- Published
- 2021
16. The genetic structure of the Turkish population reveals high levels of variation and admixture.
- Author
-
Kars ME, Başak AN, Onat OE, Bilguvar K, Choi J, Itan Y, Çağlar C, Palvadeau R, Casanova JL, Cooper DN, Stenson PD, Yavuz A, Buluş H, Günel M, Friedman JM, and Özçelik T
- Subjects
- Alleles, Consanguinity, Exome, Gene Frequency genetics, Genetic Drift, Genetics, Population methods, Genome-Wide Association Study methods, Genotype, Haplotypes genetics, Human Migration trends, Humans, Turkey ethnology, Exome Sequencing methods, Genetic Variation genetics, Genome, Human genetics
- Abstract
The construction of population-based variomes has contributed substantially to our understanding of the genetic basis of human inherited disease. Here, we investigated the genetic structure of Turkey from 3,362 unrelated subjects whose whole exomes (n = 2,589) or whole genomes (n = 773) were sequenced to generate a Turkish (TR) Variome that should serve to facilitate disease gene discovery in Turkey. Consistent with the history of present-day Turkey as a crossroads between Europe and Asia, we found extensive admixture between Balkan, Caucasus, Middle Eastern, and European populations with a closer genetic relationship of the TR population to Europeans than hitherto appreciated. We determined that 50% of TR individuals had high inbreeding coefficients (≥0.0156) with runs of homozygosity longer than 4 Mb being found exclusively in the TR population when compared to 1000 Genomes Project populations. We also found that 28% of exome and 49% of genome variants in the very rare range (allele frequency < 0.005) are unique to the modern TR population. We annotated these variants based on their functional consequences to establish a TR Variome containing alleles of potential medical relevance, a repository of homozygous loss-of-function variants and a TR reference panel for genotype imputation using high-quality haplotypes, to facilitate genome-wide association studies. In addition to providing information on the genetic structure of the modern TR population, these data provide an invaluable resource for future studies to identify variants that are associated with specific phenotypes as well as establishing the phenotypic consequences of mutations in specific genes. Competing Interests: The authors declare no competing interest.
- Published
- 2021
17. Compensatory epistasis explored by molecular dynamics simulations.
- Author
-
Serrano C, Teixeira CSS, Cooper DN, Carneiro J, Lopes-Marques M, Stenson PD, Amorim A, Prata MJ, Sousa SF, and Azevedo L
- Subjects
- Amino Acid Substitution, Humans, Chromosomes, Human, X genetics, Epistasis, Genetic, Factor IXa chemistry, Factor IXa genetics, Molecular Dynamics Simulation, Mutation, Missense
- Abstract
A non-negligible proportion of human pathogenic variants are known to be present as wild type in at least some non-human mammalian species. The standard explanation for this finding is that molecular mechanisms of compensatory epistasis can alleviate the mutations' otherwise pathogenic effects. Examples of compensated variants have been described in the literature but the interacting residue(s) postulated to play a compensatory role have rarely been ascertained. In this study, the examination of five human X-chromosomally encoded proteins (FIX, GLA, HPRT1, NDP and OTC) allowed us to identify several candidate compensated variants. Strong evidence for a compensated/compensatory pair of amino acids in the coagulation FIXa protein (involving residues 270 and 271) was found in a variety of mammalian species. Both amino acid residues are located within the 60-loop, spatially close to the 39-loop that performs a key role in coagulation serine proteases. To understand the nature of the underlying interactions, molecular dynamics simulations were performed. The predicted conformational change in the 39-loop consequent to the Glu270Lys substitution (associated with hemophilia B) appears to impair the protein's interaction with its substrate but, importantly, such steric hindrance is largely mitigated in those proteins that carry the compensatory residue (Pro271) at the neighboring amino acid position. (© 2021. The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.)
- Published
- 2021
18. The Human Gene Mutation Database (HGMD®): optimizing its use in a clinical diagnostic or research setting.
- Author
-
Stenson PD, Mort M, Ball EV, Chapman M, Evans K, Azevedo L, Hayden M, Heywood S, Millar DS, Phillips AD, and Cooper DN
- Subjects
- Bibliometrics, Biomedical Research methods, Genetic Predisposition to Disease, Humans, Public-Private Sector Partnerships, Databases, Genetic, Genome, Human, Germ-Line Mutation, Polymorphism, Genetic
- Abstract
The Human Gene Mutation Database (HGMD®) constitutes a comprehensive collection of published germline mutations in nuclear genes that are thought to underlie, or are closely associated with human inherited disease. At the time of writing (June 2020), the database contains in excess of 289,000 different gene lesions identified in over 11,100 genes manually curated from 72,987 articles published in over 3100 peer-reviewed journals. There are primarily two main groups of users who utilise HGMD on a regular basis; research scientists and clinical diagnosticians. This review aims to highlight how to make the most out of HGMD data in each setting.
- Published
- 2020
19. Common homozygosity for predicted loss-of-function variants reveals both redundant and advantageous effects of dispensable human genes.
- Author
-
Rausell A, Luo Y, Lopez M, Seeleuthner Y, Rapaport F, Favier A, Stenson PD, Cooper DN, Patin E, Casanova JL, Quintana-Murci L, and Abel L
- Subjects
- Alleles, Apolipoproteins L genetics, Fucosyltransferases genetics, Genetic Variation, Homozygote, Humans, Proteins genetics, Sex Chromosomes genetics, Galactoside 2-alpha-L-fucosyltransferase, Human Genetics, Loss of Function Mutation
- Abstract
Humans homozygous or hemizygous for variants predicted to cause a loss of function (LoF) of the corresponding protein do not necessarily present with overt clinical phenotypes. We report here 190 autosomal genes with 207 predicted LoF variants, for which the frequency of homozygous individuals exceeds 1% in at least one human population from five major ancestry groups. No such genes were identified on the X and Y chromosomes. Manual curation revealed that 28 variants (15%) had been misannotated as LoF. Of the 179 remaining variants in 166 genes, only 11 alleles in 11 genes had previously been confirmed experimentally to be LoF. The set of 166 dispensable genes was enriched in olfactory receptor genes (41 genes). The 41 dispensable olfactory receptor genes displayed a relaxation of selective constraints similar to that observed for other olfactory receptor genes. The 125 dispensable nonolfactory receptor genes also displayed a relaxation of selective constraints consistent with greater redundancy. Sixty-two of these 125 genes were found to be dispensable in at least three human populations, suggesting possible evolution toward pseudogenes. Of the 179 LoF variants, 68 could be tested for two neutrality statistics, and 8 displayed robust signals of positive selection. These latter variants included a known FUT2 variant that confers resistance to intestinal viruses, and an APOL3 variant involved in resistance to parasitic infections. Overall, the identification of 166 genes for which a sizeable proportion of humans are homozygous for predicted LoF alleles reveals both redundancies and advantages of such deficiencies for human survival. Competing Interests: L.H. coauthored research papers with J.-L.C. in 2017 and with E.P., J.-L.C., L.Q.-M., and L.A. in 2018.
- Published
- 2020
20. AMELIE speeds Mendelian diagnosis by matching patient phenotype and genotype to primary literature.
- Author
-
Birgmeier J, Haeussler M, Deisseroth CA, Steinberg EH, Jagadeesh KA, Ratner AJ, Guturu H, Wenger AM, Diekhans ME, Stenson PD, Cooper DN, Ré C, Beggs AH, Bernstein JA, and Bejerano G
- Subjects
- Child, Genotype, Humans, Phenotype, Probability, Retrospective Studies, Exome
- Abstract
The diagnosis of Mendelian disorders requires labor-intensive literature research. Trained clinicians can spend hours looking for the right publication(s) supporting a single gene that best explains a patient's disease. AMELIE (Automatic Mendelian Literature Evaluation) greatly accelerates this process. AMELIE parses all 29 million PubMed abstracts and downloads and further parses hundreds of thousands of full-text articles in search of information supporting the causality and associated phenotypes of most published genetic variants. AMELIE then prioritizes patient candidate variants for their likelihood of explaining any patient's given set of phenotypes. Diagnosis of singleton patients (without relatives' exomes) is the most time-consuming scenario, and AMELIE ranked the causative gene at the very top for 66% of 215 diagnosed singleton Mendelian patients from the Deciphering Developmental Disorders project. Evaluating only the top 11 AMELIE-scored genes of 127 (median) candidate genes per patient resulted in a rapid diagnosis in more than 90% of cases. AMELIE-based evaluation of all cases was 3 to 19 times more efficient than hand-curated database-based approaches. We replicated these results on a retrospective cohort of clinical cases from Stanford Children's Health and the Manton Center for Orphan Disease Research. An analysis web portal with our most recent update, programmatic interface, and code is available at AMELIE.stanford.edu. (Copyright © 2020 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.)
- Published
- 2020
21. AVADA: toward automated pathogenic variant evidence retrieval directly from the full-text literature.
- Author
-
Birgmeier J, Deisseroth CA, Hayward LE, Galhardo LMT, Tierno AP, Jagadeesh KA, Stenson PD, Cooper DN, Bernstein JA, Haeussler M, and Bejerano G
- Subjects
- Data Management methods, Databases, Factual, Databases, Genetic, Humans, Natural Language Processing, PubMed, Publications, Electronic Data Processing methods, Genomics methods, Information Storage and Retrieval methods
- Abstract
Purpose: Both monogenic pathogenic variant cataloging and clinical patient diagnosis start with variant-level evidence retrieval followed by expert evidence integration in search of diagnostic variants and genes. Here, we try to accelerate pathogenic variant evidence retrieval by an automatic approach. Methods: Automatic VAriant evidence DAtabase (AVADA) is a novel machine learning tool that uses natural language processing to automatically identify pathogenic genetic variant evidence in full-text primary literature about monogenic disease and convert it to genomic coordinates. Results: AVADA automatically retrieved almost 60% of likely disease-causing variants deposited in the Human Gene Mutation Database (HGMD), a 4.4-fold improvement over the current best open source automated variant extractor. AVADA contains over 60,000 likely disease-causing variants that are in HGMD but not in ClinVar. AVADA also highlights the challenges of automated variant mapping and pathogenicity curation. However, when combined with manual validation, on 245 diagnosed patients, AVADA provides valuable evidence for an additional 18 diagnostic variants, on top of ClinVar's 21, versus only 2 using the best current automated approach. Conclusion: AVADA advances automated retrieval of pathogenic monogenic variant evidence from full-text literature. Far from perfect, but much faster than PubMed/Google Scholar search, careful curation of AVADA-retrieved evidence can aid both database curation and patient diagnosis.
- Published
- 2020
22. Extensive disruption of protein interactions by genetic variants across the allele frequency spectrum in human populations.
- Author
-
Fragoza R, Das J, Wierbowski SD, Liang J, Tran TN, Liang S, Beltran JF, Rivera-Erick CA, Ye K, Wang TY, Yao L, Mort M, Stenson PD, Cooper DN, Wei X, Keinan A, Schimenti JC, Clark AG, and Yu H
- Subjects
- Alleles, Animals, Base Sequence, Disease genetics, Genetic Predisposition to Disease, Genome, Human, HEK293 Cells, Humans, Mice, Mutation, Missense genetics, Phenotype, Polymorphism, Single Nucleotide genetics, Protein Binding genetics, Gene Frequency genetics, Genetic Variation, Genetics, Population
- Abstract
Each human genome carries tens of thousands of coding variants. The extent to which this variation is functional and the mechanisms by which they exert their influence remains largely unexplored. To address this gap, we leverage the ExAC database of 60,706 human exomes to investigate experimentally the impact of 2009 missense single nucleotide variants (SNVs) across 2185 protein-protein interactions, generating interaction profiles for 4797 SNV-interaction pairs, of which 421 SNVs segregate at > 1% allele frequency in human populations. We find that interaction-disruptive SNVs are prevalent at both rare and common allele frequencies. Furthermore, these results suggest that 10.5% of missense variants carried per individual are disruptive, a higher proportion than previously reported; this indicates that each individual's genetic makeup may be significantly more complex than expected. Finally, we demonstrate that candidate disease-associated mutations can be identified through shared interaction perturbations between variants of interest and known disease mutations.
- Published
- 2019
23. SeqTailor: a user-friendly webserver for the extraction of DNA or protein sequences from next-generation sequencing data.
- Author
-
Zhang P, Boisson B, Stenson PD, Cooper DN, Casanova JL, Abel L, and Itan Y
- Subjects
- Animals, Cattle, Genetic Variation, Humans, INDEL Mutation, Internet, Mice, Rats, High-Throughput Nucleotide Sequencing methods, Sequence Analysis, DNA methods, Sequence Analysis, Protein methods, Software
- Abstract
Human whole-genome-sequencing reveals about 4 000 000 genomic variants per individual. These data are mostly stored as VCF-format files. Although many variant analysis methods accept VCF as input, many other tools require DNA or protein sequences, particularly for splicing prediction, sequence alignment, phylogenetic analysis, and structure prediction. However, there is no existing webserver capable of extracting DNA/protein sequences for genomic variants from VCF files in a user-friendly and efficient manner. We developed the SeqTailor webserver to bridge this gap, by enabling rapid extraction of (i) DNA sequences around genomic variants, with customizable window sizes and options to annotate the splice sites closest to the variants and to consider the neighboring variants within the window; and (ii) protein sequences encoded by the DNA sequences around genomic variants, with built-in SnpEff annotator and customizable window sizes. SeqTailor supports 11 species, including: human (GRCh37/GRCh38), chimpanzee, mouse, rat, cow, chicken, lizard, zebrafish, fruitfly, Arabidopsis and rice. Standalone programs are provided for command-line-based needs. SeqTailor streamlines the sequence extraction process, and accelerates the analysis of genomic variants with software requiring DNA/protein sequences. It will facilitate the study of genomic variation, by increasing the feasibility of sequence-based analysis and prediction. The SeqTailor webserver is freely available at http://shiva.rockefeller.edu/SeqTailor/. (© The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.)
- Published
- 2019
24. S-CAP extends pathogenicity prediction to genetic variants that affect RNA splicing.
- Author
-
Jagadeesh KA, Paggi JM, Ye JS, Stenson PD, Cooper DN, Bernstein JA, and Bejerano G
- Subjects
- Exome genetics, Humans, Mutation genetics, Genetic Variation genetics, RNA Splicing genetics
- Abstract
Exome analysis of patients with a likely monogenic disease does not identify a causal variant in over half of cases. Splice-disrupting mutations make up the second largest class of known disease-causing mutations. Each individual (singleton) exome harbors over 500 rare variants of unknown significance (VUS) in the splicing region. The existing relevant pathogenicity prediction tools tackle all non-coding variants as one amorphic class and/or are not calibrated for the high sensitivity required for clinical use. Here we calibrate seven such tools and devise a novel tool called Splicing Clinically Applicable Pathogenicity prediction (S-CAP) that is over twice as powerful as all previous tools, removing 41% of patient VUS at 95% sensitivity. We show that S-CAP does this by using its own features and not via meta-prediction over previous tools, and that splicing pathogenicity prediction is distinct from predicting molecular splicing changes. S-CAP is an important step on the path to deriving non-coding causal diagnoses.
- Published
- 2019
25. Blacklisting variants common in private cohorts but not in public databases optimizes human exome analysis.
- Author
-
Maffucci P, Bigio B, Rapaport F, Cobat A, Borghesi A, Lopez M, Patin E, Bolze A, Shang L, Bendavid M, Scott EM, Stenson PD, Cunningham-Rundles C, Cooper DN, Gleeson JG, Fellay J, Quintana-Murci L, Casanova JL, Abel L, Boisson B, and Itan Y
- Subjects
- Cohort Studies, Female, Humans, Male, Databases, Nucleic Acid, Exome, Genetic Variation, Genome, Human, Sequence Analysis, DNA, Software
- Abstract
Computational analyses of human patient exomes aim to filter out as many nonpathogenic genetic variants (NPVs) as possible, without removing the true disease-causing mutations. This involves comparing the patient's exome with public databases to remove reported variants inconsistent with disease prevalence, mode of inheritance, or clinical penetrance. However, variants frequent in a given exome cohort, but absent or rare in public databases, have also been reported and treated as NPVs, without rigorous exploration. We report the generation of a blacklist of variants frequent within an in-house cohort of 3,104 exomes. This blacklist did not remove known pathogenic mutations from the exomes of 129 patients and decreased the number of NPVs remaining in the 3,104 individual exomes by a median of 62%. We validated this approach by testing three other independent cohorts of 400, 902, and 3,869 exomes. The blacklist generated from any given cohort removed a substantial proportion of NPVs (11-65%). We analyzed the blacklisted variants computationally and experimentally. Most of the blacklisted variants corresponded to false signals generated by incomplete reference genome assembly, location in low-complexity regions, bioinformatic misprocessing, or limitations inherent to cohort-specific private alleles (e.g., due to sequencing kits, and genetic ancestries). Finally, we provide our precalculated blacklists, together with ReFiNE, a program for generating customized blacklists from any medium-sized or large in-house cohort of exome (or other next-generation sequencing) data via a user-friendly public web server. This work demonstrates the power of extracting variant blacklists from private databases as a specific in-house but broadly applicable tool for optimizing exome analysis. Competing Interests: A.T. has coauthored multiple papers with J.F. and J.G.G. M.W. coauthored a 2017 paper with J.G.G.
- Published
- 2019
26. CDG: An Online Server for Detecting Biologically Closest Disease-Causing Genes and its Application to Primary Immunodeficiency.
- Author
-
Requena D, Maffucci P, Bigio B, Shang L, Abhyankar A, Boisson B, Stenson PD, Cooper DN, Cunningham-Rundles C, Casanova JL, Abel L, and Itan Y
- Abstract
High-throughput genomic technologies yield about 20,000 variants in the protein-coding exome of each individual. A commonly used approach to select candidate disease-causing variants is to test whether the associated gene has been previously reported to be disease-causing. In the absence of known disease-causing genes, it can be challenging to associate candidate genes with specific genetic diseases. To facilitate the discovery of novel gene-disease associations, we determined the putative biologically closest known genes and their associated diseases for 13,005 human genes not currently reported to be disease-associated. We used these data to construct the closest disease-causing genes (CDG) server, which can be used to infer the closest genes with an associated disease for a user-defined list of genes or diseases. We demonstrate the utility of the CDG server in five immunodeficiency patient exomes across different diseases and modes of inheritance, where CDG dramatically reduced the number of candidate genes to be evaluated. This resource will be a considerable asset for ascertaining the potential relevance of genetic variants found in patient exomes to specific diseases of interest. The CDG database and online server are freely available to non-commercial users at: http://lab.rockefeller.edu/casanova/CDG.
- Published
- 2018
27. The Human Gene Mutation Database: towards a comprehensive repository of inherited mutation data for medical research, genetic diagnosis and next-generation sequencing studies.
- Author
-
Stenson PD, Mort M, Ball EV, Evans K, Hayden M, Heywood S, Hussain M, Phillips AD, and Cooper DN
- Subjects
- Humans, Molecular Diagnostic Techniques, Databases, Genetic, Mutation
- Abstract
The Human Gene Mutation Database (HGMD®) constitutes a comprehensive collection of published germline mutations in nuclear genes that underlie, or are closely associated with human inherited disease. At the time of writing (March 2017), the database contained in excess of 203,000 different gene lesions identified in over 8000 genes manually curated from over 2600 journals. With new mutation entries currently accumulating at a rate exceeding 17,000 per annum, HGMD represents de facto the central unified gene/disease-oriented repository of heritable mutations causing human genetic disease used worldwide by researchers, clinicians, diagnostic laboratories and genetic counsellors, and is an essential tool for the annotation of next-generation sequencing data. The public version of HGMD (http://www.hgmd.org) is freely available to registered users from academic institutions and non-profit organisations whilst the subscription version (HGMD Professional) is available to academic, clinical and commercial users under license via QIAGEN Inc.
- Published
- 2017
28. iRegNet3D: three-dimensional integrated regulatory network for the genomic analysis of coding and non-coding disease mutations.
- Author
-
Liang S, Tippens ND, Zhou Y, Mort M, Stenson PD, Cooper DN, and Yu H
- Subjects
- Binding Sites, Chromatin genetics, Chromatin metabolism, DNA chemistry, DNA metabolism, DNA-Binding Proteins chemistry, DNA-Binding Proteins metabolism, Epistasis, Genetic, Gene Expression Regulation, Humans, Nucleotide Motifs, Open Reading Frames, Transcription Factors chemistry, Transcription Factors metabolism, Untranslated Regions, Gene Regulatory Networks, Genetic Predisposition to Disease, Genome-Wide Association Study methods, Genomics methods, Models, Molecular, Mutation, Quantitative Structure-Activity Relationship
- Abstract
The mechanistic details of most disease-causing mutations remain poorly explored within the context of regulatory networks. We present a high-resolution three-dimensional integrated regulatory network (iRegNet3D) in the form of a web tool, where we resolve the interfaces of all known transcription factor (TF)-TF, TF-DNA and chromatin-chromatin interactions for the analysis of both coding and non-coding disease-associated mutations to obtain mechanistic insights into their functional impact. Using iRegNet3D, we find that disease-associated mutations may perturb the regulatory network through diverse mechanisms including chromatin looping. iRegNet3D promises to be an indispensable tool in large-scale sequencing and disease association studies.
- Published
- 2017
29. M-CAP eliminates a majority of variants of uncertain significance in clinical exomes at high sensitivity.
- Author
-
Jagadeesh KA, Wenger AM, Berger MJ, Guturu H, Stenson PD, Cooper DN, Bernstein JA, and Bejerano G
- Subjects
- DNA Mutational Analysis methods, Humans, Predictive Value of Tests, Computational Biology methods, Disease genetics, Exome genetics, Genetic Markers genetics, Genetic Predisposition to Disease, Mutation genetics, Software
- Abstract
Variant pathogenicity classifiers such as SIFT, PolyPhen-2, CADD, and MetaLR assist in interpretation of the hundreds of rare, missense variants in the typical patient genome by deprioritizing some variants as likely benign. These widely used methods misclassify 26 to 38% of known pathogenic mutations, which could lead to missed diagnoses if the classifiers are trusted as definitive in a clinical setting. We developed M-CAP, a clinical pathogenicity classifier that outperforms existing methods at all thresholds and correctly dismisses 60% of rare, missense variants of uncertain significance in a typical genome at 95% sensitivity.
- Published
- 2016
30. Analysis of protein-coding genetic variation in 60,706 humans.
- Author
-
Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, O'Donnell-Luria AH, Ware JS, Hill AJ, Cummings BB, Tukiainen T, Birnbaum DP, Kosmicki JA, Duncan LE, Estrada K, Zhao F, Zou J, Pierce-Hoffman E, Berghout J, Cooper DN, Deflaux N, DePristo M, Do R, Flannick J, Fromer M, Gauthier L, Goldstein J, Gupta N, Howrigan D, Kiezun A, Kurki MI, Moonshine AL, Natarajan P, Orozco L, Peloso GM, Poplin R, Rivas MA, Ruano-Rubio V, Rose SA, Ruderfer DM, Shakir K, Stenson PD, Stevens C, Thomas BP, Tiao G, Tusie-Luna MT, Weisburd B, Won HH, Yu D, Altshuler DM, Ardissino D, Boehnke M, Danesh J, Donnelly S, Elosua R, Florez JC, Gabriel SB, Getz G, Glatt SJ, Hultman CM, Kathiresan S, Laakso M, McCarroll S, McCarthy MI, McGovern D, McPherson R, Neale BM, Palotie A, Purcell SM, Saleheen D, Scharf JM, Sklar P, Sullivan PF, Tuomilehto J, Tsuang MT, Watkins HC, Wilson JG, Daly MJ, and MacArthur DG
- Subjects
- DNA Mutational Analysis, Datasets as Topic, Humans, Phenotype, Proteome genetics, Rare Diseases genetics, Sample Size, Exome genetics, Genetic Variation genetics
- Abstract
Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.
- Published
- 2016
31. mutation3D: Cancer Gene Prediction Through Atomic Clustering of Coding Variants in the Structural Proteome.
- Author
-
Meyer MJ, Lapcevic R, Romero AE, Yoon M, Das J, Beltrán JF, Mort M, Stenson PD, Cooper DN, Paccanaro A, and Yu H
- Subjects
- Algorithms, Cluster Analysis, Genetic Predisposition to Disease, Genome-Wide Association Study, Humans, Protein Structure, Tertiary, Proteome chemistry, Amino Acid Substitution, Neoplasms genetics, Proteome genetics, Web Browser
- Abstract
A new algorithm and Web server, mutation3D (http://mutation3d.org), proposes driver genes in cancer by identifying clusters of amino acid substitutions within tertiary protein structures. We demonstrate the feasibility of using a 3D clustering approach to implicate proteins in cancer based on explorations of single proteins using the mutation3D Web interface. On a large scale, we show that clustering with mutation3D is able to separate functional from nonfunctional mutations by analyzing a combination of 8,869 known inherited disease mutations and 2,004 SNPs overlaid together upon the same sets of crystal structures and homology models. Further, we present a systematic analysis of whole-genome and whole-exome cancer datasets to demonstrate that mutation3D identifies many known cancer genes as well as previously underexplored target genes. The mutation3D Web interface allows users to analyze their own mutation data in a variety of popular formats and provides seamless access to explore mutation clusters derived from over 975,000 somatic mutations reported by 6,811 cancer sequencing studies. The mutation3D Web interface is freely available with all major browsers supported. (© 2016 WILEY PERIODICALS, INC.)
- Published
- 2016
32. The mutation significance cutoff: gene-level thresholds for variant predictions.
- Author
-
Itan Y, Shang L, Boisson B, Ciancanelli MJ, Markle JG, Martinez-Barricarte R, Scott E, Shah I, Stenson PD, Gleeson J, Cooper DN, Quintana-Murci L, Zhang SY, Abel L, and Casanova JL
- Subjects
- Animals, Genetic Predisposition to Disease, Humans, Genetic Variation, Nucleic Acid Amplification Techniques methods
- Published
- 2016
33. Assessing the Pathogenicity of Insertion and Deletion Variants with the Variant Effect Scoring Tool (VEST-Indel).
- Author
-
Douville C, Masica DL, Stenson PD, Cooper DN, Gygax DM, Kim R, Ryan M, and Karchin R
- Subjects
- Algorithms, Datasets as Topic, Humans, Models, Genetic, Mutation, Missense, Reproducibility of Results, Web Browser, Computational Biology methods, INDEL Mutation, Software
- Abstract
Insertion/deletion variants (indels) alter protein sequence and length, yet are highly prevalent in healthy populations, presenting a challenge to bioinformatics classifiers. Commonly used features (DNA and protein sequence conservation, indel length, and occurrence in repeat regions) are useful for inference of protein damage. However, these features can cause false positives when predicting the impact of indels on disease. Existing methods for indel classification suffer from low specificities, severely limiting clinical utility. Here, we further develop our variant effect scoring tool (VEST) to include the classification of in-frame and frameshift indels (VEST-indel) as pathogenic or benign. We apply 24 features, including a new "PubMed" feature, to estimate a gene's importance in human disease. When compared with four existing indel classifiers, our method achieves a drastically reduced false-positive rate, improving specificity by as much as 90%. This approach of estimating gene importance might be generally applicable to missense and other bioinformatics pathogenicity predictors, which often fail to achieve high specificity. Finally, we tested all possible meta-predictors that can be obtained from combining the four different indel classifiers using Boolean conjunctions and disjunctions, and derived a meta-predictor with improved performance over any individual method. (© 2015 The Authors. Human Mutation published by Wiley Periodicals, Inc.)
- Published
- 2016
34. The human gene damage index as a gene-level approach to prioritizing exome variants.
- Author
-
Itan Y, Shang L, Boisson B, Patin E, Bolze A, Moncada-Vélez M, Scott E, Ciancanelli MJ, Lafaille FG, Markle JG, Martinez-Barricarte R, de Jong SJ, Kong XF, Nitschke P, Belkadi A, Bustamante J, Puel A, Boisson-Dupuis S, Stenson PD, Gleeson JG, Cooper DN, Quintana-Murci L, Claverie JM, Zhang SY, Abel L, and Casanova JL
- Subjects
- Humans, ROC Curve, Exome, Genetic Diseases, Inborn genetics
- Abstract
The protein-coding exome of a patient with a monogenic disease contains about 20,000 variants, only one or two of which are disease causing. We found that 58% of rare variants in the protein-coding exome of the general population are located in only 2% of the genes. Prompted by this observation, we aimed to develop a gene-level approach for predicting whether a given human protein-coding gene is likely to harbor disease-causing mutations. To this end, we derived the gene damage index (GDI): a genome-wide, gene-level metric of the mutational damage that has accumulated in the general population. We found that the GDI was correlated with selective evolutionary pressure, protein complexity, coding sequence length, and the number of paralogs. We compared GDI with the leading gene-level approaches, genic intolerance, and de novo excess, and demonstrated that GDI performed best for the detection of false positives (i.e., removing exome variants in genes irrelevant to disease), whereas genic intolerance and de novo excess performed better for the detection of true positives (i.e., assessing de novo mutations in genes likely to be disease causing). The GDI server, data, and software are freely available to noncommercial users from lab.rockefeller.edu/casanova/GDI.
- Published
- 2015
35. Proteins linked to autosomal dominant and autosomal recessive disorders harbor characteristic rare missense mutation distribution patterns.
- Author
-
Turner TN, Douville C, Kim D, Stenson PD, Cooper DN, Chakravarti A, and Karchin R
- Subjects
- Computational Biology, Databases, Genetic, Genome, Human, Humans, Molecular Sequence Annotation, Multigene Family, Genes, Dominant, Genes, Recessive, Genetic Diseases, Inborn genetics, Mutation, Missense, Proteins genetics
- Abstract
The role of rare missense variants in disease causation remains difficult to interpret. We explore whether the clustering pattern of rare missense variants (MAF < 0.01) in a protein is associated with mode of inheritance. Mutations in genes associated with autosomal dominant (AD) conditions are known to result in either loss or gain of function, whereas mutations in genes associated with autosomal recessive (AR) conditions invariably result in loss-of-function. Loss-of-function mutations tend to be distributed uniformly along protein sequence, whereas gain-of-function mutations tend to localize to key regions. It has not previously been ascertained whether these patterns hold in general for rare missense mutations. We consider the extent to which rare missense variants are located within annotated protein domains and whether they form clusters, using a new unbiased method called CLUstering by Mutation Position. These approaches quantified a significant difference in clustering between AD and AR diseases. Proteins linked to AD diseases exhibited more clustering of rare missense mutations than those linked to AR diseases (Wilcoxon P = 5.7 × 10⁻⁴, permutation P = 8.4 × 10⁻⁴). Rare missense mutation in proteins linked to either AD or AR diseases was more clustered than controls (1000G) (Wilcoxon P = 2.8 × 10⁻¹⁵ for AD and P = 4.5 × 10⁻⁴ for AR, permutation P = 3.1 × 10⁻¹² for AD and P = 0.03 for AR). The differences in clustering patterns persisted even after removal of the most prominent genes. Testing for such non-random patterns may reveal novel aspects of disease etiology in large sample studies. (© The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.)
- Published
- 2015
36. Identification of cancer predisposition variants in apparently healthy individuals using a next-generation sequencing-based family genomics approach.
- Author
-
Karageorgos I, Mizzi C, Giannopoulou E, Pavlidis C, Peters BA, Zagoriti Z, Stenson PD, Mitropoulos K, Borg J, Kalofonos HP, Drmanac R, Stubbs A, van der Spek P, Cooper DN, Katsila T, and Patrinos GP
- Subjects
- BRCA1 Protein genetics, DNA-Binding Proteins genetics, Humans, MutS Homolog 2 Protein genetics, Mutation, Neoplasm Proteins genetics, Neoplasms pathology, Pedigree, Polymorphism, Single Nucleotide, Genetic Predisposition to Disease, Genomics methods, High-Throughput Nucleotide Sequencing methods, Neoplasms genetics
- Abstract
Cancer, like many common disorders, has a complex etiology, often with a strong genetic component and with multiple environmental factors contributing to susceptibility. A considerable number of genomic variants have been previously reported to be causative of, or associated with, an increased risk for various types of cancer. Here, we adopted a next-generation sequencing approach in 11 members of two families of Greek descent to identify all genomic variants with the potential to predispose family members to cancer. Cross-comparison with data from the Human Gene Mutation Database identified a total of 571 variants, of which 47% were disease-associated polymorphisms, 26% disease-associated polymorphisms with additional supporting functional evidence, 19% functional polymorphisms with in vitro/laboratory or in vivo supporting evidence but no known disease association, 4% putative disease-causing mutations but with some residual doubt as to their pathological significance, and 3% disease-causing mutations. Subsequent analysis, focused on the latter variant class most likely to be involved in cancer predisposition, revealed two variants of prime interest, namely MSH2 c.2732T>A (p.L911R) and BRCA1 c.2955delC, the first of which is novel. KMT2D c.13895delC and c.1940C>A variants are additionally reported as incidental findings. The next-generation sequencing-based family genomics approach described herein has the potential to be applied to other types of complex genetic disorder in order to identify variants of potential pathological significance.
- Published
- 2015
37. Individualized iterative phenotyping for genome-wide analysis of loss-of-function mutations.
- Author
-
Johnston JJ, Lewis KL, Ng D, Singh LN, Wynter J, Brewer C, Brooks BP, Brownell I, Candotti F, Gonsalves SG, Hart SP, Kong HH, Rother KI, Sokolic R, Solomon BD, Zein WM, Cooper DN, Stenson PD, Mullikin JC, and Biesecker LG
- Subjects
- Computational Biology, Exome genetics, Female, Genome-Wide Association Study trends, Humans, Male, Middle Aged, Atherosclerosis genetics, Genome-Wide Association Study methods, High-Throughput Nucleotide Sequencing methods, Mutation genetics, Phenotype, Precision Medicine methods
- Abstract
Next-generation sequencing provides the opportunity to practice predictive medicine based on identified variants. Putative loss-of-function (pLOF) variants are common in genomes and understanding their contribution to disease is critical for predictive medicine. To this end, we characterized the consequences of pLOF variants in an exome cohort by iterative phenotyping. Exome data were generated on 951 participants from the ClinSeq cohort and filtered for pLOF variants in genes likely to cause a phenotype in heterozygotes. 103 of 951 exomes had such a pLOF variant and 79 participants were evaluated. Of those 79, 34 had findings or family histories that could be attributed to the variant (28 variants in 18 genes), 2 had indeterminate findings (2 variants in 2 genes), and 43 had no findings or a negative family history for the trait (34 variants in 28 genes). The presence of a phenotype was correlated with two mutation attributes: prior report of pathogenicity for the variant (p = 0.0001) and prior report of other mutations in the same exon (p = 0.0001). We conclude that 1/30 unselected individuals harbor a pLOF mutation associated with a phenotype either in themselves or their family. This is more common than has been assumed and has implications for the setting of prior probabilities of affection status for predictive medicine. (Copyright © 2015 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.)
- Published
- 2015
38. Human genomics. Effect of predicted protein-truncating genetic variants on the human transcriptome.
- Author
-
Rivas MA, Pirinen M, Conrad DF, Lek M, Tsang EK, Karczewski KJ, Maller JB, Kukurba KR, DeLuca DS, Fromer M, Ferreira PG, Smith KS, Zhang R, Zhao F, Banks E, Poplin R, Ruderfer DM, Purcell SM, Tukiainen T, Minikel EV, Stenson PD, Cooper DN, Huang KH, Sullivan TJ, Nedzel J, Bustamante CD, Li JB, Daly MJ, Guigo R, Donnelly P, Ardlie K, Sammeth M, Dermitzakis ET, McCarthy MI, Montgomery SB, Lappalainen T, and MacArthur DG
- Subjects
- Alternative Splicing, Gene Expression Profiling, Gene Silencing, Heterozygote, Humans, Nonsense Mediated mRNA Decay, Phenotype, Gene Expression Regulation, Genetic Variation, Genome, Human genetics, Proteins genetics, Transcriptome
- Abstract
Accurate prediction of the functional effect of genetic variation is critical for clinical genome interpretation. We systematically characterized the transcriptome effects of protein-truncating variants, a class of variants expected to have profound effects on gene function, using data from the Genotype-Tissue Expression (GTEx) and Geuvadis projects. We quantitated tissue-specific and positional effects on nonsense-mediated transcript decay and present an improved predictive model for this decay. We directly measured the effect of variants both proximal and distal to splice junctions. Furthermore, we found that robustness to heterozygous gene inactivation is not due to dosage compensation. Our results illustrate the value of transcriptome data in the functional interpretation of genetic variants. (Copyright © 2015, American Association for the Advancement of Science.)
- Published
- 2015
39. The evaluation of tools used to predict the impact of missense variants is hindered by two types of circularity.
- Author
-
Grimm DG, Azencott CA, Aicheler F, Gieraths U, MacArthur DG, Samocha KE, Cooper DN, Stenson PD, Daly MJ, Smoller JW, Duncan LE, and Borgwardt KM
- Subjects
- Datasets as Topic, Humans, Internet, Reproducibility of Results, Web Browser, Computational Biology methods, Mutation, Missense, Software
- Abstract
Prioritizing missense variants for further experimental investigation is a key challenge in current sequencing studies for exploring complex and Mendelian diseases. A large number of in silico tools have been employed for the task of pathogenicity prediction, including PolyPhen-2, SIFT, FatHMM, MutationTaster-2, MutationAssessor, Combined Annotation Dependent Depletion, LRT, phyloP, and GERP++, as well as optimized methods of combining tool scores, such as Condel and Logit. Due to the wealth of these methods, an important practical question to answer is which of these tools generalize best, that is, correctly predict the pathogenic character of new variants. We here demonstrate in a study of 10 tools on five datasets that such a comparative evaluation of these tools is hindered by two types of circularity: they arise due to (1) the same variants or (2) different variants from the same protein occurring both in the datasets used for training and for evaluation of these tools, which may lead to overly optimistic results. We show that comparative evaluations of predictors that do not address these types of circularity may erroneously conclude that circularity confounded tools are most accurate among all tools, and may even outperform optimized combinations of tools. (© 2015 The Authors. Human Mutation published by Wiley Periodicals, Inc.)
- Published
- 2015
40. A massively parallel pipeline to clone DNA variants and examine molecular phenotypes of human disease mutations.
- Author
-
Wei X, Das J, Fragoza R, Liang J, Bastos de Oliveira FM, Lee HR, Wang X, Mort M, Stenson PD, Cooper DN, Lipkin SM, Smolka MB, and Yu H
- Subjects
- Adaptor Proteins, Signal Transducing genetics, Adaptor Proteins, Signal Transducing metabolism, Alleles, Chromatography, Liquid, Exome, Gene Expression Regulation, Gene Library, HEK293 Cells, High-Throughput Nucleotide Sequencing, Humans, MutL Protein Homolog 1, Nuclear Proteins genetics, Nuclear Proteins metabolism, Plasmids genetics, Protein Interaction Domains and Motifs, Protein Stability, Saccharomyces cerevisiae genetics, Tandem Mass Spectrometry, Cloning, Molecular methods, DNA Copy Number Variations, DNA Mutational Analysis methods, Mutagenesis, Site-Directed, Mutation, Phenotype
- Abstract
Understanding the functional relevance of DNA variants is essential for all exome and genome sequencing projects. However, current mutagenesis cloning protocols require Sanger sequencing, and thus are prohibitively costly and labor-intensive. We describe a massively-parallel site-directed mutagenesis approach, "Clone-seq", leveraging next-generation sequencing to rapidly and cost-effectively generate a large number of mutant alleles. Using Clone-seq, we further develop a comparative interactome-scanning pipeline integrating high-throughput GFP, yeast two-hybrid (Y2H), and mass spectrometry assays to systematically evaluate the functional impact of mutations on protein stability and interactions. We use this pipeline to show that disease mutations on protein-protein interaction interfaces are significantly more likely than those away from interfaces to disrupt corresponding interactions. We also find that mutation pairs with similar molecular phenotypes in terms of both protein stability and interactions are significantly more likely to cause the same disease than those with different molecular phenotypes, validating the in vivo biological relevance of our high-throughput GFP and Y2H assays, and indicating that both assays can be used to determine candidate disease mutations in the future. The general scheme of our experimental pipeline can be readily expanded to other types of interactome-mapping methods to comprehensively evaluate the functional relevance of all DNA variants, including those in non-coding regions.
- Published
- 2014
41. A probabilistic model to predict clinical phenotypic traits from genome sequencing.
- Author
-
Chen YC, Douville C, Wang C, Niknafs N, Yeo G, Beleva-Guthrie V, Carter H, Stenson PD, Cooper DN, Li B, Mooney S, and Karchin R
- Subjects
- Bayes Theorem, Genome-Wide Association Study, Human Genome Project, Humans, Phenotype, Genetic Predisposition to Disease genetics, Genome genetics, Genomics methods, Models, Statistical, Sequence Analysis, DNA methods
- Abstract
Genetic screening is becoming possible on an unprecedented scale. However, its utility remains controversial. Although most variant genotypes cannot be easily interpreted, many individuals nevertheless attempt to interpret their genetic information. Initiatives such as the Personal Genome Project (PGP) and Illumina's Understand Your Genome are sequencing thousands of adults, collecting phenotypic information and developing computational pipelines to identify the most important variant genotypes harbored by each individual. These pipelines consider database and allele frequency annotations and bioinformatics classifications. We propose that the next step will be to integrate these different sources of information to estimate the probability that a given individual has specific phenotypes of clinical interest. To this end, we have designed a Bayesian probabilistic model to predict the probability of dichotomous phenotypes. When applied to a cohort from PGP, predictions of Gilbert syndrome, Graves' disease, non-Hodgkin lymphoma, and various blood groups were accurate, as individuals manifesting the phenotype in question exhibited the highest, or among the highest, predicted probabilities. Thirty-eight PGP phenotypes (26%) were predicted with area-under-the-ROC curve (AUC)>0.7, and 23 (15.8%) of these were statistically significant, based on permutation tests. Moreover, in a Critical Assessment of Genome Interpretation (CAGI) blinded prediction experiment, the models were used to match 77 PGP genomes to phenotypic profiles, generating the most accurate prediction of 16 submissions, according to an independent assessor. Although the models are currently insufficiently accurate for diagnostic utility, we expect their performance to improve with growth of publicly available genomics data and model refinement by domain experts.
- Published
- 2014
42. Elucidating common structural features of human pathogenic variations using large-scale atomic-resolution protein networks.
- Author
-
Das J, Lee HR, Sagar A, Fragoza R, Liang J, Wei X, Wang X, Mort M, Stenson PD, Cooper DN, and Yu H
- Subjects
- Computational Biology, Databases, Protein, Humans, Models, Theoretical, Structure-Activity Relationship, Genetic Diseases, Inborn genetics, Genetic Diseases, Inborn pathology, Protein Interaction Maps genetics, Systems Biology
- Abstract
With the rapid growth of structural genomics, numerous protein crystal structures have become available. However, the parallel increase in knowledge of the functional principles underlying biological processes, and more specifically the underlying molecular mechanisms of disease, has been less dramatic. This notwithstanding, the study of complex cellular networks has made possible the inference of protein functions on a large scale. Here, we combine the scale of network systems biology with the resolution of traditional structural biology to generate a large-scale atomic-resolution interactome-network comprising 3,398 interactions between 2,890 proteins with a well-defined interaction interface and interface residues for each interaction. Within the framework of this atomic-resolution network, we have explored the structural principles underlying variations causing human-inherited disease. We find that in-frame pathogenic variations are enriched at both the interface and in the interacting domain, suggesting that variations not only at interface "hot-spots," but in the entire interacting domain can result in alterations of interactions. Further, the sites of pathogenic variations are closely related to the biophysical strength of the interactions they perturb. Finally, we show that biochemical alterations consequent to these variations are considerably more disruptive than evolutionary changes, with the most significant alterations at the protein interaction interface. (© 2014 WILEY PERIODICALS, INC.)
- Published
- 2014
43. The Human Gene Mutation Database: building a comprehensive mutation repository for clinical and molecular genetics, diagnostic testing and personalized genomic medicine.
- Author
-
Stenson PD, Mort M, Ball EV, Shaw K, Phillips A, and Cooper DN
- Subjects
- Cell Nucleus genetics, Computational Biology, DNA Copy Number Variations, Genetic Predisposition to Disease, Genetic Testing, Genomics, Humans, Polymorphism, Genetic, Precision Medicine, Databases, Genetic, Genome, Human, Germ-Line Mutation
- Abstract
The Human Gene Mutation Database (HGMD®) is a comprehensive collection of germline mutations in nuclear genes that underlie, or are associated with, human inherited disease. By June 2013, the database contained over 141,000 different lesions detected in over 5,700 different genes, with new mutation entries currently accumulating at a rate exceeding 10,000 per annum. HGMD was originally established in 1996 for the scientific study of mutational mechanisms in human genes. However, it has since acquired a much broader utility as a central unified disease-oriented mutation repository utilized by human molecular geneticists, genome scientists, molecular biologists, clinicians and genetic counsellors as well as by those specializing in biopharmaceuticals, bioinformatics and personalized genomics. The public version of HGMD (http://www.hgmd.org) is freely available to registered users from academic institutions/non-profit organizations whilst the subscription version (HGMD Professional) is available to academic, clinical and commercial users under license via BIOBASE GmbH.
- Published
- 2014
44. Using exome data to identify malignant hyperthermia susceptibility mutations.
- Author
-
Gonsalves SG, Ng D, Johnston JJ, Teer JK, Stenson PD, Cooper DN, Mullikin JC, and Biesecker LG
- Subjects
- Aged, Calcium Channels genetics, Calcium Channels, L-Type, Cohort Studies, Databases, Genetic, Exons genetics, Female, Genetic Variation, Humans, Longitudinal Studies, Male, Middle Aged, Mutation, Missense genetics, Mutation, Missense physiology, Penetrance, Predictive Value of Tests, Ryanodine Receptor Calcium Release Channel genetics, Exome genetics, Genetic Predisposition to Disease genetics, Malignant Hyperthermia genetics, Mutation genetics, Mutation physiology
- Abstract
Background: Malignant hyperthermia susceptibility (MHS) is a life-threatening, inherited disorder of muscle calcium metabolism, triggered by anesthetics and depolarizing muscle relaxants. An unselected cohort was screened for MHS mutations using exome sequencing. The aim of this study was to pilot a strategy for the RYR1 and CACNA1S genes. Methods: Exome sequencing was performed on 870 volunteers not ascertained for MHS. Variants in RYR1 and CACNA1S were annotated using an algorithm that filtered results based on mutation type, frequency, and information in mutation databases. Variants were scored on a six-point pathogenicity scale. Medical histories and pedigrees were reviewed for malignant hyperthermia and related disorders. Results: The authors identified 70 RYR1 and 53 CACNA1S variants among 870 exomes. Sixty-three RYR1 and 41 CACNA1S variants passed the quality and frequency metrics but the authors excluded synonymous variants. In RYR1, the authors identified 65 missense mutations, one nonsense, two that affected splicing, and one non-frameshift indel. In CACNA1S, 48 missense, one frameshift deletion, one splicing, and one non-frameshift indel were identified. RYR1 variants predicted to be pathogenic for MHS were found in three participants without medical or family histories of MHS. Numerous variants, previously described as pathogenic in mutation databases, were reclassified by the authors as being of unknown pathogenicity. Conclusions: Exome sequencing can identify asymptomatic patients at risk for MHS, although the interpretation of exome variants can be challenging. The use of exome sequencing in unselected cohorts is an important tool to understand the prevalence and penetrance of MHS, a critical challenge for the field.
- Published
- 2013
45. MuPIT interactive: webserver for mapping variant positions to annotated, interactive 3D structures.
- Author
-
Niknafs N, Kim D, Kim R, Diekhans M, Ryan M, Stenson PD, Cooper DN, and Karchin R
- Subjects
- Chromosome Mapping, Databases, Genetic, Exome, Genomics, Humans, Internet, Molecular Sequence Annotation, Neoplasms genetics, Protein Conformation, Sequence Alignment, Sequence Analysis, DNA, Software, Computational Biology, Genome, Human, Mutation, Polymorphism, Single Nucleotide
- Abstract
Mutation position imaging toolbox (MuPIT) interactive is a browser-based application for single-nucleotide variants (SNVs), which automatically maps the genomic coordinates of SNVs onto the coordinates of available three-dimensional (3D) protein structures. The application is designed for interactive browser-based visualization of the putative functional relevance of SNVs by biologists who are not necessarily experts either in bioinformatics or protein structure. Users may submit batches of several thousand SNVs and review all protein structures that cover the SNVs, including available functional annotations such as binding sites, mutagenesis experiments, and common polymorphisms. Multiple SNVs may be mapped onto each structure, enabling 3D visualization of SNV clusters and their relationship to functionally annotated positions. We illustrate the utility of MuPIT interactive in rationalizing the impact of selected polymorphisms in the PharmGKB database, somatic mutations identified in the Cancer Genome Atlas study of invasive breast carcinomas, and rare variants identified in the exome sequencing project. MuPIT interactive is freely available for non-profit use at http://mupit.icm.jhu.edu .
- Published
- 2013
46. Interpreting secondary cardiac disease variants in an exome cohort.
- Author
-
Ng D, Johnston JJ, Teer JK, Singh LN, Peller LC, Wynter JS, Lewis KL, Cooper DN, Stenson PD, Mullikin JC, and Biesecker LG
- Subjects
- Aged, Arrhythmias, Cardiac genetics, Calcium-Binding Proteins genetics, Cardiac Myosins genetics, Cardiomyopathies genetics, Carrier Proteins genetics, Cohort Studies, Databases, Genetic, Death, Sudden, Cardiac, ERG1 Potassium Channel, Ether-A-Go-Go Potassium Channels genetics, Female, Genetic Variation, Genotype, Humans, Male, Middle Aged, Myosin Heavy Chains genetics, Phenotype, Polymorphism, Single Nucleotide, Potassium Channels, Voltage-Gated genetics, Sequence Analysis, DNA, Exome, Heart Diseases genetics
- Abstract
Background: Massively parallel sequencing to identify rare variants is widely practiced in medical research and in the clinic. Genome and exome sequencing can identify the genetic cause of a disease (primary results), but it can also identify pathogenic variants underlying diseases that are not being sought (secondary or incidental results). A major controversy has developed surrounding the return of secondary results to research participants. We have piloted a method to analyze exomes to identify participants at risk for cardiac arrhythmias, cardiomyopathies, or sudden death. Methods and Results: Exome sequencing was performed on 870 participants not selected for arrhythmia, cardiomyopathy, or a family history of sudden death. Exome data from 22 cardiac arrhythmia- and 41 cardiomyopathy-associated genes were analyzed using an algorithm that filtered results on genotype quality, frequency, and database information. We identified 1367 variants in the cardiomyopathy genes and 360 variants in the arrhythmia genes. Six participants had pathogenic variants associated with dilated cardiomyopathy (n=1), hypertrophic cardiomyopathy (n=2), left ventricular noncompaction (n=1), or long-QT syndrome (n=2). Two of these participants had evidence of cardiomyopathy and 1 had left ventricular noncompaction on echocardiogram. Three participants with likely pathogenic variants had prolonged QTc. Family history included unexplained sudden death among relatives. Conclusions: Approximately 0.5% of participants in this study had pathogenic variants in known cardiomyopathy or arrhythmia genes. This high frequency may be due to self-selection, false positives, or underestimation of the prevalence of these conditions. We conclude that clinically important cardiomyopathy and dysrhythmia secondary variants can be identified in unselected exomes.
- Published
- 2013
47. CRAVAT: cancer-related analysis of variants toolkit.
- Author
-
Douville C, Carter H, Kim R, Niknafs N, Diekhans M, Stenson PD, Cooper DN, Ryan M, and Karchin R
- Subjects
- Genomics methods, Humans, Internet, Mutation, Neoplasms genetics, Software
- Abstract
Summary: Advances in sequencing technology have greatly reduced the costs incurred in collecting raw sequencing data. Academic laboratories and researchers therefore now have access to very large datasets of genomic alterations but limited time and computational resources to analyse their potential biological importance. Here, we provide a web-based application, Cancer-Related Analysis of Variants Toolkit, designed with an easy-to-use interface to facilitate the high-throughput assessment and prioritization of genes and missense alterations important for cancer tumorigenesis. Cancer-Related Analysis of Variants Toolkit provides predictive scores for germline variants, somatic mutations and relative gene importance, as well as annotations from published literature and databases. Results are emailed to users as MS Excel spreadsheets and/or tab-separated text files. Availability: http://www.cravat.us/
- Published
- 2013
48. Identifying Mendelian disease genes with the variant effect scoring tool.
- Author
-
Carter H, Douville C, Stenson PD, Cooper DN, and Karchin R
- Subjects
- Area Under Curve, Artificial Intelligence, Databases, Genetic, Exome genetics, Humans, ROC Curve, Algorithms, Computational Biology methods, Genetic Diseases, Inborn genetics, Mutation, Missense genetics
- Abstract
Background: Whole exome sequencing studies identify hundreds to thousands of rare protein coding variants of ambiguous significance for human health. Computational tools are needed to accelerate the identification of specific variants and genes that contribute to human disease. Results: We have developed the Variant Effect Scoring Tool (VEST), a supervised machine learning-based classifier, to prioritize rare missense variants with likely involvement in human disease. The VEST classifier training set comprised ~45,000 disease mutations from the latest Human Gene Mutation Database release and another ~45,000 high frequency (allele frequency >1%) putatively neutral missense variants from the Exome Sequencing Project. VEST outperforms some of the most popular methods for prioritizing missense variants in carefully designed holdout benchmarking experiments (VEST ROC AUC = 0.91, PolyPhen2 ROC AUC = 0.86, SIFT4.0 ROC AUC = 0.84). VEST estimates variant score p-values against a null distribution of VEST scores for neutral variants not included in the VEST training set. These p-values can be aggregated at the gene level across multiple disease exomes to rank genes for probable disease involvement. We tested the ability of an aggregate VEST gene score to identify candidate Mendelian disease genes, based on whole-exome sequencing of a small number of disease cases. We used whole-exome data for two Mendelian disorders for which the causal gene is known. Considering only genes that contained variants in all cases, the VEST gene score ranked dihydroorotate dehydrogenase (DHODH) number 2 of 2253 genes in four cases of Miller syndrome, and myosin-3 (MYH3) number 2 of 2313 genes in three cases of Freeman Sheldon syndrome. Conclusions: Our results demonstrate the potential power gain of aggregating bioinformatics variant scores into gene-level scores and the general utility of bioinformatics in assisting the search for disease genes in large-scale exome sequencing studies.
VEST is available as a stand-alone software package at http://wiki.chasmsoftware.org and is hosted by the CRAVAT web server at http://www.cravat.us.
- Published
- 2013
49. Predicting the functional, molecular, and phenotypic consequences of amino acid substitutions using hidden Markov models.
- Author
-
Shihab HA, Gough J, Cooper DN, Stenson PD, Barker GL, Edwards KJ, Day IN, and Gaunt TR
- Subjects
- Genetic Association Studies methods, Genotype, Humans, Internet, Phenotype, Polymorphism, Single Nucleotide, Proteins metabolism, Reproducibility of Results, Software, Triticum genetics, Algorithms, Amino Acid Substitution, Computational Biology methods, Mutation, Proteins genetics
- Abstract
The rate at which nonsynonymous single nucleotide polymorphisms (nsSNPs) are being identified in the human genome is increasing dramatically owing to advances in whole-genome/whole-exome sequencing technologies. Automated methods capable of accurately and reliably distinguishing between pathogenic and functionally neutral nsSNPs are therefore assuming ever-increasing importance. Here, we describe the Functional Analysis Through Hidden Markov Models (FATHMM) software and server: a species-independent method with optional species-specific weightings for the prediction of the functional effects of protein missense variants. Using a model weighted for human mutations, we obtained performance accuracies that outperformed traditional prediction methods (i.e., SIFT, PolyPhen, and PANTHER) on two separate benchmarks. Furthermore, in one benchmark, we achieve performance accuracies that outperform current state-of-the-art prediction methods (i.e., SNPs&GO and MutPred). We demonstrate that FATHMM can be efficiently applied to high-throughput/large-scale human and nonhuman genome sequencing projects with the added benefit of phenotypic outcome associations. To illustrate this, we evaluated nsSNPs in wheat (Triticum spp.) to identify some of the important genetic variants responsible for the phenotypic differences introduced by intense selection during domestication. A Web-based implementation of FATHMM, including a high-throughput batch facility and a downloadable standalone package, is available at http://fathmm.biocompute.org.uk. (© 2012 Wiley Periodicals, Inc.)
- Published
- 2013
50. Deleterious- and disease-allele prevalence in healthy individuals: insights from current predictions, mutation databases, and population-scale resequencing.
- Author
-
Xue Y, Chen Y, Ayub Q, Huang N, Ball EV, Mort M, Phillips AD, Shaw K, Stenson PD, Cooper DN, and Tyler-Smith C
- Subjects
- Databases, Nucleic Acid, Genome, Human, Genome-Wide Association Study, Humans, Mutation, Missense, Prevalence, Alleles, Mutation Rate
- Abstract
We have assessed the numbers of potentially deleterious variants in the genomes of apparently healthy humans by using (1) low-coverage whole-genome sequence data from 179 individuals in the 1000 Genomes Pilot Project and (2) current predictions and databases of deleterious variants. Each individual carried 281-515 missense substitutions, 40-85 of which were homozygous, predicted to be highly damaging. They also carried 40-110 variants classified by the Human Gene Mutation Database (HGMD) as disease-causing mutations (DMs), 3-24 variants in the homozygous state, and many polymorphisms putatively associated with disease. Whereas many of these DMs are likely to represent disease-allele-annotation errors, between 0 and 8 DMs (0-1 homozygous) per individual are predicted to be highly damaging, and some of them provide information of medical relevance. These analyses emphasize the need for improved annotation of disease alleles both in mutation databases and in the primary literature; some HGMD mutation data have been recategorized on the basis of the present findings, an iterative process that is both necessary and ongoing. Our estimates of deleterious-allele numbers are likely to be subject to both overcounting and undercounting. However, our current best mean estimates of ~400 damaging variants and ~2 bona fide disease mutations per individual are likely to increase rather than decrease as sequencing studies ascertain rare variants more effectively and as additional disease alleles are discovered. (Copyright © 2012 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.)
- Published
- 2012