48 results on '"Suyash Shringarpure"'
Search Results
2. A Polygenic Score for Type 2 Diabetes Improves Risk Stratification Beyond Current Clinical Screening Factors in an Ancestrally Diverse Sample
- Author
-
James R. Ashenhurst, Olga V. Sazonova, Olivia Svrchek, Stacey Detweiler, Ryosuke Kita, Liz Babalola, Matthew McIntyre, Stella Aslibekyan, Pierre Fontanillas, Suyash Shringarpure, 23andMe Research Team, Jeffrey D. Pollard, and Bertram L. Koelsch
- Subjects
polygenic score, type 2 diabetes, consumer genomics, genetic risk, diabetes screening, Genetics, QH426-470
- Abstract
A substantial proportion of the adult United States population with type 2 diabetes (T2D) is undiagnosed, calling into question the comprehensiveness of current screening practices, which primarily rely on age, family history, and body mass index (BMI). We hypothesized that a polygenic score (PGS) may serve as a complementary tool to identify high-risk individuals. The T2D polygenic score maintained predictive utility after adjusting for family history, and combining genetics with family history further improved disease risk prediction. We observed that the PGS was meaningfully related to age of onset, with implications for screening practices: there was a linear and statistically significant relationship between the PGS and T2D onset (−1.3 years per standard deviation of the PGS). Evaluation of U.S. Preventive Services Task Force guidelines and a simplified version of American Diabetes Association screening guidelines showed that adding a screening criterion for those above the 90th percentile of the PGS provided a small increase in the sensitivity of the screening algorithm. Among T2D-negative individuals, the T2D PGS was associated with prediabetes, where each standard deviation increase of the PGS was associated with a 23% increase in the odds of prediabetes diagnosis. Additionally, each standard deviation increase in the PGS corresponded to a 43% increase in the odds of incident T2D at one-year follow-up. Using complications and forms of clinical intervention (i.e., lifestyle modification, metformin treatment, or insulin treatment) as proxies for advanced illness, we also found statistically significant associations between the T2D PGS and insulin treatment and diabetic neuropathy. Importantly, we were able to replicate many findings in a Hispanic/Latino cohort from our database, highlighting the value of the T2D PGS as a clinical tool for individuals with ancestry other than European.
In this group, the T2D PGS provided additional disease risk information beyond that offered by traditional screening methodologies. The T2D PGS also had predictive value for the age of onset and for prediabetes among T2D-negative Hispanic/Latino participants. These findings strengthen the notion that a T2D PGS could play a role in the clinical setting across multiple ancestries, potentially improving T2D screening practices, risk stratification, and disease management.
- Published
- 2022
3. Imputation-Aware Tag SNP Selection To Improve Power for Large-Scale, Multi-ethnic Association Studies
- Author
-
Genevieve L. Wojcik, Christian Fuchsberger, Daniel Taliun, Ryan Welch, Alicia R Martin, Suyash Shringarpure, Christopher S. Carlson, Goncalo Abecasis, Hyun Min Kang, Michael Boehnke, Carlos D. Bustamante, Christopher R. Gignoux, and Eimear E. Kenny
- Subjects
Genomics, Statistical Genetics, Imputation, tag SNPs, array design, Genetics, QH426-470
- Abstract
The emergence of very large cohorts in genomic research has facilitated a focus on genotype-imputation strategies to power rare variant association. These strategies have benefited from improvements in imputation methods and association tests; however, little attention has been paid to ways in which array design can increase rare variant association power. Therefore, we developed a novel framework to select tag SNPs using the reference panel of 26 populations from Phase 3 of the 1000 Genomes Project. We evaluate tag SNP performance via mean imputed r² at untyped sites using leave-one-out internal validation and standard imputation methods, rather than pairwise linkage disequilibrium. Moving beyond pairwise metrics allows us to account for haplotype diversity across the genome for improved imputation accuracy and demonstrates population-specific biases from pairwise estimates. We also examine array design strategies that contrast multi-ethnic cohorts vs. single populations, and show that a boost in performance for the former can be obtained by prioritizing tag SNPs that contribute information across multiple populations simultaneously. Using our framework, we demonstrate increased imputation accuracy for rare variants (frequency < 1%) by 0.5–3.1% for an array of one million sites and 0.7–7.1% for an array of 500,000 sites, depending on the population. Finally, we show how recent explosive growth in non-African populations means tag SNPs capture on average 30% fewer other variants than in African populations. The unified framework presented here will enable investigators to make informed decisions for the design of new arrays, and help empower the next phase of rare variant association for global health.
- Published
- 2018
4. The inference of sex-biased human demography from whole-genome data.
- Author
-
Shaila Musharoff, Suyash Shringarpure, Carlos D Bustamante, and Sohini Ramachandran
- Subjects
Genetics, QH426-470
- Abstract
Sex-biased demographic events ("sex-bias") involve unequal numbers of females and males. These events are typically inferred from the relative amount of X-chromosomal to autosomal genetic variation and have led to conflicting conclusions about human demographic history. Though population size changes alter the relative amount of X-chromosomal to autosomal genetic diversity even in the absence of sex-bias, this has generally not been accounted for in sex-bias estimators to date. Here, we present a novel method to identify sex-bias from genetic sequence data that models population size changes and estimates the female fraction of the effective population size during each time epoch. Compared to recent sex-bias inference methods, our approach can detect sex-bias that changes on a single population branch without requiring data from an outgroup or knowledge of divergence events. When applied to simulated data, conventional sex-bias estimators are biased by population size changes, especially recent growth or bottlenecks, while our estimator is unbiased. We next apply our method to high-coverage exome data from the 1000 Genomes Project and estimate a male bias in Yorubans (47% female) and Europeans (44%), possibly due to stronger background selection on the X chromosome than on the autosomes. Finally, we apply our method to the 1000 Genomes Project Phase 3 high-coverage Complete Genomics whole-genome data and estimate a female bias in Yorubans (63% female), Europeans (84%), Punjabis (82%), as well as Peruvians (56%), and a male bias in the Southern Han Chinese (45%). Our method additionally identifies a male-biased migration out of Africa based on data from Europeans (20% female). Our results demonstrate that modeling population size change is necessary to estimate sex-bias parameters accurately. Our approach gives insight into signatures of sex-bias in sexual species, and the demographic models it produces can serve as more accurate null models for tests of selection.
- Published
- 2019
5. Replication and characterization of CADM2 and MSRA genes on human behavior
- Author
-
Brian Boutwell, David Hinds, Michelle Agee, Babak Alipanahi, Adam Auton, Robert K. Bell, Katarzyna Bryc, Sarah L. Elson, Pierre Fontanillas, Nicholas A. Furlotte, David A. Hinds, Bethann S. Hromatka, Karen E. Huber, Aaron Kleinman, Nadia K. Litterman, Matthew H. McIntyre, Joanna L. Mountain, Carrie A.M. Northover, J. Fah Sathirapongsasuti, Olga V. Sazonova, Janie F. Shelton, Suyash Shringarpure, Chao Tian, Joyce Y. Tung, Vladimir Vacic, and Catherine H. Wilson
- Subjects
Genetics, Clinical psychology, Psychiatry, Neuroscience, Psychology, Science (General), Q1-390, Social sciences (General), H1-99
- Abstract
Progress in identifying the genetic determinants of personality has historically been slow, with candidate gene studies and small-scale genome-wide association studies yielding few reproducible results. In the UK Biobank study, genetic variants in CADM2 and MSRA were recently shown to influence risk-taking behavior and irritability, respectively, representing some of the first genomic loci to be associated with aspects of personality. We extend this observation by performing a personality “phenome-scan” across 16 traits in up to 140,487 participants from 23andMe for these two genes. Genome-wide heritability estimates for these traits ranged from 5–19%, with both CADM2 and MSRA demonstrating significant effects on multiple personality types. These associations covered all aspects of the big five personality domains, including specific facet traits such as compliance, altruism, anxiety and activity/energy. This study both confirms and extends the original observations, highlighting the role of genetics in aspects of mental health and behavior.
- Published
- 2017
6. The Great Migration and African-American Genomic Diversity.
- Author
-
Soheil Baharian, Maxime Barakatt, Christopher R Gignoux, Suyash Shringarpure, Jacob Errington, William J Blot, Carlos D Bustamante, Eimear E Kenny, Scott M Williams, Melinda C Aldrich, and Simon Gravel
- Subjects
Genetics, QH426-470
- Abstract
We present a comprehensive assessment of genomic diversity in the African-American population by studying three genotyped cohorts comprising 3,726 African-Americans from across the United States that provide a representative description of the population across all US states and socioeconomic status. An estimated 82.1% of ancestors to African-Americans lived in Africa prior to the advent of transatlantic travel, 16.7% in Europe, and 1.2% in the Americas, with increased African ancestry in the southern United States compared to the North and West. Combining demographic models of ancestry and those of relatedness suggests that admixture occurred predominantly in the South prior to the Civil War and that ancestry-biased migration is responsible for regional differences in ancestry. We find that recent migrations also caused a strong increase in genetic relatedness among geographically distant African-Americans. Long-range relatedness among African-Americans and between African-Americans and European-Americans thus tracks north- and west-bound migration routes followed during the Great Migration of the twentieth century. By contrast, short-range relatedness patterns suggest comparable mobility of ∼15–16 km per generation for African-Americans and European-Americans, as estimated using a novel analytical model of isolation-by-distance.
- Published
- 2016
7. CSMET: comparative genomic motif detection via multi-resolution phylogenetic shadowing.
- Author
-
Pradipta Ray, Suyash Shringarpure, Mladen Kolar, and Eric P Xing
- Subjects
Biology (General), QH301-705.5
- Abstract
Functional turnover of transcription factor binding sites (TFBSs), such as whole-motif loss or gain, are common events during genome evolution. Conventional probabilistic phylogenetic shadowing methods model the evolution of genomes only at nucleotide level, and lack the ability to capture the evolutionary dynamics of functional turnover of aligned sequence entities. As a result, comparative genomic search of non-conserved motifs across evolutionarily related taxa remains a difficult challenge, especially in higher eukaryotes, where the cis-regulatory regions containing motifs can be long and divergent; existing methods rely heavily on specialized pattern-driven heuristic search or sampling algorithms, which can be difficult to generalize and hard to interpret based on phylogenetic principles. We propose a new method: Conditional Shadowing via Multi-resolution Evolutionary Trees, or CSMET, which uses a context-dependent probabilistic graphical model that allows aligned sites from different taxa in a multiple alignment to be modeled by either a background or an appropriate motif phylogeny conditioning on the functional specifications of each taxon. The functional specifications themselves are the output of a phylogeny which models the evolution not of individual nucleotides, but of the overall functionality (e.g., functional retention or loss) of the aligned sequence segments over lineages. Combining this method with a hidden Markov model that autocorrelates evolutionary rates on successive sites in the genome, CSMET offers a principled way to take into consideration lineage-specific evolution of TFBSs during motif detection, and a readily computable analytical form of the posterior distribution of motifs under TFBS turnover. On both simulated and real Drosophila cis-regulatory modules, CSMET outperforms other state-of-the-art comparative genomic motif finders.
- Published
- 2008
8. Prevalence of Alpha-1 Antitrypsin Deficiency, Self-Reported Behavior Change, and Health Care Engagement Among Direct-to-Consumer Recipients of a Personalized Genetic Risk Report
- Author
-
James R. Ashenhurst, Hoang Nhan, Janie F. Shelton, Shirley Wu, Joyce Y. Tung, Sarah L. Elson, James K. Stoller, Michelle Agee, Stella Aslibekyan, Adam Auton, Elizabeth Babalola, Robert K. Bell, Jessica Bielenberg, Katarzyna Bryc, Emily Bullis, Briana Cameron, Daniella Coker, Gabriel Cuellar Partida, Devika Dhamija, Sayantan Das, Teresa Filshtein, Kipper Fletez-Brant, Pierre Fontanillas, Will Freyman, Pooja M. Gandhi, Karl Heilbron, Barry Hicks, David A. Hinds, Karen E. Huber, Ethan M. Jewett, Yunxuan Jiang, Aaron Kleinman, Katelyn Kukar, Vanessa A. Lane, Keng-Han Lin, Maya Lowe, Marie K. Luff, Jennifer C. McCreight, Matthew H. McIntyre, Kimberly F. McManus, Steven J. Micheletti, Meghan E. Moreno, Joanna L. Mountain, Sahar V. Mozaffari, Priyanka Nandakumar, Elizabeth S. Noblin, Jared O’Connell, Aaron A. Petrakovitz, G. David Poznik, Morgan Schumacher, Anjali J. Shastri, Jingchunzi Shi, Suyash Shringarpure, Chao Tian, Vinh Tran, Xin Wang, Wei Wang, Catherine H. Weldon, and Peter Wilton
- Subjects
Male, Pulmonary and Respiratory Medicine, Pediatrics, medicine.medical_specialty, Genotype, Critical Care and Intensive Care Medicine, Direct-To-Consumer Screening and Testing, alpha 1-Antitrypsin Deficiency, Health care, Prevalence, medicine, Humans, Genetic Testing, Allele frequency, Genetic testing, COPD, Alpha 1-antitrypsin deficiency, medicine.diagnostic_test, business.industry, Behavior change, Primary care physician, Odds ratio, Middle Aged, medicine.disease, Female, Self Report, Cardiology and Cardiovascular Medicine, business
- Abstract
Background: Alpha-1 antitrypsin deficiency (AATD) is an autosomal co-dominant condition that predisposes to emphysema, cirrhosis, panniculitis, and vasculitis. Under-recognition has prompted efforts to enhance early detection and testing of at-risk individuals. Direct-to-consumer (DTC) genetic testing represents an additional method of detection. Research Question: The study addressed three questions: 1) Does a DTC testing service identify previously undetected individuals with AATD? 2) What was the time interval between initial AATD-related symptoms and initial diagnosis of AATD in such individuals? and 3) What was the behavioral impact of learning about a new diagnosis of AATD through a DTC test? Study Design and Methods: In this cross-sectional study, 195,014 individuals responded to a survey within the 23andMe, Inc. research platform. Results: Among 195,014 study participants, the allele frequency for either the PI*S or PI*Z AATD variant was 21.6% (6.5% for PI*Z and 15.1% for PI*S); 0.63% were PI*ZZ, half of whom reported having a physician confirm the diagnosis. Approximately 27% of those with physician-diagnosed AATD reported first becoming aware of AATD through the DTC test. Among those newly aware participants, the diagnostic delay interval was 22.3 years. Participants frequently shared their DTC test results with healthcare providers (HCPs) and the reported impact of learning a diagnosis of AATD was high. For example, 51.1% of PI*ZZ individuals shared their DTC result with an HCP. The odds ratio for PI*ZZ smokers to report smoking reduction as a result of receiving the DTC result was 1.7 [CI 1.4, 2.2] compared to those without a Z allele and for reduced alcohol consumption was 4.0 [CI 2.6, 5.9]. Interpretation: In this largest available report on DTC testing for AATD, this test, in combination with clinical follow-up, can help to identify previously undiagnosed AATD patients.
Moreover, receipt of the DTC AATD report was associated with positive behavior change, especially among those with risk variants.
- Published
- 2022
9. mStruct: a new admixture model for inference of population structure in light of both genetic admixing and allele mutations.
- Author
-
Suyash Shringarpure and Eric P. Xing
- Published
- 2008
10. Genome-wide Study Identifies Association between HLA-B∗55:01 and Self-Reported Penicillin Allergy
- Author
-
Kristi Krebs, Jonas Bovijn, Neil Zheng, Maarja Lepamets, Jenny C. Censin, Tuuli Jürgenson, Dage Särg, Erik Abner, Triin Laisk, Yang Luo, Line Skotte, Frank Geller, Bjarke Feenstra, Wei Wang, Adam Auton, Soumya Raychaudhuri, Tõnu Esko, Andres Metspalu, Sven Laur, Dan M. Roden, Wei-Qi Wei, Michael V. Holmes, Cecilia M. Lindgren, Elizabeth J. Phillips, Reedik Mägi, Lili Milani, João Fadista, Michelle Agee, Stella Aslibekyan, Robert K. Bell, Katarzyna Bryc, Sarah K. Clark, Sarah L. Elson, Kipper Fletez-Brant, Pierre Fontanillas, Nicholas A. Furlotte, Pooja M. Gandhi, Karl Heilbron, Barry Hicks, David A. Hinds, Karen E. Huber, Ethan M. Jewett, Yunxuan Jiang, Aaron Kleinman, Keng-Han Lin, Nadia K. Litterman, Marie K. Luff, Jennifer C. McCreight, Matthew H. McIntyre, Kimberly F. McManus, Joanna L. Mountain, Sahar V. Mozaffari, Priyanka Nandakumar, Elizabeth S. Noblin, Carrie A.M. Northover, Jared O’Connell, Aaron A. Petrakovitz, Steven J. Pitts, G. David Poznik, J. Fah Sathirapongsasuti, Anjali J. Shastri, Janie F. Shelton, Suyash Shringarpure, Chao Tian, Joyce Y. Tung, Robert J. Tunney, Vladimir Vacic, Xin Wang, Amir S. Zare, Institute for Molecular Medicine Finland, and University of Helsinki
- Subjects
0301 basic medicine, Genome-wide association study, Human leukocyte antigen, HYPERSENSITIVITY REACTIONS, FREQUENCY, BIOBANK, MECHANISMS, PTPN22, 03 medical and health sciences, 0302 clinical medicine, MANAGEMENT, Genetics, medicine, SNP, Allele, METAANALYSIS, Genetics (clinical), business.industry, 1184 Genetics, developmental biology, physiology, ADVERSE DRUG-REACTIONS, POLYMORPHISM, HLA-B, 3. Good health, HLA, Penicillin, 030104 developmental biology, 030220 oncology & carcinogenesis, Pharmacogenomics, Immunology, T-CELLS, business, medicine.drug
- Abstract
Hypersensitivity reactions to drugs are often unpredictable and can be life threatening, underscoring a need for understanding their underlying mechanisms and risk factors. The extent to which germline genetic variation influences the risk of commonly reported drug allergies such as penicillin allergy remains largely unknown. We extracted data from the electronic health records of more than 600,000 participants from the UK, Estonian, and Vanderbilt University Medical Center's BioVU biobanks to study the role of genetic variation in the occurrence of self-reported penicillin hypersensitivity reactions. We used imputed SNP-to-HLA typing data from these cohorts to further fine-map the human leukocyte antigen (HLA) association and replicated our results in 23andMe's research cohort involving a total of 1.12 million individuals. Genome-wide meta-analysis of penicillin allergy revealed two loci, including one located in the HLA region on chromosome 6. This signal was further fine-mapped to the HLA-B∗55:01 allele (OR 1.41, 95% CI 1.33–1.49, p value 2.04 × 10⁻³¹) and confirmed by independent replication in 23andMe's research cohort (OR 1.30, 95% CI 1.25–1.34, p value 1.00 × 10⁻⁴⁷). The lead SNP was also associated with lower lymphocyte counts and in silico follow-up suggests a potential effect on T-lymphocytes at HLA-B∗55:01. We also observed a significant hit in PTPN22 and the GWAS results correlated with the genetics of rheumatoid arthritis and psoriasis. We present robust evidence for the role of an allele of the major histocompatibility complex (MHC) I gene HLA-B in the occurrence of penicillin allergy.
- Published
- 2020
11. Genetic Consequences of the Transatlantic Slave Trade in the Americas
- Author
-
Nadia Litterman, Steven J. Micheletti, Janie F. Shelton, Joanna L. Mountain, Michelle Agee, Samantha G. Ancona Esselmann, Sayantan Das, Ethan M. Jewett, S. Clark, A. Petrakovitz, Karl Heilbron, Suyash Shringarpure, Jeffery R. O'Connell, G. David Poznik, Pierre Fontanillas, Kipper Fletez-Brant, Keng-Han Lin, Sahar V. Mozaffari, William A. Freyman, Joyce Y. Tung, Carrie Northover, Anjali J. Shastri, Kimberly F. McManus, Adam Auton, Aaron Kleinman, L. Noblin, P. Gandhi, Xin Wang, Vladimir Vacic, Chao Tian, Karen E. Huber, Jennifer C. McCreight, Yunxuan Jiang, R. Tunney, Robert K. Bell, Sarah L. Elson, Barry W. Hicks, A. Zare, Sandra Beleza, Stella Aslibekyan, David A. Hinds, Meghan E. Moreno, Steven J. Pitts, Kasia Bryc, Matthew H. McIntyre, and P. Nandakumar
- Subjects
Disembarkation, Variable survival, Present day, migration, identity by descent, Article, 03 medical and health sciences, 0302 clinical medicine, genetics, slave trade, Genetics (clinical), Historical record, 030304 developmental biology, 0303 health sciences, ancestry, Genetic data, population genetics, Forced migration, Geography, Africa, Ethnology, admixture, history, Americas, 030217 neurology & neurosurgery, Regional differences
- Abstract
According to historical records of transatlantic slavery, traders forcibly deported an estimated 12.5 million people from ports along the Atlantic coastline of Africa between the 16th and 19th centuries, with global impacts reaching to the present day, more than a century and a half after slavery's abolition. Such records have fueled a broad understanding of the forced migration from Africa to the Americas yet remain underexplored in concert with genetic data. Here, we analyzed genotype array data from 50,281 research participants, which, combined with historical shipping documents, illustrate that the current genetic landscape of the Americas is largely concordant with expectations derived from documentation of slave voyages. For instance, genetic connections between people in slave trading regions of Africa and disembarkation regions of the Americas generally mirror the proportion of individuals forcibly moved between those regions. While some discordances can be explained by additional records of deportations within the Americas, other discordances yield insights into variable survival rates and timing of arrival of enslaved people from specific regions of Africa. Furthermore, the greater contribution of African women to the gene pool compared to African men varies across the Americas, consistent with literature documenting regional differences in slavery practices. This investigation of the transatlantic slave trade, which is broad in scope in terms of both datasets and analyses, establishes genetic links between individuals in the Americas and populations across Atlantic Africa, yielding a more comprehensive understanding of the African roots of peoples of the Americas.
- Published
- 2020
12. Symptom-level modelling unravels the shared genetic architecture of anxiety and depression
- Author
-
Stuart MacGregor, Andrew D. Grotzinger, Zachary Gerring, Suyash Shringarpure, Nicholas G. Martin, Enda M. Byrne, Jue-Sheng Ong, Eske M. Derks, Sarah E. Medland, Christel M. Middeldorp, Adrian I. Campos, Wei Wang, Jackson G. Thorp, Jiyuan An, Amsterdam Neuroscience - Mood, Anxiety, Psychosis, Stress & Sleep, and Biological Psychology
- Subjects
Social Psychology, media_common.quotation_subject, Experimental and Cognitive Psychology, Behavioral Symptoms, Comorbidity, Anxiety, Structural equation modeling, 03 medical and health sciences, Behavioral Neuroscience, 0302 clinical medicine, SDG 3 - Good Health and Well-being, medicine, Personality, Humans, Genetic Predisposition to Disease, Depression (differential diagnoses), 030304 developmental biology, media_common, Neuroticism, 0303 health sciences, Depression, medicine.disease, Genetic architecture, Latent Class Analysis, Cohort, medicine.symptom, Symptom Assessment, Psychology, Factor Analysis, Statistical, 030217 neurology & neurosurgery, Clinical psychology, Genome-Wide Association Study
- Abstract
Depression and anxiety are highly prevalent and comorbid psychiatric traits that cause considerable burden worldwide. Here we use factor analysis and genomic structural equation modelling to investigate the genetic factor structure underlying 28 items assessing depression, anxiety and neuroticism, a closely related personality trait. Symptoms of depression and anxiety loaded on two distinct, although highly genetically correlated factors, and neuroticism items were partitioned between them. We used this factor structure to conduct genome-wide association analyses on latent factors of depressive symptoms (89 independent variants, 61 genomic loci) and anxiety symptoms (102 variants, 73 loci) in the UK Biobank. Of these associated variants, 72% and 78%, respectively, replicated in an independent cohort of approximately 1.9 million individuals with self-reported diagnosis of depression and anxiety. We use these results to characterize shared and trait-specific genetic associations. Our findings provide insight into the genetic architecture of depression and anxiety and comorbidity between them.
- Published
- 2021
13. Large-scale trans-ethnic replication and discovery of genetic associations for rare diseases with self-reported medical data
- Author
-
Acevedo A, Jubb A, Sarov-Blat L, Yue P, Dhamija D, Briana Cameron, Adam Auton, Jiang Y, Wang W, Robert Gentleman, and Suyash Shringarpure
- Subjects
Cohort, Ethnic group, Genome-wide association study, Computational biology, Biology, Biobank, Genetic architecture, Genetic association, Rare disease
- Abstract
A key challenge in the study of rare disease genetics is assembling large case cohorts for well-powered studies. We demonstrate the use of self-reported diagnosis data to study rare diseases at scale. We performed genome-wide association studies (GWAS) for 33 rare diseases using self-reported diagnosis phenotypes and re-discovered 29 known associations to validate our approach. In addition, we performed the first GWAS for Duane retraction syndrome, vestibular schwannoma and spontaneous pneumothorax, and report novel genome-wide significant associations for these diseases. We replicated these novel associations in non-European populations within the 23andMe, Inc. cohort as well as in the UK Biobank cohort. We also show that mixed model analyses including all ethnicities and related samples increase the power for finding associations in rare diseases. Our results, based on analysis of 19,084 rare disease cases for 33 diseases from 7 populations, show that large-scale online collection of self-reported data is a viable method for discovery and replication of genetic associations for rare diseases. This approach, which is complementary to sequencing-based approaches, will enable the discovery of more novel genetic associations for increasingly rare diseases across multiple ancestries and shed more light on the genetic architecture of rare diseases.
- Published
- 2021
14. Multi-Trait Genetic Analysis Identifies Autoimmune Loci Associated with Cutaneous Melanoma
- Author
-
Upekha E. Liyanage, Stuart MacGregor, D. Timothy Bishop, Jianxin Shi, Jiyuan An, Jue Sheng Ong, Xikun Han, Richard A. Scolyer, Nicholas G. Martin, Sarah E. Medland, Enda M. Byrne, Adèle C. Green, Robyn P.M. Saw, John F. Thompson, Jonathan Stretch, Andrew Spillane, Yunxuan Jiang, Chao Tian, Scott G. Gordon, David L. Duffy, Catherine M. Olsen, David C. Whiteman, Georgina V. Long, Mark M. Iles, Maria Teresa Landi, Matthew H. Law, Michelle Agee, Stella Aslibekyan, Adam Auton, Elizabeth Babalola, Robert K. Bell, Jessica Bielenberg, Katarzyna Bryc, Emily Bullis, Briana Cameron, Daniella Coker, Gabriel Cuellar Partida, Devika Dhamija, Sayantan Das, Sarah L. Elson, Teresa Filshtein, Kipper Fletez-Brant, Pierre Fontanillas, Will Freyman, Pooja M. Gandhi, Karl Heilbron, Barry Hicks, David A. Hinds, Karen E. Huber, Ethan M. Jewett, Aaron Kleinman, Katelyn Kukar, Keng-Han Lin, Maya Lowe, Marie K. Luff, Jennifer C. McCreight, Matthew H. McIntyre, Kimberly F. McManus, Steven J. Micheletti, Meghan E. Moreno, Joanna L. Mountain, Sahar V. Mozaffari, Priyanka Nandakumar, Elizabeth S. Noblin, Jared O'Connell, Aaron A. Petrakovitz, G. David Poznik, Anjali J. Shastri, Janie F. Shelton, Jingchunzi Shi, Suyash Shringarpure, Vinh Tran, Joyce Y. Tung, Xin Wang, Wei Wang, Catherine H. Weldon, and Peter Wilton
- Subjects
Genetics, Linkage disequilibrium, Skin Neoplasms, Genome-wide association study, Cell Biology, Dermatology, Biology, Biochemistry, Identity by descent, Genetic analysis, Genetic correlation, Polymorphism, Single Nucleotide, Minor allele frequency, Phenotype, Genetic Loci, Cutaneous melanoma, Humans, Genetic Predisposition to Disease, Molecular Biology, Melanoma, Genetic association, Genome-Wide Association Study
- Abstract
Genome-wide association studies (GWAS) have identified a number of risk loci for cutaneous melanoma. Cutaneous melanoma shares overlapping genetic risk (genetic correlation) with a number of other traits, including its risk factors such as sunburn propensity. This genetic correlation can be exploited to identify additional cutaneous melanoma risk loci by multitrait analysis of GWAS (MTAG). We used bivariate linkage disequilibrium score regression to identify traits that are genetically correlated with clinically confirmed cutaneous melanoma and then used publicly available GWAS for these traits in a multitrait analysis of GWAS. Multitrait analysis of GWAS allows GWAS to be combined while accounting for sample overlap and incomplete genetic correlation. We identified a total of 74 genome-wide independent loci, 19 of which were not previously reported in the input cutaneous melanoma GWAS meta-analysis. Of these loci, 55 were replicated (P < 0.05/74, Bonferroni-corrected P-value) in two independent cutaneous melanoma replication cohorts from Melanoma Institute Australia and 23andMe, Inc. Among the, to our knowledge, previously unreported cutaneous melanoma loci are ones that have also been associated with autoimmune traits including rs715199 near LPP and rs10858023 near AP4B1. Our analysis indicates genetic correlation between traits can be leveraged to identify new risk genes for cutaneous melanoma.
- Published
- 2021
15. Fast and robust identity-by-descent inference with the templated positional Burrows-Wheeler transform
- Author
-
Katarzyna Bryc, Adam Auton, William A. Freyman, Kimberly F. McManus, Ethan M. Jewett, and Suyash Shringarpure
- Subjects
Burrows–Wheeler transform, Computer science, Inference, Biology, AcademicSubjects/SCI01180, computer.software_genre, Identity by descent, 03 medical and health sciences, 0302 clinical medicine, templated positional Burrows–Wheeler transform, Methods, Genetics, Humans, False Positive Reactions, False Negative Reactions, Mexico, Molecular Biology, Genotyping, Ecology, Evolution, Behavior and Systematics, 030304 developmental biology, Codebase, Isolation by distance, 0303 health sciences, Genome, Human, Haplotype, AcademicSubjects/SCI01130, population genetics, Phaser, digestive system diseases, Hierarchical clustering, Phylogeography, Haplotypes, identity-by-descent, Data mining, computer, Algorithms, Software, 030217 neurology & neurosurgery, Type I and type II errors
- Abstract
Estimating the genomic location and length of identical-by-descent (IBD) segments among individuals is a crucial step in many genetic analyses. However, the exponential growth in the size of biobank and direct-to-consumer (DTC) genetic data sets makes accurate IBD inference a significant computational challenge. Here we present the templated positional Burrows-Wheeler transform (TPBWT) to make fast IBD estimates robust to genotype and phasing errors. Using haplotype data simulated over pedigrees with realistic genotyping and phasing errors, we show that the TPBWT outperforms other state-of-the-art IBD inference algorithms in terms of speed and accuracy. For each phase-aware method, we explore the false positive and false negative rates of inferring IBD by segment length and characterize the types of error commonly found. Our results highlight the fragility of most phased IBD inference methods; the accuracy of IBD estimates can be highly sensitive to the quality of haplotype phasing. Additionally, we compare the performance of the TPBWT against a widely used phase-free IBD inference approach that is robust to phasing errors. We introduce both in-sample and out-of-sample TPBWT-based IBD inference algorithms and demonstrate their computational efficiency on massive-scale datasets with millions of samples. Furthermore, we describe the binary file format for TPBWT-compressed haplotypes that results in fast and efficient out-of-sample IBD computes against very large cohort panels. Finally, we demonstrate the utility of the TPBWT in a brief empirical analysis exploring geographic patterns of haplotype sharing within Mexico. Hierarchical clustering of IBD shared across regions within Mexico reveals geographically structured haplotype sharing and a strong signal of isolation by distance. Our software implementation of the TPBWT is freely available for non-commercial use in the code repository https://github.com/23andMe/phasedibd.
- Published
- 2020
16. A comprehensive re-assessment of the association between vitamin D and cancer susceptibility using Mendelian randomization
- Author
-
Upekha E Liyanage, Amanda B. Spurdle, K. E. Huber, Anna H. Wu, J. Fah Sathirapongsasuti, Douglas A. Corley, C. Tian, Anne Böhmer, David A. Hinds, A. Auton, Xikun Han, Matt Buas, M. Agee, Rebecca C. Fitzgerald, Puya Gharahkhani, Yvonne Romero, S. L. Elson, Ines Gockel, Johannes Schumacher, Leslie Bernstein, Nigel C. Bird, Thomas L. Vaughan, E. S. Noblin, P. Fontanillas, Laura J. Hardie, Brian J. Reid, V. Vacic, M. H. McIntyre, Jiyuan An, Andrew Berchuck, Claire Palles, Weimin Ye, K. Bryc, S. J. Pitts, Jue-Sheng Ong, Geoffrey Liu, R. K. Bell, Rachel E. Neale, Marilie D. Gammon, J. L. Mountain, C. A. M. Northover, Catherine M. Olsen, C. H. Wilson, Janusz Jankowski, Matthew Law, A. Kleinman, Suzanne C. Dixon-Suen, J. Y. Tung, Aaron P. Thrift, Wong-Ho Chow, Paul Pharoah, Jean-Claude Dusingize, Suyash Shringarpure, Mark M. Iles, Wei Zheng, N. A. Furlotte, Penelope M. Webb, B. Alipanahi, O. V. Sazonova, Stuart MacGregor, David Whiteman, J. F. Shelton, Harvey A. Risch, N. K. Litterman, Tracy A. O'Mara, Nicholas J. Shaheen, Ong, Jue-Sheng [0000-0002-6062-710X], Dixon-Suen, Suzanne C [0000-0003-3714-8386], Han, Xikun [0000-0002-3823-7308], Gockel, Ines [0000-0001-7423-713X], Böhmer, Anne [0000-0002-5716-786X], O'Mara, Tracy [0000-0002-5436-3232], Spurdle, Amanda [0000-0003-1337-7897], Law, Matthew H [0000-0002-4303-8821], Iles, Mark M [0000-0002-2603-6509], Pharoah, Paul [0000-0001-8494-732X], Zheng, Wei [0000-0003-1226-070X], Thrift, Aaron P [0000-0002-0084-5308], Olsen, Catherine [0000-0003-4483-1888], Gharahkhani, Puya [0000-0002-4203-5952], Webb, Penelope M [0000-0003-0733-5930], MacGregor, Stuart [0000-0001-6731-8142], and Apollo - University of Cambridge Repository
- Subjects
0301 basic medicine ,Oncology ,medicine.medical_specialty ,Science ,General Physics and Astronomy ,Sunburn ,Single-nucleotide polymorphism ,General Biochemistry, Genetics and Molecular Biology ,Article ,Cancer prevention ,03 medical and health sciences ,0302 clinical medicine ,Cancer epidemiology ,Risk Factors ,Internal medicine ,Neoplasms ,Mendelian randomization ,Vitamin D and neurology ,medicine ,Humans ,Genetic Predisposition to Disease ,030212 general & internal medicine ,Risk factor ,Vitamin D ,Child ,Cancer genetics ,Multidisciplinary ,business.industry ,Pigmentation ,Case-control study ,Cancer ,Mendelian Randomization Analysis ,General Chemistry ,medicine.disease ,Confidence interval ,030104 developmental biology ,Case-Control Studies ,Multivariate Analysis ,business - Abstract
Previous Mendelian randomization (MR) studies on 25-hydroxyvitamin D (25(OH)D) and cancer have typically adopted a handful of variants and found no relationship between 25(OH)D and cancer; however, issues of horizontal pleiotropy cannot be reliably addressed. Using a larger set of variants associated with 25(OH)D (74 SNPs, up from 6 previously), we perform a unified MR analysis to re-evaluate the relationship between 25(OH)D and ten cancers. Our findings are broadly consistent with previous MR studies indicating no relationship, apart from ovarian cancers (OR 0.89; 95% C.I.: 0.82 to 0.96 per 1 SD change in 25(OH)D concentration) and basal cell carcinoma (OR 1.16; 95% C.I.: 1.04 to 1.28). However, after adjustment for pigmentation-related variables in a multivariable MR framework, the BCC findings were attenuated. Here we report that lower 25(OH)D is unlikely to be a causal risk factor for most cancers, with our study providing more precise confidence intervals than previously possible. Studies of the genetic association between vitamin D and cancer risk have typically been underpowered. Here the authors analyse this using Mendelian Randomisation with more than 70 vitamin D variants obtained from the UK Biobank and large-scale data from various consortia, confirming null associations between vitamin D and most cancers.
- Published
- 2020
17. Bi-ancestral depression GWAS in the Million Veteran Program and meta-analysis in 1.2 million individuals highlight new therapeutic directions
- Author
-
Krishnan Radhakrishnan, Gerard Sanacora, Frank R. Wendt, Daniel F. Levey, Murray B. Stein, Million Veteran Program, John Concato, Joel Gelernter, Gita A. Pathak, Rachel Quaden, Andrew M. McIntosh, Yaira Z. Nunez, Mihaela Aslan, Suyash Shringarpure, Hang Zhou, Renato Polimanti, Kelly M. Harrington, Cassie Overstreet, and Jingchunzi Shi
- Subjects
0301 basic medicine ,Male ,medicine.medical_specialty ,Genome-wide association study ,Article ,03 medical and health sciences ,0302 clinical medicine ,Medicine ,Humans ,Genetic Predisposition to Disease ,Psychiatry ,Behavioural genetics ,Depression (differential diagnoses) ,Veterans ,Depressive Disorder, Major ,business.industry ,General Neuroscience ,medicine.disease ,Biobank ,Genetic architecture ,030104 developmental biology ,Meta-analysis ,Cohort ,Major depressive disorder ,Female ,business ,Neuroscience ,030217 neurology & neurosurgery ,Genome-Wide Association Study - Abstract
Major depressive disorder is the most common neuropsychiatric disorder, affecting 11% of veterans. Here we report results of a large meta-analysis of depression using data from the Million Veteran Program, 23andMe, UK Biobank and FinnGen, including individuals of European ancestry (n = 1,154,267; 340,591 cases) and African ancestry (n = 59,600; 25,843 cases). Transcriptome-wide association study analyses revealed significant associations with expression of NEGR1 in the hypothalamus and DRD2 in the nucleus accumbens, among others. We fine-mapped 178 genomic risk loci, and we identified likely pathogenicity in these variants and overlapping gene expression for 17 genes from our transcriptome-wide association study, including TRAF3. Finally, we were able to show substantial replications of our findings in a large independent cohort (n = 1,342,778) provided by 23andMe. This study sheds light on the genetic architecture of depression and provides new insight into the interrelatedness of complex psychiatric traits. This bi-ancestral genome-wide association study of major depressive disorder (MDD) identified 178 risk variants. The results advance understanding of the biology of MDD and hint at new treatment possibilities.
- Published
- 2020
18. Symptom-level genetic modelling identifies novel risk loci and unravels the shared genetic architecture of anxiety and depression
- Author
-
Nicholas G. Martin, Eske M. Derks, Jackson G. Thorp, Jiyuan An, Stuart MacGregor, Jue-Sheng Ong, Sarah E. Medland, Wei Wang, Suyash Shringarpure, Andrew D. Grotzinger, Adrian I. Campos, Christel M. Middeldorp, Zachary Gerring, and Enda M. Byrne
- Subjects
business.industry ,media_common.quotation_subject ,Genome-wide association study ,medicine.disease ,Comorbidity ,Neuroticism ,Genetic architecture ,Trait ,Medicine ,Personality ,Anxiety ,medicine.symptom ,business ,Depression (differential diagnoses) ,Clinical psychology ,media_common - Abstract
Depression and anxiety are highly prevalent and comorbid psychiatric traits that cause considerable burden worldwide. Previous studies have revealed substantial genetic overlap between depression, anxiety, and a closely related personality trait – neuroticism. Here, we use factor analysis and genomic structural equation modelling (Genomic SEM) to investigate the genetic factor structure underlying 28 items assessing depression, anxiety and neuroticism. Symptoms of depression and anxiety loaded on two distinct, although genetically correlated, factors, while neuroticism items were partitioned between them. We leveraged this factor structure to conduct multivariate genome-wide association analyses on latent factors of anxiety symptoms and depressive symptoms, using data from over 400,000 individuals in the UK Biobank. We identified 89 independent variants for the depressive factor (61 genomic loci, 29 novel) and 102 independent variants for the anxiety factor (73 loci, 71 novel). Of these variants, 72% and 78%, respectively, replicated in an independent 23andMe cohort of ∼1.9 million individuals with self-reported diagnosis of depression (634,037 cases) and anxiety (624,615 cases). A pairwise GWAS analysis revealed substantial genetic overlap between anxiety and depression but also showed trait-specific genetic influences; e.g. genomic regions specific to depressive symptoms were associated with hypertriglyceridemia, while regions specific to anxiety symptoms were linked to blood pressure phenotypes. The substantial genetic overlap between the two traits was further evidenced by a lack of trait-specificity in polygenic prediction of depressive and anxiety symptoms. Our results provide novel insight into the genetic architecture of depression and anxiety and comorbidity between them.
- Published
- 2020
19. Exploring Population Structure with Admixture Models and Principal Component Analysis
- Author
-
John Novembre, Chi-Chun Liu, Suyash Shringarpure, and Kenneth Lange
- Subjects
Conservation genetics ,Structure (mathematical logic) ,Principal Component Analysis ,Models, Genetic ,Genome, Human ,Computer science ,Human evolutionary genetics ,Computational Biology ,Sample (statistics) ,Population stratification ,Polymorphism, Single Nucleotide ,Data science ,Article ,Human genetics ,Genetics, Population ,Principal component analysis ,Feature (machine learning) ,Humans - Abstract
Population structure is a commonplace feature of genetic variation data, and it has importance in numerous application areas, including evolutionary genetics, conservation genetics, and human genetics. Understanding the structure in a sample is necessary before more sophisticated analyses are undertaken. Here we provide a protocol for running principal component analysis (PCA) and admixture proportion inference, two of the most commonly used approaches in describing population structure. Through hands-on examples using the CEPH-Human Genome Diversity Panel, along with pragmatic caveats, readers will learn to analyze and visualize population structure in their own data.
- Published
- 2020
20. Imputation-Aware Tag SNP Selection To Improve Power for Large-Scale, Multi-ethnic Association Studies
- Author
-
Hyun Min Kang, Christopher R. Gignoux, Carlos Bustamante, Ryan P. Welch, Daniel Taliun, Alicia R. Martin, Christopher S. Carlson, Gonçalo R. Abecasis, Genevieve L. Wojcik, Suyash Shringarpure, Eimear E. Kenny, Michael Boehnke, and Christian Fuchsberger
- Subjects
0301 basic medicine ,Population ,Single-nucleotide polymorphism ,Genomics ,Investigations ,QH426-470 ,Biology ,computer.software_genre ,Polymorphism, Single Nucleotide ,Linkage Disequilibrium ,03 medical and health sciences ,array design ,Ethnicity ,Genetics ,Humans ,Statistical Genetics ,Selection, Genetic ,1000 Genomes Project ,education ,Molecular Biology ,Genetic Association Studies ,Genetics (clinical) ,Imputation ,Genetic association ,education.field_of_study ,Models, Genetic ,Computational Biology ,Reproducibility of Results ,Tag SNP ,tag SNPs ,Genetics, Population ,030104 developmental biology ,Pairwise comparison ,Data mining ,Databases, Nucleic Acid ,computer ,Imputation (genetics) ,Genome-Wide Association Study - Abstract
The emergence of very large cohorts in genomic research has facilitated a focus on genotype-imputation strategies to power rare variant association. These strategies have benefited from improvements in imputation methods and association tests; however, little attention has been paid to ways in which array design can increase rare variant association power. Therefore, we developed a novel framework to select tag SNPs using the reference panel of 26 populations from Phase 3 of the 1000 Genomes Project. We evaluate tag SNP performance via mean imputed r² at untyped sites using leave-one-out internal validation and standard imputation methods, rather than pairwise linkage disequilibrium. Moving beyond pairwise metrics allows us to account for haplotype diversity across the genome to improve imputation accuracy and demonstrates population-specific biases from pairwise estimates. We also examine array design strategies that contrast multi-ethnic cohorts vs. single populations, and show that a boost in performance for the former can be obtained by prioritizing tag SNPs that contribute information across multiple populations simultaneously. Using our framework, we demonstrate increased imputation accuracy for rare variants (frequency < 1%) by 0.5–3.1% for an array of one million sites and 0.7–7.1% for an array of 500,000 sites, depending on the population. Finally, we show how recent explosive growth in non-African populations means tag SNPs capture on average 30% fewer other variants than in African populations. The unified framework presented here will enable investigators to make informed decisions for the design of new arrays, and help empower the next phase of rare variant association for global health.
- Published
- 2018
21. SYMPTOM-LEVEL GENETIC MODELLING OF DEPRESSION AND ANXIETY
- Author
-
Enda M. Byrne, Jue-Sheng Ong, Suyash Shringarpure, Stuart MacGregor, Christel M. Middeldorp, Nicholas G. Martin, Zachary Gerring, Sarah E. Medland, Andrew D. Grotzinger, Eske M. Derks, Jackson G. Thorp, Jiyuan An, Adrian I. Campos, and Wei Wang
- Subjects
Pharmacology ,Psychiatry and Mental health ,Neurology ,business.industry ,Medicine ,Anxiety ,Pharmacology (medical) ,Neurology (clinical) ,medicine.symptom ,business ,Biological Psychiatry ,Depression (differential diagnoses) ,Clinical psychology - Published
- 2021
22. Using genotype array data to compare multi- and single-sample variant calls and improve variant call sets from deep coverage whole-genome sequencing data
- Author
-
Rasika A. Mathias, Timothy D. O’Connor, Carlos Bustamante, Zachary A. Szpiech, Kathleen C. Barnes, Ryan D. Hernandez, Margaret A. Taub, Suyash Shringarpure, Raul Torres, and Francisco M. De La Vega
- Subjects
0301 basic medicine ,Statistics and Probability ,dbSNP ,Genotype ,Genotyping Techniques ,Computer science ,Genomics ,computer.software_genre ,Biochemistry ,Genome ,Polymorphism, Single Nucleotide ,03 medical and health sciences ,Humans ,International HapMap Project ,Molecular Biology ,Allele frequency ,Whole genome sequencing ,Massive parallel sequencing ,Whole Genome Sequencing ,Genome, Human ,High-Throughput Nucleotide Sequencing ,Original Papers ,Computer Science Applications ,Data Accuracy ,Computational Mathematics ,030104 developmental biology ,Computational Theory and Mathematics ,Human genome ,Data mining ,computer ,Sequence Analysis - Abstract
Motivation: Variant calling from next-generation sequencing (NGS) data is susceptible to false positive calls due to sequencing, mapping and other errors. To better distinguish true from false positive calls, we present a method that uses genotype array data from the sequenced samples, rather than public data such as HapMap or dbSNP, to train an accurate classifier using Random Forests. We demonstrate our method on a set of variant calls obtained from 642 African-ancestry genomes from the Consortium on Asthma among African-ancestry Populations in the Americas (CAAPA), sequenced to high depth (30X). Results: We have applied our classifier to compare call sets generated with different calling methods, including both single-sample and multi-sample callers. At a False Positive Rate of 5%, our method determines true positive rates of 97.5%, 95% and 99% on variant calls obtained using Illumina's single-sample caller CASAVA, Real Time Genomics' multisample variant caller, and the GATK UnifiedGenotyper, respectively. Since NGS sequencing data may be accompanied by genotype data for the same samples, either collected concurrently with sequencing or from a previous study, our method can be trained on each dataset to provide a more accurate computational validation of site calls compared to generic methods. Moreover, our method allows for adjustment based on allele frequency (e.g. a different set of criteria to determine quality for rare versus common variants) and thereby provides insight into sequencing characteristics that indicate call quality for variants of different frequencies. Availability and Implementation: Code is available on Github at: https://github.com/suyashss/variant_validation Supplementary information: Supplementary data are available at Bioinformatics online.
- Published
- 2016
23. Characterization of Prevalence and Health Consequences of Uniparental Disomy in Four Million Individuals from the General Population
- Author
-
Sarah L. Elson, Joyce Y. Tung, Yunxuan Jiang, Kimberly F. McManus, Joanna L. Mountain, Nadia K. Litterman, Chao Tian, Karen E. Huber, Michelle Agee, G. David Poznik, Sohini Ramachandran, Keng-Han Lin, Pierre Fontanillas, J. Fah Sathirapongsasuti, Adam Auton, Elizabeth S. Noblin, Jennifer C. McCreight, Robert K. Bell, Matthew H. McIntyre, Xin Wang, Barry W. Hicks, Aaron Kleinman, Vladimir Vacic, Samuel Pattillo Smith, Suyash Shringarpure, Nicholas A. Furlotte, Ethan M. Jewett, Katarzyna Bryc, Carrie A.M. Northover, David A. Hinds, Priyanka Nakka, Janie F. Shelton, Steven J. Pitts, and Anne H. O’Donnell-Luria
- Subjects
0301 basic medicine ,Male ,congenital, hereditary, and neonatal diseases and abnormalities ,uniparental disomy ,Population ,Aneuploidy ,030105 genetics & heredity ,Runs of Homozygosity ,Biology ,Identity by descent ,Polymorphism, Single Nucleotide ,Article ,03 medical and health sciences ,Genomic Imprinting ,Genetics ,medicine ,Prevalence ,Humans ,education ,Genetics (clinical) ,education.field_of_study ,runs of homozygosity ,Homozygote ,medicine.disease ,Uniparental disomy ,3. Good health ,030104 developmental biology ,Phenotype ,Nondisjunction ,Autism ,Female ,identity-by-descent ,Chromosome 22 - Abstract
Meiotic nondisjunction and resulting aneuploidy can lead to severe health consequences in humans. Aneuploidy rescue can restore euploidy but may result in uniparental disomy (UPD), the inheritance of both homologs of a chromosome from one parent with no representative copy from the other. Current understanding of UPD is limited to ∼3,300 case subjects for which UPD was associated with clinical presentation due to imprinting disorders or recessive diseases. Thus, the prevalence of UPD and its phenotypic consequences in the general population are unknown. We searched for instances of UPD across 4,400,363 consented research participants from the personal genetics company 23andMe, Inc., and 431,094 UK Biobank participants. Using computationally detected DNA segments identical-by-descent (IBD) and runs of homozygosity (ROH), we identified 675 instances of UPD across both databases. We estimate that UPD is twice as common as previously thought, and we present a machine-learning framework to detect UPD using ROH. While we find a nominally significant association between UPD of chromosome 22 and autism risk, we do not find significant associations between UPD and deleterious traits in the 23andMe database.
- Published
- 2019
24. Shared genetic background between children and adults with attention deficit/hyperactivity disorder
- Author
-
Elizabeth S. Noblin, Joanna L. Mountain, Michelle Agee, Sarah L. Elson, Angélica Salatino-Oliveira, Barbara Franke, Diego L. Rovaris, Josep Antoni Ramos-Quiroga, Vladimir Vacic, Tatyana Strekalova, Claiton H.D. Bau, Christian Fadeuilhe, Peter Almos, Mara H. Hutz, David A. Hinds, Henrik Larsson, Jonna Kuntsi, Olga Rivero, Anders D. Børglum, Laura Vilar, Lorena Arribas, Catherine H. Wilson, Marieke Klein, Adam Auton, Anke Hinney, Katarzyna Bryc, Mark A. Bellgrove, Christie L. Burton, J. Fah Sathirapongsasuti, Olga V. Sazonova, Luis Augusto Rohde, Edmund J.S. Sonuga-Barke, Andreas Reif, Ted Reichborn-Kjennerud, Sarah Kittel-Schneider, Anne Halmøy, Bru Cormand, Miquel Casas, Joyce Y. Tung, Robert D. Oades, Matthew H. McIntyre, Nicholas A. Furlotte, Janie F. Shelton, Alysa E. Doyle, Iris Garcia-Martínez, Marta Ribasés, Johannes Hebebrand, Søren Dalsgaard, Tetyana Zayats, María Soler Artigas, Carrie A.M. Northover, Steven J. Pitts, Ole A. Andreassen, Chao Tian, Karen E. Huber, Josephine Elia, Montserrat Corrales, Jennifer C. McCreight, Benjamin M. Neale, Aribert Rothenberger, Sandra K. Loo, Cristina Sánchez-Mora, Stefan Johansson, Hakon Hakonarson, Christian Jacob, Mireia Pagerols, Jan Haavik, Joseph Biederman, Paula Rovira, Rosa Bosch, Alejandro Arias-Vasquez, Stephen V. Faraone, Catharina A. Hartman, Vanesa Richarte, Oliver Grimm, Irwin D. Waldman, Heike Weber, Martine Hoogman, Per M. Knappskog, Aaron Kleinman, Suyash Shringarpure, Jan K. Buitelaar, Philip Asherson, James J. McGough, Emma Sprooten, Russell Schachar, Ditte Demontis, Klaus-Peter Lesch, Nadia K. Litterman, Gemma Martín, Xin Wang, Evgeniy Svirin, Babak Alipanahi, Ziarih Hawi, Tobias Banaschewski, Bruna Santos da Silva, Robert K. Bell, Eugenio H. Grevet, Pierre Fontanillas, Nina Roth Mota, Jennifer Crosbie, Astri J. Lundervold, Psychiatrie & Neuropsychologie, and RS: MHeNs - R3 - Neuroscience
- Subjects
Persistence (psychology) ,Trastorns per dèficit d'atenció amb hiperactivitat en els infants ,SYMPTOMS ,LD SCORE REGRESSION ,Stress-related disorders Donders Center for Medical Neuroscience [Radboudumc 13] ,CHILDHOOD ,Genome-wide association study ,Attention deficit disorder with hyperactivity in children ,CA2+-DEPENDENT ACTIVATOR PROTEIN ,0302 clinical medicine ,Neurodevelopmental disorder ,Child ,SECRETION 2 ,0303 health sciences ,HERITABILITY ,220 Statistical Imaging Neuroscience ,Psychiatry and Mental health ,Phenotype ,Trastorns per dèficit d'atenció amb hiperactivitat en els adults ,medicine.symptom ,Genetic Background ,Clinical psychology ,Adult ,DEFICIT HYPERACTIVITY DISORDER ,Impulsivity ,behavioral disciplines and activities ,Genetic correlation ,Article ,03 medical and health sciences ,mental disorders ,medicine ,Humans ,ADHD ,Attention deficit hyperactivity disorder ,GENOME-WIDE ASSOCIATION ,METAANALYSIS ,030304 developmental biology ,Genetic association ,Pharmacology ,Neurodevelopmental disorders Donders Center for Medical Neuroscience [Radboudumc 7] ,business.industry ,medicine.disease ,Genetic architecture ,Attention Deficit Disorder with Hyperactivity ,Impulsive Behavior ,Attention deficit ,Genetic markers ,Attention deficit disorder with hyperactivity in adults ,business ,030217 neurology & neurosurgery ,Genome-Wide Association Study - Abstract
Other funding: VR has served on the speakers' bureau for Eli Lilly, Rubio, and Shire in the last 5 years. She has received travel awards from Eli Lilly and Co. and Shire for participating in psychiatric meetings. The ADHD Program has received unrestricted educational and research support from Eli Lilly and Co., Janssen-Cilag, Shire, Rovi, Psious, and Laboratorios Rubió in the past 2 years. MC received travel awards for taking part in psychiatric meetings from Shire. CF received travel awards for taking part in psychiatric meetings from Shire and Lundbeck. GEM received travel awards for taking part in psychiatric meetings from Shire. EJSS-B received speaker fees, consultancy, research funding, and conference support from Shire Pharma. Consultancy from Neurotech Solutions, Copenhagen University and Berhanderling, Skolerne & KU Leuven. Book royalties from OUP and Jessica Kingsley. Financial support-Arhus University and Ghent University for visiting Professorship. Editor-in-Chief JCPP-supported by a buy-out of time to University of Southampton and personal Honorarium. King's College London received payments for work conducted by PA: consultancy for Shire, Eli Lilly, Novartis, and Lundbeck; educational and/or research awards from Shire, Eli Lilly, Novartis, Vifor Pharma, GW Pharma, and QbTech; speaker at events sponsored by Shire, Eli Lilly, Janssen-Cilag, and Novartis. JKB has been in the past 3 years a consultant to/member of advisory board of and/or speaker for Shire, Roche, Medice, and Servier. He is not an employee of any of these companies, and not a stock shareholder of any of these companies. He has no other financial or material support, including expert testimony, patents, royalties. In the past year, SVF received income, potential income, travel expenses, continuing education support and/or research support from Tris, Otsuka, Arbor, Ironshore, Shire, Akili Interactive Labs, Enzymotec, Sunovion, Supernus, and Genomind. 
With his institution, he has US patent US20130217707 A1 for the use of sodium-hydrogen exchange inhibitors in the treatment of ADHD. He also receives royalties from books published by Guilford Press: Straight Talk about Your Child's Mental Health, Oxford University Press: Schizophrenia: The Facts and Elsevier: ADHD: Non-Pharmacologic Interventions. He is principal investigator of www.adhdinadults.com. JK has given talks at educational events sponsored by Medice; all funds are received by King's College London and used for studies of ADHD. HL has served as a speaker for Evolan Pharma and Shire and has received research grants from Shire; all outside the submitted work. K-PL served as a speaker for Eli Lilly and received research support from Medice, and travel support from Shire, all outside the submitted work. LAR reported receiving honoraria, serving on the speakers' bureau/advisory board, and/or acting as a consultant for Eli Lilly, Janssen-Cilag, Novartis, and Shire in the last 3 years; receiving authorship royalties from Oxford Press and ArtMed; and receiving travel awards from Shire for his participation in the 2015 WFADHD meetings and from Novartis to take part of the 2016 AACAP meeting. The ADHD and juvenile bipolar disorder outpatient programs chaired by him received unrestricted educational and research support from the following pharmaceutical companies in the last 3 years: Janssen-Cilag, Novartis, and Shire. MC has received travel grants and research support from Eli Lilly and Co., Janssen-Cilag, Shire, and Lundbeck and served as consultant for Eli Lilly and Co., Janssen-Cilag, Shire, and Lundbeck. BF has received educational speaking fees from Medice and Shire. JAR-Q was on the speakers' bureau and/or acted as consultant for Eli Lilly, Janssen-Cilag, Novartis, Shire, Lundbeck, Almirall, Braingaze, Sincrolab, Medice, and Rubió in the last 5 years. 
He also received travel awards (air tickets + hotel) for taking part in psychiatric meetings from Janssen-Cilag, Rubió, Shire, Medice, and Eli Lilly. The Department of Psychiatry chaired by him received unrestricted educational and research support from the following companies in the last 5 years: Eli Lilly, Lundbeck, Janssen-Cilag, Actelion, Shire, Ferrer, Oryzon, Roche, Psious, and Rubió. The other authors have nothing to disclose. All members of the 23andMe Research Team are current or former employees of 23andMe, Inc. and hold stock or stock options in 23andMe. Authors of the ADHD Working group of the PGC that participated in this study: Catharina A. Hartman, Ziarih Hawi, Jennifer Crosbie, Sandra Loo, Josephine Elia, Russell Schachar, Christie Burton, Ted Reichborn-Kjennerud, Aribert Rothenberger, Søren Dalsgaard, Irwin Waldman, Mark Bellgrove, Hakon Hakonarson, Johannes Hebebrand, Anke Hinney, and Robert Oades have nothing to disclose. Joseph Biederman 2019-2020: he received research support from Genentech, Headspace Inc., Lundbeck AS, Neurocentria Inc, Pfizer Pharmaceuticals, Roche TCRC Inc., Shire Pharmaceuticals Inc., Sunovion Pharmaceuticals Inc., and Tris. He was a consultant for Akili, Avekshan, Jazz Pharma, and Shire/Takeda. Through MGH CTNI, he participated in a scientific advisory board for Supernus. Dr Biederman's program has received departmental royalties from a copyrighted rating scale used for ADHD diagnoses, paid by Bracket Global, Ingenix, Prophase, Shire, Sunovion, and Theravance; these royalties were paid to the Department of Psychiatry at MGH. Tobias Banaschewski served in an advisory or consultancy role for Lundbeck, Medice, Neurim Pharmaceuticals, Oberberg GmbH, Shire, and Infectopharm. He received conference support or speaker's fees from Lilly, Medice, and Shire. He received royalties from Hogrefe, Kohlhammer, CIP Medien, and Oxford University Press. James McGough has provided expert testimony for Eli Lilly and served on a DSMB for Sunovion. 
Benjamin Neale is a member of the scientific advisory board at Deep Genomics and consultant for Camp4 Therapeutics, Takeda Pharmaceutical, and Biogen. Ole Andreas Andreassen received speaker's honorarium from Lundbeck, and is a consultant for HealthLytix. The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 602805 (Aggressotype) as well as from the European Union H2020 Programme (H2020/2014-2020). The work was also supported by the ECNP Network "ADHD across the Lifespan" (https://www.ecnp.eu/research-innovation/ECNP-networks/List-ECNP-Networks/ADHD.aspx). Over the course of this investigation, PR is a recipient of a pre-doctoral fellowship from the Agència de Gestió d'Ajuts Universitaris i de Recerca (AGAUR), Generalitat de Catalunya, Spain. The iPSYCH project is funded by the Lundbeck Foundation (grant nos. R102-A9118 and R155-2014-1724) and the universities, and university hospitals of Aarhus and Copenhagen. ADB and NRM's work is also supported by the EU's Horizon 2020 programme. Data handling and analysis was supported by NIMH (1U01MH109514-01 to Michael O'Donovan and ADB). CSM is a recipient of a Sara Borrell contract from the Instituto de Salud Carlos III, Ministerio de Economía, Industria y Competitividad, Spain. K-PL and his team are supported by the Deutsche Forschungsgemeinschaft (DFG: CRU 125, CRC TRR 58 A1/A5, no. 44541416), ERA-Net NEURON/RESPOND, no. 01EW1602B, ERA-Net NEURON/DECODE, no. 01EW1902, and 5-100 Russian Academic Excellence Project. JH thanks Stiftelsen K.G. Jebsen, University of Bergen, the Western Norwegian Health Authorities (Helse Vest). HL thanks the Swedish research council. BC received financial support from the Spanish "Ministerio de Economía y Competitividad" and AGAUR. MSA is a recipient of a contract from the Biomedical Network Research Center on Mental Health (CIBERSAM), Madrid, Spain. 
MR is a recipient of a Miguel de Servet contract from the Instituto de Salud Carlos III, Spain. The research leading to these results has received funding from the Instituto de Salud Carlos III, cofinanced by the European Regional Development Fund (ERDF), from the Pla estratègic de recerca i innovació en salut (PERIS); Generalitat de Catalunya, from "la Marató de TV3" (092330/31) and from the Agència de Gestió d'Ajuts Universitaris i de Recerca-AGAUR, Generalitat de Catalunya. ES's work is supported by a personal Hypatia grant from the Radboud University Medical Center. MH received a Veni grant from the Netherlands Organization for Scientific Research (NWO, grant number 91619115). The NeuroIMAGE study was supported by NIH Grant R01MH62873 (to SVF), NWO Large Investment Grant 1750102007010 (to JKB), ZonMW grant 60-60600-97-193, NWO grants 056-13-015 and 433-09-242, and matching grants from Radboud University Nijmegen Medical Center, University Medical Center Groningen and Accare, and Vrije Universiteit Amsterdam. BF and MK's work is supported by the Dutch National Science Agenda (NWA) for the NeurolabNL project (grant 400-17-602). This paper represents independent research part funded by the National Institute for Health Research (NIHR) Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King's College London. BF received additional funding from a personal Vici grant of the Dutch Organization for Scientific Research (NWO; grant 016-130-669). Attention deficit/hyperactivity disorder (ADHD) is a common neurodevelopmental disorder characterized by age-inappropriate symptoms of inattention, impulsivity, and hyperactivity that persist into adulthood in the majority of the diagnosed children. Despite several risk factors during childhood predicting the persistence of ADHD symptoms into adulthood, the genetic architecture underlying the trajectory of ADHD over time is still unclear. 
We set out to study the contribution of common genetic variants to the risk for ADHD across the lifespan by conducting meta-analyses of genome-wide association studies on persistent ADHD in adults and ADHD in childhood separately and jointly, and by comparing the genetic background between them in a total sample of 17,149 cases and 32,411 controls. Our results show nine new independent loci and support a shared contribution of common genetic variants to ADHD in children and adults. No subgroup heterogeneity was observed among children, while this group consists of future remitting and persistent individuals. We report similar patterns of genetic correlation of ADHD with other ADHD-related datasets and different traits and disorders among adults, children, and when combining both groups. These findings confirm that persistent ADHD in adults is a neurodevelopmental disorder and extend the existing hypothesis of a shared genetic architecture underlying ADHD and different traits to a lifespan perspective.
- Published
- 2019
25. The inference of sex-biased human demography from whole-genome data
- Author
-
Sohini Ramachandran, Shaila Musharoff, Carlos Bustamante, and Suyash Shringarpure
- Subjects
Male ,Cancer Research ,Test Statistics ,Population genetics ,QH426-470 ,Geographical Locations ,0302 clinical medicine ,Mathematical and Statistical Techniques ,Effective population size ,Statistics ,Genetics (clinical) ,0303 health sciences ,education.field_of_study ,Genome ,Sex Chromosomes ,Chromosome Biology ,Population size ,Simulation and Modeling ,Autosomes ,X Chromosomes ,Europe ,Physical Sciences ,Female ,Research Article ,Demographic history ,Population Size ,Population ,Biology ,Research and Analysis Methods ,Chromosomes ,03 medical and health sciences ,Bias ,Population Metrics ,Genetics ,Humans ,1000 Genomes Project ,Selection, Genetic ,Statistical Methods ,education ,Molecular Biology ,Ecology, Evolution, Behavior and Systematics ,Selection (genetic algorithm) ,030304 developmental biology ,Demography ,Population Density ,Chromosomes, Human, X ,Evolutionary Biology ,Models, Genetic ,Whole Genome Sequencing ,Population Biology ,Genetic Variation ,Biology and Life Sciences ,Sequence Analysis, DNA ,Cell Biology ,Background selection ,Genetics, Population ,Genetic Loci ,People and Places ,030217 neurology & neurosurgery ,Mathematics ,Population Genetics - Abstract
Sex-biased demographic events (“sex-bias”) involve unequal numbers of females and males. These events are typically inferred from the relative amount of X-chromosomal to autosomal genetic variation and have led to conflicting conclusions about human demographic history. Though population size changes alter the relative amount of X-chromosomal to autosomal genetic diversity even in the absence of sex-bias, this has generally not been accounted for in sex-bias estimators to date. Here, we present a novel method to identify sex-bias from genetic sequence data that models population size changes and estimates the female fraction of the effective population size during each time epoch. Compared to recent sex-bias inference methods, our approach can detect sex-bias that changes on a single population branch without requiring data from an outgroup or knowledge of divergence events. When applied to simulated data, conventional sex-bias estimators are biased by population size changes, especially recent growth or bottlenecks, while our estimator is unbiased. We next apply our method to high-coverage exome data from the 1000 Genomes Project and estimate a male bias in Yorubans (47% female) and Europeans (44%), possibly due to stronger background selection on the X chromosome than on the autosomes. Finally, we apply our method to the 1000 Genomes Project Phase 3 high-coverage Complete Genomics whole-genome data and estimate a female bias in Yorubans (63% female), Europeans (84%), Punjabis (82%), as well as Peruvians (56%), and a male bias in the Southern Han Chinese (45%). Our method additionally identifies a male-biased migration out of Africa based on data from Europeans (20% female). Our results demonstrate that modeling population size change is necessary to estimate sex-bias parameters accurately. 
Our approach gives insight into signatures of sex-bias in sexual species, and the demographic models it produces can serve as more accurate null models for tests of selection. Author summary: Sex-biased demographic events involve unequal numbers of females and males and are referred to as “sex-bias”. In humans, short-range migrations (e.g., due to marriage practices) are known to be sex-biased, and some long-range migrations, such as the one out of Africa, are hypothesized to be sex-biased. The recent availability of large-scale genomic sequencing data provides a unique opportunity to study sex-bias in human populations. However, existing sex-bias methods do not account for population size changes, like expansions and bottlenecks, or can only estimate a single sex-bias parameter on a population branch, which can lead to incorrect conclusions. We developed a sex-bias method which explicitly models population size changes, and we show that it outperforms competing methods on simulated data. When applied to human genetic data, our method identifies an overall female sex-bias in globally-distributed populations and a male-biased bottleneck in Europeans. Our method can also be used to assess sex-bias in other sexual species.
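As a point of reference for the X-to-autosome comparison described in this abstract, the classical constant-size expectation can be sketched in a few lines. This is only the textbook relationship between the female fraction of the breeding population and the ratio of X-chromosomal to autosomal effective population size; it is not the epoch-based estimator the abstract presents, and the function names are hypothetical.

```python
def xa_ratio(female_fraction: float) -> float:
    """Expected ratio of X-chromosomal to autosomal effective population size.

    Assumes a constant-size, randomly mating population (an assumption of this
    sketch, and exactly the simplification the abstract argues is inadequate).
    Derived from Ne_A = 4*Nm*Nf/(Nm+Nf) and Ne_X = 9*Nm*Nf/(4*Nm+2*Nf).
    """
    if not 0.0 < female_fraction < 1.0:
        raise ValueError("female_fraction must lie strictly between 0 and 1")
    return 9.0 / (8.0 * (2.0 - female_fraction))


def female_fraction_from_ratio(ratio: float) -> float:
    """Invert xa_ratio: recover the female fraction implied by an X/A ratio."""
    if ratio <= 0.0:
        raise ValueError("ratio must be positive")
    return 2.0 - 9.0 / (8.0 * ratio)


if __name__ == "__main__":
    # An equal sex ratio gives the familiar X/A expectation of 3/4.
    print(xa_ratio(0.5))
    # A male-biased population (47% female) gives a slightly lower ratio.
    print(xa_ratio(0.47))
```

Under these assumptions the ratio can only range between 9/16 and 9/8, so an observed ratio outside that interval points to something other than sex-bias alone, such as the population size changes the method models.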
- Published
- 2018
26. Spectrum and prevalence of genetic predisposition in medulloblastoma: a retrospective genetic study and prospective validation in a clinical trial cohort
- Author
-
David A. Solomon, Carlos Bustamante, Michael A. Grotzer, Richard J. Cohn, Martin Röösli, Jennifer A. Chan, Geoffrey McCowage, Daniel C. Bowers, Joachim Weischenfeldt, Pablo Hernáiz Driever, Tobey J. MacDonald, Maia Segura-Wang, Anne Bendel, Vijay Ramaswamy, Michael D. Taylor, Till Milde, Ivo Buchhalter, Stefan M. Pfister, Tobias Rausch, Tina Veje Andersen, Susanne N. Groebner, Suyash Shringarpure, Stefan Rutkowski, Kristina Kjaerheim, Léa Guerrini-Rousseau, Marina Ryzhova, Kerstin Grund, Arie Perry, Kristian W. Pajtler, Wiesława Grajkowska, Scott L. Pomeroy, Daniel W. Fults, Jinghui Zhang, Christoffer Johansen, Jan O. Korbel, Stephan Frank, Claus R. Bartram, Marcel Kool, Birgitta Lannering, Tenley C. Archer, Ho Keung Ng, Nada Jabado, David T.W. Jones, Wolfram Scheurlen, Young Shin Ra, Andrey Korshunov, Elizabeth S. Duke, Camelia M. Monoranu, Finn Wesenberg, Christian Lawerenz, Laurence Brugières, Lukas Chavez, Redmond Shelagh, Christian P. Kratz, Christian Sutter, David Samuel, Giles W. Robinson, David Sumerauer, Paul A. Northcott, Peter Hauser, Michael Hain, Amar Gajjar, Joachim Schüz, Roland Eils, Balca R. Mardin, Murali Chintagumpala, Peter Lichter, Katja von Hoff, Gudrun Fleischhack, Pascale Varlet, Sebastian Brabetz, A. Sorana Morrissy, Richard J. Gilbertson, Dominik Sturm, Xin Zhou, Aurélie Ernst, Marco A. Marra, Maria Feychting, Karel Zitterbart, Thomas Zichner, Tone Eggen, David Malkin, Claudia E. Kuehni, Tim Hassall, Sebastian M. Waszak, Francisco M. De La Vega, Cristina Baciu, Gilbertson, Richard [0000-0001-7539-9472], and Apollo - University of Cambridge Repository
- Subjects
0301 basic medicine ,Oncology ,Male ,Heredity ,DNA Mutational Analysis ,Medizin ,Whole Exome Sequencing ,0302 clinical medicine ,Risk Factors ,Models ,Prevalence ,2.1 Biological and endogenous factors ,Prospective Studies ,Aetiology ,Prospective cohort study ,Child ,Cancer ,Pediatric ,Tumor ,Progression-Free Survival ,3. Good health ,Pedigree ,Phenotype ,Child, Preschool ,030220 oncology & carcinogenesis ,Female ,Adult ,medicine.medical_specialty ,Adolescent ,Pediatric Cancer ,PALB2 ,Genetic counseling ,Oncology and Carcinogenesis ,610 Medicine & health ,Article ,03 medical and health sciences ,Young Adult ,Germline mutation ,Rare Diseases ,Genetic ,Predictive Value of Tests ,360 Social problems & social services ,Internal medicine ,Exome Sequencing ,Biomarkers, Tumor ,Genetic predisposition ,medicine ,Genetics ,Humans ,Genetic Predisposition to Disease ,Genetic Testing ,Oncology & Carcinogenesis ,Preschool ,Cerebellar Neoplasms ,Germ-Line Mutation ,Retrospective Studies ,Medulloblastoma ,Models, Genetic ,business.industry ,Gene Expression Profiling ,Human Genome ,Reproducibility of Results ,Infant ,Retrospective cohort study ,DNA Methylation ,medicine.disease ,Brain Disorders ,Brain Cancer ,030104 developmental biology ,business ,Transcriptome ,Biomarkers - Abstract
Background: Medulloblastoma is associated with rare hereditary cancer predisposition syndromes; however, consensus medulloblastoma predisposition genes have not been defined and screening guidelines for genetic counselling and testing for paediatric patients are not available. We aimed to assess and define these genes to provide evidence for future screening guidelines. Methods: In this international, multicentre study, we analysed patients with medulloblastoma from retrospective cohorts (International Cancer Genome Consortium [ICGC] PedBrain, Medulloblastoma Advanced Genomics International Consortium [MAGIC], and the CEFALO series) and from prospective cohorts from four clinical studies (SJMB03, SJMB12, SJYC07, and I-HIT-MED). Whole-genome sequences and exome sequences from blood and tumour samples were analysed for rare damaging germline mutations in cancer predisposition genes. DNA methylation profiling was done to determine consensus molecular subgroups: WNT (MBWNT), SHH (MBSHH), group 3 (MBGroup3), and group 4 (MBGroup4). Medulloblastoma predisposition genes were predicted on the basis of rare variant burden tests against controls without a cancer diagnosis from the Exome Aggregation Consortium (ExAC). Previously defined somatic mutational signatures were used to further classify medulloblastoma genomes into two groups, a clock-like group (signatures 1 and 5) and a homologous recombination repair deficiency-like group (signatures 3 and 8), and chromothripsis was investigated using previously established criteria. Progression-free survival and overall survival were modelled for patients with a genetic predisposition to medulloblastoma. Findings: We included a total of 1022 patients with medulloblastoma from the retrospective cohorts (n=673) and the four prospective studies (n=349), from whom blood samples (n=1022) and tumour samples (n=800) were analysed for germline mutations in 110 cancer predisposition genes. 
In our rare variant burden analysis, we compared these against 53 105 sequenced controls from ExAC and identified APC, BRCA2, PALB2, PTCH1, SUFU, and TP53 as consensus medulloblastoma predisposition genes; we estimated that germline mutations accounted for 6% of medulloblastoma diagnoses in the retrospective cohort. The prevalence of genetic predispositions differed between molecular subgroups in the retrospective cohort and was highest for patients in the MBSHH subgroup (20% in the retrospective cohort). These estimates were replicated in the prospective clinical cohort (germline mutations accounted for 5% of medulloblastoma diagnoses, with the highest prevalence [14%] in the MBSHH subgroup). Patients with germline APC mutations developed MBWNT and accounted for most (five [71%] of seven) cases of MBWNT that had no somatic CTNNB1 exon 3 mutations. Patients with germline mutations in SUFU and PTCH1 mostly developed infant MBSHH. Germline TP53 mutations presented only in childhood patients in the MBSHH subgroup and explained more than half (eight [57%] of 14) of all chromothripsis events in this subgroup. Germline mutations in PALB2 and BRCA2 were observed across the MBSHH, MBGroup3, and MBGroup4 molecular subgroups and were associated with mutational signatures typical of homologous recombination repair deficiency. In patients with a genetic predisposition to medulloblastoma, 5-year progression-free survival was 52% (95% CI 40–69) and 5-year overall survival was 65% (95% CI 52–81); these survival estimates differed significantly across patients with germline mutations in different medulloblastoma predisposition genes. Interpretation: Genetic counselling and testing should be used as a standard-of-care procedure in patients with MBWNT and MBSHH because these patients have the highest prevalence of damaging germline mutations in known cancer predisposition genes. 
We propose criteria for routine genetic screening for patients with medulloblastoma based on clinical and molecular tumour characteristics. Funding: German Cancer Aid; German Federal Ministry of Education and Research; German Childhood Cancer Foundation (Deutsche Kinderkrebsstiftung); European Research Council; National Institutes of Health; Canadian Institutes for Health Research; German Cancer Research Center; St Jude Comprehensive Cancer Center; American Lebanese Syrian Associated Charities; Swiss National Science Foundation; European Molecular Biology Organization; Cancer Research UK; Hertie Foundation; Alexander and Margaret Stewart Trust; V Foundation for Cancer Research; Sontag Foundation; Musicians Against Childhood Cancer; BC Cancer Foundation; Swedish Council for Health, Working Life and Welfare; Swedish Research Council; Swedish Cancer Society; the Swedish Radiation Protection Authority; Danish Strategic Research Council; Swiss Federal Office of Public Health; Swiss Research Foundation on Mobile Communication; Masaryk University; Ministry of Health of the Czech Republic; Research Council of Norway; Genome Canada; Genome BC; Terry Fox Research Institute; Ontario Institute for Cancer Research; Pediatric Oncology Group of Ontario; The Family of Kathleen Lorette and the Clark H Smith Brain Tumour Centre; Montreal Children's Hospital Foundation; The Hospital for Sick Children: Sonia and Arthur Labatt Brain Tumour Research Centre, Chief of Research Fund, Cancer Genetics Program, Garron Family Cancer Centre, MDT's Garron Family Endowment; BC Childhood Cancer Parents Association; Cure Search Foundation; Pediatric Brain Tumor Foundation; Brainchild; and the Government of Ontario.
- Published
- 2018
27. Effects of Sample Selection Bias on the Accuracy of Population Structure and Ancestry Inference
- Author
-
Suyash Shringarpure and Eric P. Xing
- Subjects
FOS: Psychology ,170203 Knowledge Representation and Machine Learning - Abstract
Population stratification is an important task in genetic analyses. It provides information about the ancestry of individuals and can be an important confounder in genome-wide association studies. Public genotyping projects have made a large number of datasets available for study. However, practical constraints dictate that only a small number of individuals from any geographical/ethnic population are genotyped. The resulting data are a sample from the entire population. If the distribution of sample sizes is not representative of the populations being sampled, the accuracy of population stratification analyses of the data could be affected. We attempt to understand the effect of biased sampling on the accuracy of population structure analysis and individual ancestry recovery. We examined two commonly used methods for analyses of such datasets, ADMIXTURE and EIGENSOFT, and found that the accuracy of recovery of population structure is affected to a large extent by the sample used for analysis and how representative it is of the underlying populations. Using simulated data and real genotype data from cattle, we show that sample selection bias can affect the results of population structure analyses. We develop a mathematical framework for sample selection bias in models for population structure and also propose a correction for sample selection bias using auxiliary information about the sample. We demonstrate that such a correction is effective in practice using simulated and real data.
- Published
- 2018
28. Multiethnic GWAS Reveals Polygenic Architecture of Earlobe Attachment
- Author
-
Joanna L. Mountain, Jenna C. Carlson, Steven J. Pitts, Chao Tian, Bethann S. Hromatka, Giovanni Poletti, Carrie A.M. Northover, Nadia Litterman, Eleanor Feingold, Timothy C. Cox, Samuel Canizales-Quinteros, Francisco Rothhammer, Andres Ruiz-Linares, Catherine H. Wilson, J. Fah Sathirapongsasuti, Jacqueline T. Hecht, Pierre Fontanillas, Suyash Shringarpure, Sijia Wang, Jasmien Roosenboom, Sarah L. Elson, Elizabeth J. Leslie, Katarzyna Bryc, Ekaterina Orlova, Anan Ding, Seth M. Weinberg, Vladimir Vacic, Lina M. Moreno, Mary L. Marazita, Paige E. Pfeffer, Robert K. Bell, Olga V. Sazonova, George L. Wehby, Elizabeth S. Noblin, Rolando González-José, Matthew H. McIntyre, David A. Hinds, Li Jin, Adam Auton, Myoung Keun Lee, Janie F. Shelton, Nicholas A. Furlotte, Christopher A. Wollenschlaeger, Lavinia Schuler-Faccini, John R. Shaffer, Karen E. Huber, Yajun Yang, Gabriel Bedoya, Maria Cátira Bortolini, Babak Alipanahi, Carla Gallo, Jinxi Li, Aaron Kleinman, Michelle Agee, Kaustabh Adhikari, Joyce Y. Tung, 23andMe Research Team, Anthropologie bio-culturelle, Droit, Ethique et Santé (ADES), and Aix Marseille Université (AMU)-EFS ALPES MEDITERRANEE-Centre National de la Recherche Scientifique (CNRS)
- Subjects
epistasis ,0301 basic medicine ,Multifactorial Inheritance ,Candidate gene ,[SHS.ANTHRO-BIO]Humanities and Social Sciences/Biological anthropology ,unattached earlobe ,Receptors, G-Protein-Coupled - genetics ,Genome-wide association study ,030105 genetics & heredity ,Receptors, G-Protein-Coupled ,purl.org/becyt/ford/1 [https] ,Mice ,MULTIGENIC ,pinna ,Branchial Region/anatomy & histology ,Genotype ,Quantitative Trait Loci - genetics ,Child ,Receptores Acoplados a Proteínas G - genética ,attached earlobe ,Genetics (clinical) ,ComputingMilieux_MISCELLANEOUS ,Sitios de Carácter Cuantitativo - genética ,Genetics ,COMPLEX TRAIT GENETICS ,Edar Receptor ,Ear ,Middle Aged ,UNATTACHED EARLOBE ,Edar Receptor/genetics ,DNA-Binding Proteins ,medicine.anatomical_structure ,Región Branquial - anatomía e histología ,Child, Preschool ,Mitochondrial Proteins/genetics ,Proteins/genetics ,Cohort ,Trait ,CIENCIAS NATURALES Y EXACTAS ,Ribosomal Proteins ,Adult ,Branchial Region - anatomy & histology ,Adolescent ,Ear/anatomy & histology ,PHARYNGEAL ARCH ,Otras Ciencias Biológicas ,Quantitative Trait Loci ,Transcription Factors/genetics ,PINNA ,PAX9 Transcription Factor - genetics ,Quantitative trait locus ,Biology ,Article ,Estudio de Asociación del Genoma Completo ,Mitochondrial Proteins ,Ciencias Biológicas ,Young Adult ,03 medical and health sciences ,medicine ,Animals ,Humans ,EPISTASIS ,Oído - anatomía e histología ,purl.org/becyt/ford/1.6 [https] ,Multifactorial Inheritance/genetics ,Earlobe ,genome-wide association study ,Proteínas de Unión al ADN - genética ,complex trait genetics ,Proteins ,GENOME-WIDE ASSOCIATION STUDY ,TRANS-ETHNIC ,trans-ethnic ,pharyngeal arch ,Factor de Transcripción PAX9 - genética ,purl.org/pe-repo/ocde/ford#3.01.02 [https] ,DNA-Binding Proteins - genetics ,ATTACHED EARLOBE ,Branchial Region ,030104 developmental biology ,Ear - anatomy & histology ,PAX9 Transcription Factor/genetics ,Ribosomal Proteins/genetics ,Epistasis ,PAX9 Transcription Factor 
,multigenic ,Quantitative Trait Loci/genetics ,Receptors, G-Protein-Coupled/genetics ,Genotipo ,DNA-Binding Proteins/genetics ,Transcription Factors - Abstract
The genetic basis of earlobe attachment has been a matter of debate since the early 20th century, such that geneticists argue both for and against polygenic inheritance. Recent genetic studies have identified a few loci associated with the trait, but large-scale analyses are still lacking. Here, we performed a genome-wide association study of lobe attachment in a multiethnic sample of 74,660 individuals from four cohorts (three with the trait scored by an expert rater and one with the trait self-reported). Meta-analysis of the three expert-rater-scored cohorts revealed six associated loci harboring numerous candidate genes, including EDAR, SP5, MRPS22, ADGRG6 (GPR126), KIAA1217, and PAX9. The large self-reported 23andMe cohort recapitulated each of these six loci. Moreover, meta-analysis across all four cohorts revealed a total of 49 significant (p < 5 × 10⁻⁸) loci. Annotation and enrichment analyses of these 49 loci showed strong evidence of genes involved in ear development and syndromes with auricular phenotypes. RNA sequencing data from both human fetal ear and mouse second branchial arch tissue confirmed that genes located among associated loci showed evidence of expression. These results provide strong evidence for the polygenic nature of earlobe attachment and offer insights into the biological basis of normal and abnormal ear development.
Author affiliations: Shaffer, John R. (University of Pittsburgh, United States); Li, Jinxi (University of Chinese Academy of Sciences, China); Lee, Myoung Keun (University of Pittsburgh, United States); Roosenboom, Jasmien (University of Pittsburgh, United States); Orlova, Ekaterina (University of Pittsburgh, United States); Adhikari, Kaustabh (University College London, United Kingdom); Gallo, Carla (Universidad Peruana Cayetano Heredia, Peru); Poletti, Giovanni (Universidad Peruana Cayetano Heredia, Peru); Schuler Faccini, Lavinia (Universidade Federal do Rio Grande do Sul, Brazil); Bortolini, Maria Catira (Universidade Federal do Rio Grande do Sul, Brazil); Canizales Quinteros, Samuel (Universidad Nacional Autónoma de México, Mexico); Rothhammer, Francisco (Universidad de Tarapacá, Chile); Bedoya, Gabriel (Universidad de Antioquia, Colombia); González José, Rolando (Consejo Nacional de Investigaciones Científicas y Técnicas, Centro Científico Tecnológico Conicet - Centro Nacional Patagónico, Instituto Patagónico de Ciencias Sociales y Humanas, Argentina); Pfeffer, Paige E. (Saint Louis University, United States); Wollenschlaeger, Christopher A. (University of Pittsburgh, United States); Hecht, Jacqueline T. (University of Texas, United States); Wehby, George (University of Iowa, United States); Moreno, Lina M. (University of Iowa, United States); Ding, Anan (University of Chinese Academy of Sciences, China); Jin, Li (University of Chinese Academy of Sciences, China; Fudan University, China); Yang, Yajun (Fudan University, China); Carlson, Jenna C. (University of Pittsburgh, United States); Leslie, Elizabeth J. (University of Pittsburgh, United States); Feingold, Eleanor (University of Pittsburgh, United States); Marazita, Mary L. (University of Pittsburgh, United States); Hinds, David A. (899 West Evelyn Avenue, United States); Cox, Timothy C. (Seattle Children’s Research Institute, United States; University of Washington, United States; Monash University, Australia); Wang, Sijia (University of Chinese Academy of Sciences, China; Fudan University, China); Ruiz Linares, Andrés (University College London, United Kingdom; Fudan University, China; Aix-Marseille University, France); Weinberg, Seth M. (University of Pittsburgh, United States)
- Published
- 2017
29. Privacy Risks from Genomic Data-Sharing Beacons
- Author
-
Suyash Shringarpure and Carlos Bustamante
- Subjects
Web server ,animal structures ,Biology ,computer.software_genre ,Computer security ,Genome ,Article ,03 medical and health sciences ,fluids and secretions ,0302 clinical medicine ,Chromosome (genetic algorithm) ,Task Performance and Analysis ,parasitic diseases ,Genetics ,Humans ,Genetics(clinical) ,1000 Genomes Project ,Genetic Privacy ,Genetics (clinical) ,030304 developmental biology ,0303 health sciences ,Genome, Human ,Information Dissemination ,Genetic Variation ,High-Throughput Nucleotide Sequencing ,Human genetics ,Beacon ,Personal Genome Project ,Haplotypes ,Human genome ,computer ,030217 neurology & neurosurgery - Abstract
The human genetics community needs robust protocols that enable secure sharing of genomic data from participants in genetic research. Beacons are web servers that answer allele-presence queries--such as "Do you have a genome that has a specific nucleotide (e.g., A) at a specific genomic position (e.g., position 11,272 on chromosome 1)?"--with either "yes" or "no." Here, we show that individuals in a beacon are susceptible to re-identification even if the only data shared include presence or absence information about alleles in a beacon. Specifically, we propose a likelihood-ratio test of whether a given individual is present in a given genetic beacon. Our test is not dependent on allele frequencies and is the most powerful test for a specified false-positive rate. Through simulations, we showed that in a beacon with 1,000 individuals, re-identification is possible with just 5,000 queries. Relatives can also be identified in the beacon. Re-identification is possible even in the presence of sequencing errors and variant-calling differences. In a beacon constructed with 65 European individuals from the 1000 Genomes Project, we demonstrated that it is possible to detect membership in the beacon with just 250 SNPs. With just 1,000 SNP queries, we were able to detect the presence of an individual genome from the Personal Genome Project in an existing beacon. Our results show that beacons can disclose membership and implied phenotypic information about participants and do not protect privacy a priori. We discuss risk mitigation through policies and standards such as not allowing anonymous pings of genetic beacons and requiring minimum beacon sizes.
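The likelihood-ratio idea in this abstract can be illustrated with a minimal sketch. It assumes every query is made at a position where the target individual carries the queried allele, and it treats three quantities as given inputs rather than deriving them: `d_n` and `d_n_minus_1` (the probability that none of N, or N−1, genomes in the beacon carries the allele) and `delta` (a sequencing-mismatch rate). These parameter names and the function name are hypothetical, and the abstract does not spell out the exact formulation used in the paper.

```python
import math


def beacon_log_likelihood_ratio(responses, d_n, d_n_minus_1, delta):
    """Log-likelihood ratio for 'target is in the beacon' vs 'target is not'.

    responses    -- iterable of booleans, True when the beacon answered "yes"
    d_n          -- assumed P(no genome among N carries the allele)
    d_n_minus_1  -- assumed P(no genome among N-1 carries the allele)
    delta        -- assumed probability that the target's copy is masked
                    by a sequencing or variant-calling mismatch

    Positive values favour membership; negative values favour absence.
    """
    for name, value in (("d_n", d_n), ("d_n_minus_1", d_n_minus_1), ("delta", delta)):
        if not 0.0 < value < 1.0:
            raise ValueError(f"{name} must lie strictly between 0 and 1")

    p_no_absent = d_n                    # H0: target not in the beacon
    p_no_member = delta * d_n_minus_1    # H1: target in the beacon

    llr = 0.0
    for answered_yes in responses:
        if answered_yes:
            llr += math.log(1.0 - p_no_member) - math.log(1.0 - p_no_absent)
        else:
            llr += math.log(p_no_member) - math.log(p_no_absent)
    return llr


if __name__ == "__main__":
    # Ten "yes" answers in a row push the statistic towards membership.
    print(beacon_log_likelihood_ratio([True] * 10, 0.3, 0.31, 1e-3))
```

Because each "no" answer is very unlikely when the target really is in the beacon, a single "no" moves the statistic far more than a single "yes", which is consistent with re-identification needing relatively few queries.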
- Published
- 2015
30. Replication and characterization of CADM2 and MSRA genes on human behavior
- Author
-
Pierre Fontanillas, Janie F. Shelton, Felix R. Day, John R. B. Perry, Aaron Kleinman, Bethann S. Hromatka, Ken K. Ong, David Hinds, Nadia K. Litterman, Nicholas A. Furlotte, Vladimir Vacic, Catherine H. Wilson, Matthew H. McIntyre, Sarah L. Elson, Suyash Shringarpure, Brian B. Boutwell, Katarzyna Bryc, J. Fah Sathirapongsasuti, David A. Hinds, Jorim Tielbeek, Robert K. Bell, Olga V. Sazonova, Joyce Y. Tung, Joanna L. Mountain, Michelle Agee, Chao Tian, Karen E. Huber, Babak Alipanahi, Adam Auton, Carrie A.M. Northover, Pediatric surgery, Amsterdam Neuroscience - Complex Trait Genetics, and APH - Mental Health
- Subjects
0301 basic medicine ,Candidate gene ,media_common.quotation_subject ,Clinical psychology ,Biology ,Compliance (psychology) ,03 medical and health sciences ,0302 clinical medicine ,Genetics ,Psychology ,Personality ,lcsh:Social sciences (General) ,Big Five personality traits ,lcsh:Science (General) ,media_common ,Genetic association ,Psychiatry ,Multidisciplinary ,Heritability ,030104 developmental biology ,Facet (psychology) ,lcsh:H1-99 ,030217 neurology & neurosurgery ,Neuroscience ,lcsh:Q1-390 ,MSRA - Abstract
Progress identifying the genetic determinants of personality has historically been slow, with candidate gene studies and small-scale genome-wide association studies yielding few reproducible results. In the UK Biobank study, genetic variants in CADM2 and MSRA were recently shown to influence risk taking behavior and irritability respectively, representing some of the first genomic loci to be associated with aspects of personality. We extend this observation by performing a personality “phenome-scan” across 16 traits in up to 140,487 participants from 23andMe for these two genes. Heritability estimates for these traits ranged from 5-19%, with both CADM2 and MSRA demonstrating significant effects on multiple personality types. These associations covered all aspects of the big five personality domains, including specific facet traits such as compliance, altruism, anxiety and activity / energy. This study both confirms and extends the original observations, highlighting the role of genetics in aspects of mental health and behavior.
- Published
- 2017
31. Imputation aware tag SNP selection to improve power for multi-ethnic association studies
- Author
-
Carlos Bustamante, Suyash Shringarpure, Daniel Taliun, Christian Fuchsberger, Christopher S. Carlson, Alicia R. Martin, Hyun Min Kang, Christopher R. Gignoux, Ryan P. Welch, Michael Boehnke, Eimear E. Kenny, Gonçalo R. Abecasis, and Genevieve L. Wojcik
- Subjects
0303 health sciences ,education.field_of_study ,Linkage disequilibrium ,Computer science ,Population ,Genomics ,Tag SNP ,computer.software_genre ,Biobank ,03 medical and health sciences ,0302 clinical medicine ,Data mining ,Imputation (statistics) ,1000 Genomes Project ,education ,computer ,030217 neurology & neurosurgery ,Imputation (genetics) ,030304 developmental biology ,Genetic association - Abstract
The emergence of very large cohorts in genomic research has facilitated a focus on genotype-imputation strategies to power rare variant association. Consequently, a new generation of genotyping arrays is being designed with tag single nucleotide polymorphisms (SNPs) chosen to improve rare variant imputation. Selection of these tag SNPs poses several challenges, as rare variants tend to be continentally or even population-specific and reflect fine-scale linkage disequilibrium (LD) structure impacted by recent demographic events. To explore the landscape of tag-able variation and guide design considerations for large-cohort and biobank arrays, we developed a novel pipeline to select tag SNPs using the 26-population reference panel from Phase of the 1000 Genomes Project. We evaluate our approach using leave-one-out internal validation via standard imputation methods, which allows the direct comparison of tag SNP performance by estimating the correlation of the imputed and real genotypes for each iteration of potential array sites. We show how this approach allows for an assessment of array design and performance that can take advantage of the development of deeper and more diverse sequenced reference panels. We quantify the impact of demography on tag SNP performance across populations and provide population-specific guidelines for tag SNP selection. We also examine array design strategies that target single populations versus multi-ethnic cohorts, and demonstrate that a boost in performance for the latter can be obtained by prioritizing tag SNPs that contribute information across multiple populations simultaneously. Finally, we demonstrate the utility of improved array design to provide meaningful improvements in power, particularly in trans-ethnic studies. The unified framework presented will enable investigators to make informed decisions for the design of new arrays, and help empower the next phase of rare variant association for global health.
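The evaluation metric mentioned in this abstract, the correlation between imputed and real genotypes, can be sketched as a squared Pearson correlation at a single site. This is an illustrative helper only: the function name is hypothetical, genotypes are assumed to be coded as alternate-allele counts (0, 1, 2) with imputed values as dosages, and the abstract does not state which correlation variant the authors computed.

```python
def imputation_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes and imputed dosages.

    Both inputs are equal-length sequences of numbers for one variant site,
    one entry per held-out sample. Returns 0.0 when either input is constant,
    since the correlation is undefined for a monomorphic site.
    """
    n = len(true_genotypes)
    if n != len(imputed_dosages) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")

    mean_true = sum(true_genotypes) / n
    mean_imputed = sum(imputed_dosages) / n

    covariance = sum(
        (t - mean_true) * (d - mean_imputed)
        for t, d in zip(true_genotypes, imputed_dosages)
    )
    var_true = sum((t - mean_true) ** 2 for t in true_genotypes)
    var_imputed = sum((d - mean_imputed) ** 2 for d in imputed_dosages)

    if var_true == 0 or var_imputed == 0:
        return 0.0
    return covariance * covariance / (var_true * var_imputed)


if __name__ == "__main__":
    # Dosages that track the true genotypes closely give r^2 near 1.
    print(imputation_r2([0, 1, 2, 0], [0.1, 0.9, 1.8, 0.2]))
```

Averaging such a value over sites within a frequency bin would give one way to compare candidate tag SNP sets, though how the authors aggregate across sites is not specified here.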
- Published
- 2017
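Note on the abstract above: it evaluates tag SNPs by the correlation between imputed and real genotypes. A minimal sketch of that kind of metric follows, assuming the squared Pearson correlation between true alternate-allele counts and imputed dosages at a single site. The function name and the toy data are hypothetical, and the abstract does not say which correlation statistic or imputation software the authors used.

```python
def squared_correlation(true_genotypes, imputed_dosages):
    """Squared Pearson correlation (r^2) between true genotypes and imputed dosages."""
    n = len(true_genotypes)
    if n != len(imputed_dosages) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    mean_t = sum(true_genotypes) / n
    mean_i = sum(imputed_dosages) / n
    cov = sum((t - mean_t) * (i - mean_i)
              for t, i in zip(true_genotypes, imputed_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_i = sum((i - mean_i) ** 2 for i in imputed_dosages)
    if var_t == 0 or var_i == 0:
        return float("nan")  # monomorphic site: correlation is undefined
    return cov * cov / (var_t * var_i)


# Toy data (made up): alternate-allele counts for six samples at one site,
# and the dosages an imputation run might return for the same samples.
true_genotypes = [0, 1, 2, 0, 1, 0]
imputed_dosages = [0.1, 0.9, 1.8, 0.0, 1.2, 0.3]
r2 = squared_correlation(true_genotypes, imputed_dosages)
print(f"imputation r^2 = {r2:.3f}")
```

In a leave-one-out setting, a value like this would be computed per held-out sample set and per candidate array, then averaged across sites.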
32. Population Stratification with Mixed Membership Models.
- Author
-
Suyash Shringarpure and Eric P. Xing
- Published
- 2014
- Full Text
- View/download PDF
33. Comparing multi- and single-sample variant calls to improve variant call sets from deep coverage whole-genome sequencing data
- Author
-
Raul Torres, Francisco M. De La Vega, Zachary A. Szpiech, Ryan D. Hernandez, Rasika A. Mathias, Kathleen C. Barnes, Carlos Bustamante, Margaret A. Taub, Timothy D. O’Connor, and Suyash Shringarpure
- Subjects
Whole genome sequencing ,0303 health sciences ,Massive parallel sequencing ,dbSNP ,02 engineering and technology ,Biology ,computer.software_genre ,Random forest ,03 medical and health sciences ,Data quality ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,False positive rate ,Data mining ,International HapMap Project ,computer ,Classifier (UML) ,030304 developmental biology - Abstract
Motivation: Variant calling from next-generation sequencing (NGS) data is susceptible to false positive calls due to sequencing, mapping and other errors. To better distinguish true from false positive calls, we present a method that uses genotype array data from the sequenced samples, rather than public data such as HapMap or dbSNP, to train an accurate classifier using Random Forests. We demonstrate our method on a set of variant calls obtained from 642 African-ancestry genomes from the Consortium on Asthma among African-ancestry Populations in the Americas (CAAPA), sequenced to high depth (30X). Results: We have applied our classifier to compare call sets generated with different calling methods, including both single-sample and multi-sample callers. At a False Positive Rate of 5%, our method determines true positive rates of 97.5%, 95% and 99% on variant calls obtained using Illumina’s single-sample caller CASAVA, Real Time Genomics’ multisample variant caller, and the GATK Unified Genotyper, respectively. Since most NGS sequencing data is accompanied by genotype data for the same samples, our method can be trained on each dataset to provide a more accurate computational validation of site calls compared to generic methods. Moreover, our method allows for adjustment based on allele frequency (e.g., a different set of criteria to determine quality for rare vs. common variants) and thereby provides insight into sequencing characteristics that indicate data quality for variants of different frequencies. Availability: Code will be made available prior to publication on GitHub.
- Published
- 2016
- Full Text
- View/download PDF
34. Addressing Beacon re-identification attacks: quantification and mitigation of privacy risks
- Author
-
David Lloyd, W. Knox Carey, Lucila Ohno-Machado, Suyash Shringarpure, Zhanglong Ji, Jean-Pierre Hubaux, Jean Louis Raisaro, Yongan Zhao, Xiaoqian Jiang, Carlos Bustamante, Diyue Bu, Haixu Tang, Florian Tramèr, Heidi J. Sofia, Shuang Wang, Dixie B. Baker, Paul Flicek, and XiaoFeng Wang
- Subjects
0301 basic medicine ,animal structures ,beacon ,Computer science ,0206 medical engineering ,Inference ,Health Informatics ,Genomics ,02 engineering and technology ,Permission ,Computer security ,computer.software_genre ,Article ,03 medical and health sciences ,fluids and secretions ,Chromosome (genetic algorithm) ,ga4gh ,Data Anonymization ,parasitic diseases ,Humans ,1000 Genomes Project ,Genetic Privacy ,Data anonymization ,Information Dissemination ,genomic data sharing ,genomic privacy ,Beacon ,Data aggregator ,re-identification ,030104 developmental biology ,computer ,020602 bioinformatics - Abstract
The Global Alliance for Genomics and Health (GA4GH) created the Beacon Project as a means of testing the willingness of data holders to share genetic data in the simplest technical context—a query for the presence of a specified nucleotide at a given position within a chromosome. Each participating site (or “beacon”) is responsible for assuring that genomic data are exposed through the Beacon service only with the permission of the individual to whom the data pertains and in accordance with the GA4GH policy and standards. While recognizing the inference risks associated with large-scale data aggregation, and the fact that some beacons contain sensitive phenotypic associations that increase privacy risk, the GA4GH adjudged the risk of re-identification based on the binary yes/no allele-presence query responses as acceptable. However, recent work demonstrated that, given a beacon with specific characteristics (including relatively small sample size and an adversary who possesses an individual’s whole genome sequence), the individual’s membership in a beacon can be inferred through repeated queries for variants present in the individual’s genome. In this paper, we propose three practical strategies for reducing re-identification risks in beacons. The first two strategies manipulate the beacon such that the presence of rare alleles is obscured; the third strategy budgets the number of accesses per user for each individual genome. Using a beacon containing data from the 1000 Genomes Project, we demonstrate that the proposed strategies can effectively reduce re-identification risk in beacon-like datasets.
- Published
- 2016
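Note on the abstract above: it describes a beacon as a yes/no allele-presence service and mentions a strategy that budgets accesses per user for each individual genome. The sketch below is a toy illustration of that idea only. The class name, the data layout, and the flat cost of one unit per "yes" answer are all assumptions; this is not the GA4GH Beacon API, and the abstract does not specify how the authors charge the budget.

```python
class ToyBeacon:
    """Toy allele-presence service with a per-user, per-genome budget (hypothetical)."""

    def __init__(self, genomes, budget_per_genome):
        # genomes: dict mapping an individual id to a set of
        # (chromosome, position, allele) tuples carried by that individual.
        self.genomes = genomes
        self.budget_per_genome = budget_per_genome
        self.remaining = {}  # (user, individual) -> budget left

    def query(self, user, variant):
        """Return True if any still-visible genome carries the variant."""
        answer = False
        for individual, variants in self.genomes.items():
            key = (user, individual)
            left = self.remaining.setdefault(key, self.budget_per_genome)
            if left <= 0:
                continue  # this genome is no longer visible to this user
            if variant in variants:
                # Assumed flat cost: each "yes" charges every carrier one unit.
                self.remaining[key] = left - 1
                answer = True
        return answer


v1, v2, v3 = ("1", 1000, "A"), ("2", 2000, "T"), ("3", 3000, "G")
beacon = ToyBeacon({"alice": {v1, v2, v3}, "bob": {v1}}, budget_per_genome=2)

first = beacon.query("u1", v1)    # alice and bob both carry v1
second = beacon.query("u1", v2)   # only alice carries v2; her budget for u1 hits 0
third = beacon.query("u1", v3)    # alice is now hidden from u1
other = beacon.query("u2", v3)    # a different user still has a full budget
print(first, second, third, other)
```

Once a genome's budget for a given user is spent, further queries from that user can no longer accumulate evidence about that individual, which is the intuition behind limiting repeated-query membership inference.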
35. Efficient analysis of large datasets and sex bias with ADMIXTURE
- Author
-
Kenneth Lange, David Alexander, Carlos Bustamante, and Suyash Shringarpure
- Subjects
Male ,0301 basic medicine ,Computer science ,Genomic data ,Pedigree chart ,Genomics ,HapMap Project ,Admixture ,Computational biology ,Biology ,computer.software_genre ,Biochemistry ,Set (abstract data type) ,03 medical and health sciences ,0302 clinical medicine ,Gene Frequency ,Structural Biology ,Southwestern United States ,Humans ,International HapMap Project ,1000 Genomes Project ,Molecular Biology ,Allele frequency ,Reference panels ,030304 developmental biology ,0303 health sciences ,Sex bias ,Ancestry inference ,Applied Mathematics ,Supervised learning ,030305 genetics & heredity ,Pedigrees ,Sex-chromosome ,Computer Science Applications ,Black or African American ,Genetics, Population ,030104 developmental biology ,Female ,Data mining ,computer ,Software ,030217 neurology & neurosurgery - Abstract
Background: A number of large genomic datasets are being generated for studies of human ancestry and diseases. The ADMIXTURE program is commonly used to infer individual ancestry from genomic data. Results: We describe two improvements to the ADMIXTURE software. The first enables ADMIXTURE to infer ancestry for a new set of individuals using cluster allele frequencies from a reference set of individuals. Using data from the 1000 Genomes Project, we show that this allows ADMIXTURE to infer ancestry for 10,920 individuals in a few hours (a 5× speedup). This mode also allows ADMIXTURE to correctly estimate individual ancestry and allele frequencies from a set of related individuals. The second modification allows ADMIXTURE to correctly handle X-chromosome (and other haploid) data from both males and females. We demonstrate increased power to detect sex-biased admixture in African-American individuals from the 1000 Genomes Project using this extension. Conclusions: These modifications make ADMIXTURE more efficient and versatile, allowing users to extract more information from large genomic datasets. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1082-x) contains supplementary material, which is available to authorized users.
- Published
- 2016
36. Additional file 1 of Efficient analysis of large datasets and sex bias with ADMIXTURE
- Author
-
Suyash Shringarpure, Carlos Bustamante, Kenneth Lange, and David Alexander
- Abstract
Supplementary Text. (PDF 132 kb)
- Published
- 2016
- Full Text
- View/download PDF
37. mStruct: Inference of Population Structure in Light of Both Genetic Admixing and Allele Mutations
- Author
-
Eric P. Xing and Suyash Shringarpure
- Subjects
Genetics ,Models, Genetic ,Racial Groups ,Structural estimation ,Inheritance (genetic algorithm) ,Computational Biology ,Reproducibility of Results ,Inference ,Investigations ,Biology ,Polymorphism, Single Nucleotide ,Phylogenetics ,Human Genome Project ,Mutation ,Mutation (genetic algorithm) ,Humans ,Microsatellite ,Human genome ,Allele ,Algorithms ,Alleles ,Phylogeny ,Software ,Microsatellite Repeats - Abstract
Traditional methods for analyzing population structure, such as the Structure program, ignore the effect of allele mutations between the ancestral and current alleles of genetic markers, which can dramatically influence the accuracy of the structural estimation of current populations. Studying these effects can also reveal additional information about population evolution such as the divergence time and migration history of admixed populations. We propose mStruct, an admixture of population-specific mixtures of inheritance models that addresses the task of structure inference and mutation estimation jointly through a hierarchical Bayesian framework, and a variational algorithm for inference. We validated our method on synthetic data and used it to analyze the Human Genome Diversity Project–Centre d'Etude du Polymorphisme Humain (HGDP–CEPH) cell line panel of microsatellites and HGDP single-nucleotide polymorphism (SNP) data. A comparison of the structural maps of world populations estimated by mStruct and Structure is presented, and we also report potentially interesting mutation patterns in world populations estimated by mStruct.
- Published
- 2009
38. Inexpensive and Highly Reproducible Cloud-Based Variant Calling of 2,535 Human Genomes
- Author
-
Carlos Bustamante, Francisco M. De La Vega, Andrew Carroll, and Suyash Shringarpure
- Subjects
Data management ,Distributed computing ,Population ,lcsh:Medicine ,Sample (statistics) ,Genomics ,Cloud computing ,Biology ,03 medical and health sciences ,0302 clinical medicine ,Databases, Genetic ,Humans ,1000 Genomes Project ,lcsh:Science ,education ,030304 developmental biology ,Genetics ,0303 health sciences ,education.field_of_study ,Multidisciplinary ,business.industry ,Genome, Human ,lcsh:R ,Genetic Variation ,High-Throughput Nucleotide Sequencing ,Cloud Computing ,Pipeline (software) ,030220 oncology & carcinogenesis ,Scalability ,lcsh:Q ,business ,Software ,Research Article - Abstract
Population scale sequencing of whole human genomes is becoming economically feasible; however, data management and analysis remains a formidable challenge for many research groups. Large sequencing studies, like the 1000 Genomes Project, have improved our understanding of human demography and the effect of rare genetic variation in disease. Variant calling on datasets of hundreds or thousands of genomes is time-consuming, expensive, and not easily reproducible given the myriad components of a variant calling pipeline. Here, we describe a cloud-based pipeline for joint variant calling in large samples using the Real Time Genomics population caller. We deployed the population caller on the Amazon cloud with the DNAnexus platform in order to achieve low-cost variant calling. Using our pipeline, we were able to identify 68.3 million variants in 2,535 samples from Phase 3 of the 1000 Genomes Project. By performing the variant calling in a parallel manner, the data was processed within 5 days at a compute cost of $7.33 per sample (a total cost of $18,590 for completed jobs and $21,805 for all jobs). Analysis of cost dependence and running time on the data size suggests that, given near linear scalability, cloud computing can be a cheap and efficient platform for analyzing even larger sequencing studies in the future.
- Published
- 2015
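Note on the abstract above: the quoted cost figures can be cross-checked with simple arithmetic. The short sketch below uses only the numbers stated in the abstract (2,535 samples, $7.33 per sample, $18,590 for completed jobs, $21,805 for all jobs). The variable names are illustrative, and the share computed at the end is just the gap between the two totals, not a figure reported by the authors.

```python
samples = 2535
cost_per_sample = 7.33      # USD, as quoted in the abstract
total_completed = 18590     # USD, completed jobs
total_all_jobs = 21805      # USD, all jobs

# Per-sample cost times sample count should land close to the completed-jobs total.
implied_total = samples * cost_per_sample

# Conversely, the completed-jobs total divided by samples should round to $7.33.
implied_per_sample = total_completed / samples

# Share of total spending not accounted for by completed jobs.
other_share = (total_all_jobs - total_completed) / total_all_jobs

print(f"implied total:      ${implied_total:,.2f}")
print(f"implied per sample: ${implied_per_sample:.2f}")
print(f"share outside completed jobs: {other_share:.1%}")
```

The per-sample figure and the completed-jobs total agree to within rounding, so the $7.33 figure appears to be based on completed jobs rather than on all jobs.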
39. Mixed Membership Classification for Documents with Hierarchically Structured Labels
- Author
-
Edoardo M. Airoldi, David M. Blei, Suyash Shringarpure, Elena A. Erosheva, Stephen E. Fienberg, and Eric P. Xing
- Subjects
Biology ,Population stratification ,Demography - Published
- 2014
40. Effects of sample selection bias on the accuracy of population structure and ancestry inference
- Author
-
Eric P. Xing and Suyash Shringarpure
- Subjects
Genotype ,media_common.quotation_subject ,Population ,Inference ,Datasets as Topic ,Sample (statistics) ,Biology ,Investigations ,Population stratification ,Bioinformatics ,admixture analysis ,Risk Factors ,biased sampling ,Statistics ,Genetics ,Animals ,education ,Molecular Biology ,Genetics (clinical) ,Selection Bias ,Sampling bias ,media_common ,Selection bias ,education.field_of_study ,Small number ,population structure ,Models, Theoretical ,global ancestry ,Genetics, Population ,Sample size determination ,Cattle ,Algorithms ,Genome-Wide Association Study - Abstract
Population stratification is an important task in genetic analyses. It provides information about the ancestry of individuals and can be an important confounder in genome-wide association studies. Public genotyping projects have made a large number of datasets available for study. However, practical constraints dictate that of a geographical/ethnic population, only a small number of individuals are genotyped. The resulting data are a sample from the entire population. If the distribution of sample sizes is not representative of the populations being sampled, the accuracy of population stratification analyses of the data could be affected. We attempt to understand the effect of biased sampling on the accuracy of population structure analysis and individual ancestry recovery. We examined two commonly used methods for analyses of such datasets, ADMIXTURE and EIGENSOFT, and found that the accuracy of recovery of population structure is affected to a large extent by the sample used for analysis and how representative it is of the underlying populations. Using simulated data and real genotype data from cattle, we show that sample selection bias can affect the results of population structure analyses. We develop a mathematical framework for sample selection bias in models for population structure and also propose a correction for sample selection bias using auxiliary information about the sample. We demonstrate that such a correction is effective in practice using simulated and real data.
- Published
- 2014
41. The Great Migration and African-American Genomic Diversity
- Author
-
Carlos Bustamante, Jacob Errington, Suyash Shringarpure, Melinda C. Aldrich, Scott M. Williams, Eimear E. Kenny, William Blot, Maxime Barakatt, Soheil Baharian, Simon Gravel, and Christopher R. Gignoux
- Subjects
0301 basic medicine ,Cancer Research ,Genetic traits ,Distribution (economics) ,Population genetics ,Population density ,Mathematical and Statistical Techniques ,0302 clinical medicine ,Gene Frequency ,Native Americans ,Genotype ,Ethnicities ,10. No inequality ,Hispanic People ,Genetics (clinical) ,media_common ,African Americans ,African american ,0303 health sciences ,education.field_of_study ,Geography ,Mathematical Models ,Human migration ,Genomics ,Population groupings ,Census ,Europe ,Research Design ,Paleogeography ,030220 oncology & carcinogenesis ,Research Article ,lcsh:QH426-470 ,Human Migration ,media_common.quotation_subject ,Population ,Black People ,Biology ,Research and Analysis Methods ,Polymorphism, Single Nucleotide ,03 medical and health sciences ,Population Metrics ,Genetics ,Humans ,education ,Molecular Biology ,Socioeconomic status ,Ecology, Evolution, Behavior and Systematics ,Demography ,030304 developmental biology ,Population Density ,Evolutionary Biology ,Survey Research ,Population Biology ,business.industry ,Biology and Life Sciences ,Paleontology ,United States ,Black or African American ,lcsh:Genetics ,Genetics, Population ,030104 developmental biology ,Evolutionary biology ,Random Walk ,Earth Sciences ,People and places ,business ,Population Genetics ,030217 neurology & neurosurgery ,Regional differences ,Diversity (politics) - Abstract
We present a comprehensive assessment of genomic diversity in the African-American population by studying three genotyped cohorts comprising 3,726 African-Americans from across the United States that provide a representative description of the population across all US states and socioeconomic status. An estimated 82.1% of ancestors to African-Americans lived in Africa prior to the advent of transatlantic travel, 16.7% in Europe, and 1.2% in the Americas, with increased African ancestry in the southern United States compared to the North and West. Combining demographic models of ancestry and those of relatedness suggests that admixture occurred predominantly in the South prior to the Civil War and that ancestry-biased migration is responsible for regional differences in ancestry. We find that recent migrations also caused a strong increase in genetic relatedness among geographically distant African-Americans. Long-range relatedness among African-Americans and between African-Americans and European-Americans thus tracks north- and west-bound migration routes followed during the Great Migration of the twentieth century. By contrast, short-range relatedness patterns suggest comparable mobility of ∼15–16 km per generation for African-Americans and European-Americans, as estimated using a novel analytical model of isolation-by-distance. Author Summary: Genetic studies of African-Americans identify functional variants, elucidate historical and genealogical mysteries, and reveal basic biology. However, African-Americans have been under-represented in genetic studies, and relatively little is known about nation-wide patterns of genomic diversity in the population. Here, we study African-American genomic diversity using genotype data from nationally and regionally representative cohorts.
Access to these unique cohorts allows us to clarify the role of population structure, admixture, and recent massive migrations in shaping African-American genomic diversity and sheds new light on the genetic history of this population.
- Published
- 2016
42. Statistical Methods for studying Genetic Variation in Populations
- Author
-
Suyash Shringarpure
- Subjects
FOS: Psychology ,170203 Knowledge Representation and Machine Learning - Abstract
The study of genetic variation in populations is of great interest for the study of the evolutionary history of humans and other species. Improvement in sequencing technology has resulted in the availability of many large datasets of genetic data. Computational methods have therefore become quite important in analyzing these data. Two important problems that have been studied using genetic data are population stratification (modeling individual ancestry with respect to ancestral populations) and genetic association (finding genetic polymorphisms that affect a trait). In this thesis, we develop methods to improve our understanding of these two problems. For the population stratification problem, we develop hierarchical Bayesian models that incorporate the evolutionary processes that are known to affect genetic variation. By developing mStruct, we show that modeling more evolutionary processes improves the accuracy of the recovered population structure. We demonstrate how nonparametric Bayesian processes can be used to address the question of choosing the optimal number of ancestral populations that describe the genetic diversity of a given sample of individuals. We also examine how sampling bias in genotyping study design can affect results of population structure analysis and propose a probabilistic framework for modeling and correcting sample selection bias. Genome-wide association studies (GWAS) have vastly improved our understanding of many diseases. However, such studies have failed to uncover much of the variation responsible for a number of common multi-factorial diseases and complex traits. We show how artificial selection experiments on model organisms can be used to better understand the nature of genetic associations. We demonstrate using simulations that using data from artificial selection experiments improves the performance of conventional methods of performing association. 
We also validate our approach using semi-simulated data from an artificial selection experiment on Drosophila melanogaster.
- Published
- 2012
- Full Text
- View/download PDF
43. StructHDP: automatic inference of number of clusters and population structure from admixed genotype data
- Author
-
Eric P. Xing, Daegun Won, and Suyash Shringarpure
- Subjects
Statistics and Probability ,Hierarchical Dirichlet process ,Genotype ,Population ,Inference ,Sample (statistics) ,Biology ,01 natural sciences ,Biochemistry ,Coalescent theory ,Songbirds ,010104 statistics & probability ,03 medical and health sciences ,symbols.namesake ,Population Groups ,Statistics ,Cluster (physics) ,Animals ,Cluster Analysis ,Humans ,Ismb/Eccb 2011 Proceedings Papers Committee July 17 to July 19, 2011, Vienna, Austria ,0101 mathematics ,170203 Knowledge Representation and Machine Learning ,education ,Cluster analysis ,Molecular Biology ,030304 developmental biology ,Genetics ,0303 health sciences ,education.field_of_study ,Models, Genetic ,Original Papers ,Computer Science Applications ,FOS: Psychology ,Computational Mathematics ,Population Genomics ,Genetics, Population ,Computational Theory and Mathematics ,symbols ,Gibbs sampling - Abstract
Motivation: Clustering of genotype data is an important way of understanding similarities and differences between populations. A summary of populations through clustering allows us to make inferences about the evolutionary history of the populations. Many methods have been proposed to perform clustering on multilocus genotype data. However, most of these methods do not directly address the question of how many clusters the data should be divided into and leave that choice to the user. Methods: We present StructHDP, which is a method for automatically inferring the number of clusters from genotype data in the presence of admixture. Our method is an extension of two existing methods, Structure and Structurama. Using a Hierarchical Dirichlet Process (HDP), we model the presence of admixture of an unknown number of ancestral populations in a given sample of genotype data. We use a Gibbs sampler to perform inference on the resulting model and infer the ancestry proportions and the number of clusters that best explain the data. Results: To demonstrate our method, we simulated data from an island model using the neutral coalescent. Comparing the results of StructHDP with Structurama shows the utility of combining HDPs with the Structure model. We used StructHDP to analyze a dataset of 155 Taita thrush, Turdus helleri, which has been previously analyzed using Structure and Structurama. StructHDP correctly picks the optimal number of populations to cluster the data. The clustering based on the inferred ancestry proportions also agrees with that inferred using Structure for the optimal number of populations. We also analyzed data from 1048 individuals from the Human Genome Diversity project from 53 world populations. We found that the clusters obtained correspond with major geographical divisions of the world, which is in agreement with previous analyses of the dataset. Availability: StructHDP is written in C++. 
The code will be available for download at http://www.sailing.cs.cmu.edu/structhdp. Contact: suyash@cs.cmu.edu; epxing@cs.cmu.edu
- Published
- 2011
- Full Text
- View/download PDF
44. Reconceptualizing the classification of PNAS articles
- Author
-
Tanzy Love, Edoardo M. Airoldi, Suyash Shringarpure, Cyrille Joutard, Elena A. Erosheva, Stephen E. Fienberg, Harvard University Statistics Department, Harvard University [Cambridge], Department of Statistics, University of Washington [Seattle], Statistics Department, Carnegie Mellon University, Carnegie Mellon University [Pittsburgh] (CMU), Institut Montpelliérain Alexander Grothendieck (IMAG), Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS), Université Paul-Valéry - Montpellier 3 (UPVM), University of Rochester Department of Biostatistics and Computational Biology, University of Rochester [USA], Computer Science Department - Carnegie Mellon University, University of Pittsburgh (PITT), and Pennsylvania Commonwealth System of Higher Education (PCSHE)-Pennsylvania Commonwealth System of Higher Education (PCSHE)
- Subjects
Structure (mathematical logic) ,Cognitive science ,0303 health sciences ,[STAT.AP]Statistics [stat]/Applications [stat.AP] ,Multidisciplinary ,Hierarchical modeling ,Publications ,Statistics as Topic ,National Academy of Sciences, U.S ,Classification ,01 natural sciences ,United States ,010104 statistics & probability ,03 medical and health sciences ,[STAT.ML]Statistics [stat]/Machine Learning [stat.ML] ,Physical Sciences ,Methods ,0101 mathematics ,Periodicals as Topic ,Psychology ,Biological sciences ,Sensitivity analyses ,Discipline ,ComputingMilieux_MISCELLANEOUS ,030304 developmental biology - Abstract
PNAS article classification is rooted in long-standing disciplinary divisions that do not necessarily reflect the structure of modern scientific research. We reevaluate that structure using latent pattern models from statistical machine learning, also known as mixed-membership models, that identify semantic structure in co-occurrence of words in the abstracts and references. Our findings suggest that the latent dimensionality of patterns underlying PNAS research articles in the Biological Sciences is only slightly larger than the number of categories currently in use, but it differs substantially in the content of the categories. Further, the number of articles that are listed under multiple categories is only a small fraction of what it should be. These findings together with the sensitivity analyses suggest ways to reconceptualize the organization of papers published in PNAS.
- Published
- 2010
45. CSMET: comparative genomic motif detection via multi-resolution phylogenetic shadowing
- Author
-
Mladen Kolar, Eric P. Xing, Suyash Shringarpure, and Pradipta R. Ray
- Subjects
Genome evolution ,Amino Acid Motifs ,Sequence alignment ,Computational biology ,Computational Biology/Comparative Sequence Analysis ,Biology ,Evolution, Molecular ,03 medical and health sciences ,Cellular and Molecular Neuroscience ,0302 clinical medicine ,Phylogenetics ,Genetics ,Computer Simulation ,Graphical model ,Evolutionary dynamics ,Molecular Biology ,lcsh:QH301-705.5 ,Ecology, Evolution, Behavior and Systematics ,Phylogeny ,030304 developmental biology ,0303 health sciences ,Multiple sequence alignment ,Binding Sites ,Ecology ,Phylogenetic tree ,Models, Genetic ,Chromosome Mapping ,Genetic Variation ,Sequence Analysis, DNA ,Computational Biology/Evolutionary Modeling ,DNA binding site ,Computational Theory and Mathematics ,lcsh:Biology (General) ,Modeling and Simulation ,Computational Biology/Sequence Motif Analysis ,030217 neurology & neurosurgery ,Algorithms ,Software ,Research Article ,Protein Binding ,Transcription Factors - Abstract
Functional turnover of transcription factor binding sites (TFBSs), such as whole-motif loss or gain, is a common event during genome evolution. Conventional probabilistic phylogenetic shadowing methods model the evolution of genomes only at nucleotide level, and lack the ability to capture the evolutionary dynamics of functional turnover of aligned sequence entities. As a result, comparative genomic search of non-conserved motifs across evolutionarily related taxa remains a difficult challenge, especially in higher eukaryotes, where the cis-regulatory regions containing motifs can be long and divergent; existing methods rely heavily on specialized pattern-driven heuristic search or sampling algorithms, which can be difficult to generalize and hard to interpret based on phylogenetic principles. We propose a new method: Conditional Shadowing via Multi-resolution Evolutionary Trees, or CSMET, which uses a context-dependent probabilistic graphical model that allows aligned sites from different taxa in a multiple alignment to be modeled by either a background or an appropriate motif phylogeny conditioning on the functional specifications of each taxon. The functional specifications themselves are the output of a phylogeny which models the evolution not of individual nucleotides, but of the overall functionality (e.g., functional retention or loss) of the aligned sequence segments over lineages. Combining this method with a hidden Markov model that autocorrelates evolutionary rates on successive sites in the genome, CSMET offers a principled way to take into consideration lineage-specific evolution of TFBSs during motif detection, and a readily computable analytical form of the posterior distribution of motifs under TFBS turnover.
On both simulated and real Drosophila cis-regulatory modules, CSMET outperforms other state-of-the-art comparative genomic motif finders. Author Summary: Functional turnover of transcription factor binding sites (TFBSs), such as whole-motif loss or gain, is a common event during genome evolution and plays a major role in shaping the genome and regulatory circuitry of contemporary species. Conventional methods for searching non-conserved motifs across evolutionarily related species have little or no probabilistic machinery to explicitly model this important evolutionary process; therefore, they offer little insight into the mechanism and dynamics of TFBS turnover and have limited power in finding motif patterns shaped by such processes. In this paper, we propose a new method: Conditional Shadowing via Multi-resolution Evolutionary Trees, or CSMET, which uses a mathematically elegant and computationally efficient way to model biological sequence evolution at both nucleotide level at each individual site, and functional level of a whole TFBS. CSMET offers the first principled way to take into consideration lineage-specific evolution of TFBSs and CRMs during motif detection, and offers a readily computable analytical form of the posterior distribution of motifs under TFBS turnover. Its performance improves upon current state-of-the-art programs. It represents an initial foray into the problem of statistical inference of functional evolution of TFBS, and offers a well-founded mathematical basis for the development of more realistic and informative models.
- Published
- 2008
46. mStruct
- Author
-
Eric P. Xing and Suyash Shringarpure
- Subjects
Computer science ,Population structure ,Mutation (genetic algorithm) ,Inheritance (genetic algorithm) ,Inference ,Microsatellite ,Computational biology ,Allele ,Bioinformatics ,Synthetic data - Abstract
Traditional methods for analyzing population structure, such as the Structure program, ignore the influence of mutational effects. We propose mStruct, an admixture of population-specific mixtures of inheritance models, that addresses the task of structure inference and mutation estimation jointly through a hierarchical Bayesian framework, and a variational algorithm for inference. We validated our method on synthetic data, and used it to analyze the HGDP-CEPH cell line panel of microsatellites used in (Rosenberg et al., 2002) and the HGDP SNP data used in (Conrad et al., 2006). A comparison of the structural maps of world populations estimated by mStruct and Structure is presented, and we also report potentially interesting mutation patterns in world populations estimated by mStruct, which is not possible with Structure.
- Published
- 2008
47. Germline determinants of the somatic mutation landscape in 2,642 cancer genomes
- Author
-
Bin Zhu, Adrian Baez-Ortega, Ekta Khurana, Anthony DiBiase, Lara Urban, Icgc, Alicia L. Bruzos, Alvis Brazma, Gunnar Rätsch, Suyash Shringarpure, Ludmil B. Alexandrov, Steven Newhouse, Kuan-lin Huang, Mark Gerstein, Serap Erkek, Hana Susak, David C. Wedge, Mattia Bosio, Li Ding, Michael D. McLellan, Tomas Marques-Bonet, Reiner Siebert, Mark H. Wright, Marta Tojo, Leszek J. Klimczak, Geòrgia Escaramís, Jose M. C. Tubio, Nina Habermann, Jorge Zamora, Francisco M. De La Vega, Francesc Muyas, Hidewaki Nakagawa, Tal Shmaya, Peter J. Campbell, Lisa Mirabello, Steven A. Roberts, Jared T. Simpson, Bernardo Rodriguez-Martin, Tobias Rausch, Ying Wu, Dmitry A. Gordenin, Stephen J. Chanock, Olivier Harismendy, Carlos Bustamante, Natalie Saini, Roland F. Schwarz, Atul J. Butte, Tomas Tanskanen, Sergei Yakneen, Eva G. Alvarez, Andy Cafferkey, Xing Hua, Matthias Schlesner, Olivier Delaneau, Yang Li, Oliver Drechsel, Sebastian M. Waszak, Xavier Estivill, Sushant Kumar, L. J. Dursi, Jose Maria Heredia-Genestar, Kai Ye, German Demidov, Shuto Hayashi, Seiya Imoto, Ayellet V. Segrè, Pramod Sharma, Aliaksei Holik, Claudia Calabrese, Aparna Prasad, Grace Tiao, Matthew A. Bailey, Ivo Buchhalter, Venkata Yellapantula, Jan O. Korbel, Roelof Koster, Stephan Ossowski, Gad Getz, R. Jay Mashl, Douglas F. Easton, Lei Song, Arcadi Navarro, Joachim Weischenfeldt, Raquel Rabionet, Nilanjan Chatterjee, Erik Garrison, Nikos Sidiropoulos, Ivica Letunic, Jieming Chen, Oliver Stegle, Lauri A. Aaltonen, and Esa Pitkänen
- Subjects
Genetics ,APOBEC ,0303 health sciences ,Cancer Research ,Somatic cell ,Mutagenesis (molecular biology technique) ,Biology ,medicine.disease_cause ,Genome ,Germline ,3. Good health ,Structural variation ,03 medical and health sciences ,0302 clinical medicine ,Germline mutation ,030220 oncology & carcinogenesis ,medicine ,Carcinogenesis ,030304 developmental biology
- Abstract
Cancers develop through somatic mutagenesis; however, germline genetic variation can markedly contribute to tumorigenesis via diverse mechanisms. We discovered and phased 88 million germline single nucleotide variants, short insertions/deletions, and large structural variants in whole genomes from 2,642 cancer patients, and employed this genomic resource to study genetic determinants of somatic mutagenesis across 39 cancer types. Our analyses implicate damaging germline variants in a variety of cancer predisposition and DNA damage response genes with specific somatic mutation patterns. Mutations in the MBD4 DNA glycosylase gene showed association with elevated C>T mutagenesis at CpG dinucleotides, a ubiquitous mutational process acting across tissues. Analysis of somatic structural variation exposed complex rearrangement patterns, involving cycles of templated insertions and tandem duplications, in BRCA1-deficient tumours. Genome-wide association analysis implicated common genetic variation at the APOBEC3 gene cluster with reduced basal levels of somatic mutagenesis attributable to APOBEC cytidine deaminases across cancer types. We further inferred over a hundred polymorphic L1/LINE elements with somatic retrotransposition activity in cancer. Our study highlights the major impact of rare and common germline variants on mutational landscapes in cancer.
- Full Text
- View/download PDF
48. Social and non-social autism symptoms and trait domains are genetically dissociable
- Author
-
Freddy Cliquet, Varun Warrier, Roberto Toro, Hyejung Won, Richard Delorme, Thomas Bourgeron, Anders D. Børglum, Janita Bralten, Claire S. Leblond, Geert Poelmans, Jakob Grove, Ward De Witte, Bhismadev Chakrabarti, David A. Hinds, Simon Baron-Cohen
Toro, Roberto [0000-0002-6671-858X], Cliquet, Freddy [0000-0002-9989-0685], Chakrabarti, Bhismadev [0000-0002-6649-7895], Børglum, Anders D. [0000-0001-8627-7219], Grove, Jakob [0000-0003-2284-5744], Hinds, David A. [0000-0002-4911-803X]
Apollo - University of Cambridge Repository, University of Cambridge [UK] (CAM), Génétique humaine et fonctions cognitives - Human Genetics and Cognitive Functions (GHFC (UMR_3571 / U-Pasteur_1)), Institut Pasteur [Paris] (IP)-Centre National de la Recherche Scientifique (CNRS)-Université Paris Cité (UPCité), University of North Carolina [Chapel Hill] (UNC), University of North Carolina System (UNC), Child and Adolescent Psychiatry Department [AP- HP Hôpital Robert Debré], AP-HP Hôpital universitaire Robert-Debré [Paris], Assistance publique - Hôpitaux de Paris (AP-HP) (AP-HP)-Assistance publique - Hôpitaux de Paris (AP-HP) (AP-HP), Radboud University Medical Center [Nijmegen], Radboud University [Nijmegen], University of Reading (UOR), The Lundbeck Foundation Initiative for Integrative Psychiatric Research (iPSYCH), Aarhus University [Aarhus], 23andMe Inc.
V.W. was funded by St. John’s College, Cambridge, and the Cambridge Commonwealth Trust. This study was funded by grants to SBC from the Medical Research Council, the Wellcome Trust, the Autism Research Trust, the Templeton World Charity Foundation, and to T.B. from the Institut Pasteur, the CNRS, the INSERM, The Fondamental Foundation, the APHP, the BioPsy Labex and the University Paris Diderot. The research was conducted in association with the National Institute for Health Research (NIHR) Collaboration for Leadership in Applied Health Research and Care East of England at Cambridgeshire and Peterborough NHS Foundation Trust. We also received support from the NIHR Cambridge Biomedical Research Centre. We acknowledge with gratitude the generous support of Drs Dennis and Mireille Gillings in strengthening the collaboration between S.B.-C. and T.B., and between Cambridge University and the Institut Pasteur. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health. Data obtained from 23andMe was supported by the National Human Genome Research Institute of the National Institutes of Health (grant number R44HG006981). The UK Medical Research Council and Wellcome (grant ref: 102215/2/13/2) and the University of Bristol provide core support for ALSPAC. GWAS data were generated by Sample Logistics and Genotyping Facilities at Wellcome Sanger Institute and LabCorp (Laboratory Corporation of America) using support from 23andMe. This publication is the work of the authors who will serve as guarantors for the content of this paper. The iPSYCH (The Lundbeck Foundation Initiative for Integrative Psychiatric Research) team acknowledges funding from The Lundbeck Foundation (grant nos. R102-A9118 and R155-2014-1724), the Stanley Medical Research Institute, the European Research Council (project no.: 294838), the Novo Nordisk Foundation for supporting the Danish National Biobank resource, and grants from Aarhus and Copenhagen Universities and University Hospitals, including support to the iSEQ Center, the GenomeDK HPC facility, and the CIRRAU Center. The project leading to this application has received funding from the Innovative Medicines Initiative 2 Joint Undertaking (JU) under grant agreement no. 777394. The JU receives support from the European Union’s Horizon 2020 research and innovation programme and EFPIA and AUTISM SPEAKS, Autistica, SFARI. We thank the iPSYCH-Broad Autism Group and the EU-AIMS LEAP group for sharing data. A full list of the authors and affiliations in the iPSYCH-Broad autism group and the EU-AIMS LEAP group is provided in the Supplementary Information.
The 23andMe Research Team: Michelle Agee, Babak Alipanahi, Adam Auton, Robert K. Bell, Katarzyna Bryc, Sarah L. Elson, Pierre Fontanillas, Nicholas A. Furlotte, Karen E. Huber, Aaron Kleinman, Nadia K. Litterman, Jennifer C. McCreight, Matthew H. McIntyre, Joanna L. Mountain, Carrie A.M. Northover, Steven J. Pitts, J. Fah Sathirapongsasuti, Olga V. Sazonova, Janie F. Shelton, Suyash Shringarpure, Chao Tian, Joyce Y. Tung, Vladimir Vacic & Catherine H. Wilson
European Project: 294838,EC:FP7:ERC,ERC-2011-ADG_20110310,EIMS(2012), Institut Pasteur [Paris]-Centre National de la Recherche Scientifique (CNRS)-Université de Paris (UP), and Radboud university [Nijmegen]
- Subjects
Male ,DIAGNOSTIC OBSERVATION SCHEDULE ,LD SCORE REGRESSION ,45/43 ,Medicine (miscellaneous) ,Genome-wide association study ,Genome-wide association studies ,Developmental psychology ,FUNCTIONING AUTISM ,Cohort Studies ,0302 clinical medicine ,Heritability of autism ,MESH: Cohort Studies ,lcsh:QH301-705.5 ,GENERAL-POPULATION ,631/208/366/1373 ,0303 health sciences ,MESH: Genetic Predisposition to Disease ,article ,Autism spectrum disorders ,MESH: Reproducibility of Results ,Phenotype ,Autistic traits ,631/208/727/2000 ,Trait ,Female ,General Agricultural and Biological Sciences ,Psychology ,MESH: Social Behavior ,EMPATHY QUOTIENT ,MESH: Autistic Disorder ,631/208/205/2138 ,Genetic predisposition to disease ,MESH: Phenotype ,General Biochemistry, Genetics and Molecular Biology ,03 medical and health sciences ,PSYCHOMETRIC ANALYSIS ,All institutes and research themes of the Radboud University Medical Center ,[SDV.BBM.GTP]Life Sciences [q-bio]/Biochemistry, Molecular Biology/Genomics [q-bio.GN] ,mental disorders ,medicine ,Humans ,REPETITIVE BEHAVIOR ,Autistic Disorder ,GENOME-WIDE ASSOCIATION ,Social Behavior ,Association (psychology) ,030304 developmental biology ,SYSTEMATIZING QUOTIENT ,Neurodevelopmental disorders Donders Center for Medical Neuroscience [Radboudumc 7] ,MESH: Humans ,Reproducibility of Results ,SPECTRUM QUOTIENT AQ ,medicine.disease ,MESH: Male ,[SDV.GEN.GH]Life Sciences [q-bio]/Genetics/Human genetics ,lcsh:Biology (General) ,MESH: Genome-Wide Association Study ,Autism ,[INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM] ,MESH: Female ,030217 neurology & neurosurgery ,Genome-Wide Association Study
- Abstract
The core diagnostic criteria for autism comprise two symptom domains – social and communication difficulties, and unusually repetitive and restricted behaviour, interests and activities. There is some evidence to suggest that these two domains are dissociable, though this hypothesis has not yet been tested using molecular genetics. We test this using a genome-wide association study (N = 51,564) of a non-social trait related to autism, systemising, defined as the drive to analyse and build systems. We demonstrate that systemising is heritable and genetically correlated with autism. In contrast, we do not identify significant genetic correlations between social autistic traits and systemising. Supporting this, polygenic scores for systemising are significantly and positively associated with restricted and repetitive behaviour but not with social difficulties in autistic individuals. These findings strongly suggest that the two core domains of autism are genetically dissociable, and point at how to fractionate the genetics of autism.
Varun Warrier et al. report a genome-wide association study of systemising, a non-social trait associated with autism. They find 3 loci associated with systemising and show that this trait has no significant genetic correlations to social phenotypic measures, demonstrating that the social and non-social aspects of autism are genetically distinct.
- Published
- 2019
- Full Text
- View/download PDF