69 results for "Mark A. DePristo"
Search Results
2. CrowdVariant: a crowdsourcing approach to classify copy number variants.
- Author
Peyton Greenside, Justin M. Zook, Marc Salit, Madeleine L. Cule, Ryan Poplin, and Mark A. DePristo
- Published
- 2019
3. GenomeWarp: an alignment-based variant coordinate transformation.
- Author
Cory Y. McLean, Yeongwoo Hwang, Ryan Poplin, and Mark A. DePristo
- Published
- 2019
4. The variant call format and VCFtools.
- Author
Petr Danecek, Adam Auton, Gonçalo R. Abecasis, Cornelis A. Albers, Eric Banks, Mark A. DePristo, Robert E. Handsaker, Gerton Lunter, Gabor T. Marth, Stephen T. Sherry, Gilean McVean, and Richard Durbin
- Published
- 2011
5. Using deep learning to annotate the protein universe
- Author
Maxwell L. Bileschi, David Belanger, Drew H. Bryant, Theo Sanderson, Brandon Carter, D. Sculley, Alex Bateman, Mark A. DePristo, and Lucy J. Colwell
- Subjects
Proteomics; Deep Learning; Proteome; Humans; Molecular Sequence Annotation; Amino Acid Sequence; Databases, Protein
- Abstract
Understanding the relationship between amino acid sequence and protein function is a long-standing challenge with far-reaching scientific and translational implications. State-of-the-art alignment-based techniques cannot predict function for one-third of microbial protein sequences, hampering our ability to exploit data from diverse organisms. Here, we train deep learning models to accurately predict functional annotations for unaligned amino acid sequences across rigorous benchmark assessments built from the 17,929 families of the protein families database Pfam. The models infer known patterns of evolutionary substitutions and learn representations that accurately cluster sequences from unseen families. Combining deep models with existing methods significantly improves remote homology detection, suggesting that the deep models learn complementary information. This approach extends the coverage of Pfam by 9.5%, exceeding additions made over the last decade, and predicts function for 360 human reference proteome proteins with no previous Pfam annotation. These results suggest that deep learning models will be a core component of future protein annotation tools.
- Published
- 2021
6. Efficiency and Power as a Function of Sequence Coverage, SNP Array Density, and Imputation.
- Author
Jason Flannick, Joshua M. Korn, Pierre Fontanillas, George B. Grant, Eric Banks, Mark A. DePristo, and David Altshuler
- Published
- 2012
7. ContEst: estimating cross-contamination of human samples in next-generation sequencing data.
- Author
Kristian Cibulskis, Aaron McKenna, Tim Fennell, Eric Banks, Mark A. DePristo, and Gad Getz
- Published
- 2011
8. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome
- Author
Armin Töpfer, Justin M. Zook, Heng Li, Gregory T. Concepcion, Medhat Mahmoud, Paul Peluso, Andrew Carroll, Aaron M. Wenger, Nathan D. Olson, Alexander Kolesnikov, Michael Alonge, Arkarachai Fungtammasan, Adam M. Phillippy, Michael C. Schatz, David R. Rank, Jue Ruan, Sergey Koren, Fritz J. Sedlazeck, Pi-Chuan Chang, Yufeng Qian, Gene Myers, William J Rowell, Mark A. DePristo, Richard Hall, Tobias Marschall, Chen-Shan Chin, Michael W. Hunkapiller, and Jana Ebler
- Subjects
Computer science; Biomedical Engineering; Sequence assembly; Bioengineering; Genomics; Third generation sequencing; Computational biology; Applied Microbiology and Biotechnology; Genome; DNA sequencing; Article; 03 medical and health sciences; 0302 clinical medicine; Humans; 030304 developmental biology; 0303 health sciences; Base Sequence; Genome, Human; Genetic Variation; High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA; Haplotypes; Molecular Medicine; Human genome; DNA, Circular; 030217 neurology & neurosurgery; Biotechnology
- Abstract
The DNA sequencing technologies in use today produce either highly accurate short reads or less-accurate long reads. We report the optimization of circular consensus sequencing (CCS) to improve the accuracy of single-molecule real-time (SMRT) sequencing (PacBio) and generate highly accurate (99.8%) long high-fidelity (HiFi) reads with an average length of 13.5 kilobases (kb). We applied our approach to sequence the well-characterized human HG002/NA24385 genome and obtained precision and recall rates of at least 99.91% for single-nucleotide variants (SNVs), 95.98% for insertions and deletions […] 15 megabases (Mb) and concordance of 99.997%, substantially outperforming assembly with less-accurate long reads.
- Published
- 2019
9. Using Deep Learning to Annotate the Protein Universe
- Author
D. Sculley, Brandon Carter, Maxwell L. Bileschi, Mark A. DePristo, Drew Bryant, Theo Sanderson, Lucy J. Colwell, and David Belanger
- Subjects
Sequence; Protein function; Computer science; Deep learning; Biomedical Engineering; Bioengineering; Computational biology; Applied Microbiology and Biotechnology; Amino acid; Transmembrane domain; Benchmark (computing); Molecular Medicine; Protein function prediction; Artificial intelligence; Peptide sequence; Biotechnology
- Abstract
Understanding the relationship between amino acid sequence and protein function is a long-standing problem in molecular biology with far-reaching scientific implications. Despite six decades of progress, state-of-the-art techniques cannot annotate 1/3 of microbial protein sequences, hampering our ability to exploit sequences collected from diverse organisms. In this paper, we explore an alternative methodology based on deep learning that learns the relationship between unaligned amino acid sequences and their functional annotations across all 17929 families of the Pfam database. Using the Pfam seed sequences we establish rigorous benchmark assessments that use both random and clustered data splits to control for potentially confounding sequence similarities between train and test sequences. Using Pfam full, we report convolutional networks that are significantly more accurate and computationally efficient than BLASTp, while learning sequence features such as structural disorder and transmembrane helices. Our model co-locates sequences from unseen families in embedding space, allowing sequences from novel families to be accurately annotated. These results suggest deep learning models will be a core component of future protein function prediction tools.
- Published
- 2019
10. Challenges of Accuracy in Germline Clinical Sequencing Data
- Author
Ryan Poplin, Justin M. Zook, and Mark A. DePristo
- Subjects
Genome, Human; Sequencing data; Genetic Diseases, Inborn; MEDLINE; Genetic Variation; High-Throughput Nucleotide Sequencing; Genomics; General Medicine; Computational biology; Genome; Article; Germline; Germ Cells; Germline mutation; Humans; Medicine; Genetic Testing; Germ-Line Mutation
- Published
- 2021
11. The genetic architecture of type 2 diabetes
- Author
Clement Ma, Liisa Hakaste, Claes Ladenvall, Massimo Mangino, Lars Lind, Aaron G. Day-Williams, Ryan Poplin, Thomas Schwarzmayr, Annette Peters, Thomas Meitinger, Satish Kumar, Kyong Soo Park, Ching-Yu Cheng, Farook Thameem, Kyle J. Gaulton, Olov Rolandsson, Yossi Farjoun, Neil Robertson, Mauricio O. Carneiro, Mark A. DePristo, Pierre Fontanillas, Ruth J. F. Loos, Heather M. Highland, Vilmantas Giedraitis, Gemma Buck, Fredrik Karpe, Mohammad Kamran Ikram, Alex S. F. Doney, Hae Kyung Im, Eric S. Lander, Johanna Kuusisto, Dorothée Thuillier, Denis Rybin, Jacquelyn Murphy, Benjamin M. Neale, Martina Müller-Nurasyid, Eric R. Gamazon, Johanne Marie Justesen, Tin Aung, Vincent K. L. Lam, John Danesh, Donna M. Lehman, Goo Jun, Marit E. Jørgensen, Christian Herder, Edmund Chan, Pamela J. Hicks, Todd Green, Claudia Langenberg, Yoon Shin Cho, James G. Wilson, Daniel E. Hale, Rector Arya, Giriraj R. Chandak, Juan Fernandez Tajes, Jaakko Tuomilehto, Christopher P. Jenkinson, Anders Rosengren, Torsten Lauritzen, Michael Griswold, Karen L. Mohlke, Amy J. Swift, Alena Stančáková, Christian Fuchsberger, Iksoo Huh, Nikhil Tandon, João Fadista, Christa Meisinger, Gonçalo R. Abecasis, Pål R. Njølstad, Dwaipayan Bharadwaj, Richard M. Watanabe, Shaun Purcell, Veikko Salomaa, Tim D. Spector, Paul W. Franks, Adam E. Locke, Sharon P. Fowler, Panos Deloukas, Reedik Mägi, Solomon K. Musani, Kee Seng Chia, Barbara Thorand, Soo Heon Kwak, Annemari Käräjämäki, Evelin Mihailov, John Blangero, Thomas Wieland, Christian Gieger, E. Shyong Tai, Jianjun Liu, James B. Meigs, Teemu Kuulasmaa, Martin Hrabé de Angelis, Francis S. Collins, Tim M. Strom, Tibor V. Varga, Juyoung Lee, Mark I. McCarthy, Domenico Palli, Jong-Young Lee, Benjamin Glaser, Tasha E. Fingerlin, Stephen O'Rahilly, Andrew T. Hattersley, Gil Atzmon, Manuel A. Rivas, Noël P. Burtt, Ivan Brandslund, Young-Jin Kim, Dorairaj Prabhakaran, Xu Wang, Taylor J. Maxwell, Valeriya Lyssenko, Frank B. Hu, Adolfo Correa, Khalid Shakir, Kathleen A. Jablonski, Yingchang Lu, Wei-Yen Lim, Min Jin Go, William R. Scott, Hanna E. Abboud, Leena Kinnunen, Inês Barroso, Gabriela L. Surdulescu, Peng Chen, Robert Sladek, Marju Orho-Melander, Xueling Sim, Robert A. Scott, Joanna M. M. Howson, Markku Laakso, David Altshuler, Vidya S. Farook, Harald Grallert, Janina S. Ried, Vineeta Agarwala, Philippe Froguel, Nicholas J. Wareham, Tiinamaija Tuomi, Joon Yoon, Heikki A. Koistinen, Toni I. Pollin, Heiner Boeing, Marie Loh, Nicola L. Beer, Joanne E. Curran, Lars Lannfelt, Dorota Pasko, Taesung Park, David Buck, Nir Barzilai, Stephen C. J. Parker, Michael Roden, Bok-Ghee Han, David Aguilar, Timothy Fennell, Sian-Tsung Tan, Gregory P. Wilson, Momoko Horikoshi, Cramer Christensen, Craig L. Hanis, Gilean McVean, Benjamin F. Voight, Wolfgang Rathmann, Jared Maguire, Ashok Kumar, Donald W. Bowden, Michael L. Stitzel, Yongkang Kim, Christine Blancher, Jason Carey, Min-Seok Kwon, Nicholette D. Palmer, Oluf Pedersen, Niels Grarup, Peter S. Chines, Cornelia Huth, James Scott, Torben Hansen, Konstantin Strauch, Erwin P. Bottinger, Andrew D. Morris, Anette P. Gjesing, Erik Ingelsson, Colin N. A. Palmer, Claudia H. T. Tam, Stacey Gabriel, Pablo Cingolani, Beverley Balkau, Michael Boehnke, Peter M. Nilsson, Wing-Yee So, Andrew R. Wood, Lili Milani, Ravindranath Duggirala, Qibin Qi, Mark Walker, Weiping Jia, Mark Seielstad, Narisu Narisu, Tien Yin Wong, Martijn van de Bunt, Anubha Mahajan, Davis J. McCarthy, Ann-Christine Syvänen, Jennifer Kriebel, N. William Rayner, Han Chen, Jinyan Huang, Chiea Chuen Khor, Richard N. Bergman, Shah Ebrahim, Dylan Hodgkiss, Weihua Zhang, Manjinder S. Sandhu, Richard D. Pearson, Juliana C.N. Chan, Rainer Rauramaa, Ralph A. DeFronzo, Andrew Farmer, Yoshihiko Nagai, Allan Linneberg, Herman A. Taylor, Jose C. Florez, Jaspal S. Kooner, Lori L. Bonnycastle, Jeroen R. Huyghe, Leif Groop, Joseph Trakalo, Sobha Puppala, Bong-Jo Kim, Kathleen Stirrups, Kerrin S. Small, Cecilia M. Lindgren, Jennifer E. Below, Heung Man Lee, Inga Prokopenko, Liming Liang, Timothy M. Frayling, Uzma Afzal, Mark J. Daly, Yik Ying Teo, Andrew P. Morris, Phoenix Kwan, Yvonne T. van der Schouw, Cheng Hu, Katharine R. Owen, Alisa K. Manning, Christopher J. Groves, Josée Dupuis, Ryan P. Welch, Loukas Moutsianas, Joshua D. Smith, Tanya M. Teslovich, Robert C. Onofrio, Maggie C.Y. Ng, Peter Donnelly, Tõnu Esko, Matt Neville, Bo Isomaa, Anne U. Jackson, Jette Bork-Jensen, Barry I. Freedman, Yi Chen, Danish Saleheen, Lu Qi, Laura J. Scott, Jasmina Kravic, Paul Elliott, Mette Hollensted, Carmen Navarro, Selyeong Lee, Benjamin Lehne, Thomas Illig, Nancy J. Cox, Jonathan C. Levy, George B. Grant, John C. Chambers, Adam S. Butterworth, Ronald C.W. Ma, Teresa Ferreira, John R. B. Perry, Eleftheria Zeggini, Hyun Min Kang, Loic Yengo, Eric Banks, Jae-Hoon Lee, Anna L. Gloyn, Christopher Hartl, Wei Zhao, Andres Metspalu, Keng-Han Lin, Graeme I. Bell, Jason Flannick, Torben Jørgensen, Thomas W. Blackwell, Heather M. Stringham, Olle Melander, Omri Gottesman, and Kathleen Johnson
- Subjects
0301 basic medicine; PROTEIN INTERACTION NETWORKS; SUSCEPTIBILITY LOCI; Genotyping Techniques; General Science & Technology; DNA Mutational Analysis; SEQUENCING ASSOCIATION; 030209 endocrinology & metabolism; Genome-wide association study; Biology; 03 medical and health sciences; 0302 clinical medicine; Research Support, N.I.H., Extramural; SDG 3 - Good Health and Well-being; MD Multidisciplinary; Genetic predisposition; Journal Article; Humans; Exome; Genetic Predisposition to Disease; GENOME-WIDE ASSOCIATION; Exome sequencing; Alleles; Genetic association; Genetics; RISK; Multidisciplinary; HERITABILITY; Research Support, Non-U.S. Gov't; COMPLEX TRAITS; Genetic Variation; Genetic architecture; HUMAN-DISEASE; Europe; 030104 developmental biology; Diabetes Mellitus, Type 2; RARE VARIANTS; Sample Size; LOW-FREQUENCY; Imputation (genetics); Common disease-common variant; Genome-Wide Association Study
- Abstract
The genetic architecture of common traits, including the number, frequency, and effect sizes of inherited variants that contribute to individual risk, has been long debated. Genome-wide association studies have identified scores of common variants associated with type 2 diabetes, but in aggregate, these explain only a fraction of the heritability of this disease. Here, to test the hypothesis that lower-frequency variants explain much of the remainder, the GoT2D and T2D-GENES consortia performed whole-genome sequencing in 2,657 European individuals with and without diabetes, and exome sequencing in 12,940 individuals from five ancestry groups. To increase statistical power, we expanded the sample size via genotyping and imputation in a further 111,548 subjects. Variants associated with type 2 diabetes after sequencing were overwhelmingly common and most fell within regions previously identified by genome-wide association studies. Comprehensive enumeration of sequence variation is necessary to identify functional alleles that provide important clues to disease pathophysiology, but large-scale sequencing does not support the idea that lower-frequency variants have a major role in predisposition to type 2 diabetes.
- Published
- 2016
12. Highly-accurate long-read sequencing improves variant detection and assembly of a human genome
- Author
Gene Myers, Mark A. DePristo, Michael Alonge, Richard Hall, Michael W. Hunkapiller, Fritz J. Sedlazeck, Paul Peluso, Jana Ebler, Aaron M. Wenger, Pi-Chuan Chang, Sergey Koren, Alexander Kolesnikov, Medhat Mahmoud, Justin M. Zook, Yufeng Qian, Arkarachai Fungtammasan, Nathan D. Olson, Michael C. Schatz, David R. Rank, Jue Ruan, Tobias Marschall, Heng Li, William J Rowell, Chen-Shan Chin, Andrew Carroll, Adam M. Phillippy, Armin Töpfer, and Gregory T. Concepcion
- Subjects
0303 health sciences; Contig; Computer science; Haplotype; Sequence assembly; Computational biology; Genome; DNA sequencing; 03 medical and health sciences; 0302 clinical medicine; Human genome; Precision and recall; Indel; 030217 neurology & neurosurgery; 030304 developmental biology
- Abstract
The major DNA sequencing technologies in use today produce either highly-accurate short reads or noisy long reads. We developed a protocol based on single-molecule, circular consensus sequencing (CCS) to generate highly-accurate (99.8%) long reads averaging 13.5 kb and applied it to sequence the well-characterized human HG002/NA24385. We optimized existing tools to comprehensively detect variants, achieving precision and recall above 99.91% for SNVs, 95.98% for indels, and 95.99% for structural variants. We estimate that 2,434 discordances are correctable mistakes in the high-quality Genome in a Bottle benchmark. Nearly all (99.64%) variants are phased into haplotypes, which further improves variant detection. De novo assembly produces a highly contiguous and accurate genome with contig N50 above 15 Mb and concordance of 99.998%. CCS reads match short reads for small variant detection, while enabling structural variant detection and de novo assembly at similar contiguity and markedly higher concordance than noisy long reads.
- Published
- 2019
13. CrowdVariant: a crowdsourcing approach to classify copy number variants
- Author
Marc L. Salit, Justin M. Zook, Mark A. DePristo, Madeleine Cule, Ryan Poplin, and Peyton Greenside
- Subjects
0301 basic medicine; Data curation; Computer science; Genomics; Computational biology; Benchmarking; Biology; Crowdsourcing; Genome; DNA sequencing; 03 medical and health sciences; 030104 developmental biology; Reference sample; Genetic variation; Data mining; Copy-number variation; Benchmark data; Genotyping
- Abstract
Copy number variants (CNVs) are an important type of genetic variation that play a causal role in many diseases. The ability to identify high quality CNVs is of substantial clinical relevance. However, CNVs are notoriously difficult to identify accurately from array-based methods and next-generation sequencing (NGS) data, particularly for small (< 10kbp) CNVs. Manual curation by experts widely remains the gold standard but cannot scale with the pace of sequencing, particularly in fast-growing clinical applications. We present the first proof-of-principle study demonstrating high throughput manual curation of putative CNVs by non-experts. We developed a crowdsourcing framework, called CrowdVariant, that leverages Google's high-throughput crowdsourcing platform to create a high confidence set of deletions for NA24385 (NIST HG002/RM 8391), an Ashkenazim reference sample developed in partnership with the Genome In A Bottle (GIAB) Consortium. We show that non-experts tend to agree both with each other and with experts on putative CNVs. We show that crowdsourced non-expert classifications can be used to accurately assign copy number status to putative CNV calls and identify 1,781 high confidence deletions in a reference sample. Multiple lines of evidence suggest these calls are a substantial improvement over existing CNV callsets and can also be useful in benchmarking and improving CNV calling algorithms. Our crowdsourcing methodology takes the first step toward showing the clinical potential for manual curation of CNVs at scale and can further guide other crowdsourcing genomics applications.
- Published
- 2018
14. A guide to deep learning in healthcare
- Author
Katherine Chou, Jeffrey Dean, Volodymyr Kuleshov, Alexandre Robicquet, Sebastian Thrun, Mark A. DePristo, Greg S. Corrado, Bharath Ramsundar, Andre Esteva, and Claire Cui
- Subjects
0301 basic medicine; Diagnostic Imaging; Computer science; Deep learning; MEDLINE; General Medicine; Data science; General Biochemistry, Genetics and Molecular Biology; 03 medical and health sciences; 030104 developmental biology; 0302 clinical medicine; Electronic health record; 030220 oncology & carcinogenesis; Health care; Reinforcement learning; Electronic Health Records; Humans; Artificial intelligence; Delivery of Health Care; Natural Language Processing
- Abstract
Here we present deep-learning techniques for healthcare, centering our discussion on deep learning in computer vision, natural language processing, reinforcement learning, and generalized methods. We describe how these computational techniques can impact a few key areas of medicine and explore how to build end-to-end systems. Our discussion of computer vision focuses largely on medical imaging, and we describe the application of natural language processing to domains such as electronic health record data. Similarly, reinforcement learning is discussed in the context of robotic-assisted surgery, and generalized deep-learning methods for genomics are reviewed.
- Published
- 2018
15. A deep learning approach to pattern recognition for short DNA sequences
- Author
George E. Dahl, Cory Y. McLean, Lizzie Dorfman, David Alexander, Mark A. DePristo, Akosua Busia, Pi-Chuan Chang, Ryan Poplin, and Clara Fannjiang
- Subjects
Sequence; Protein family; Computer science; Deep learning; Pattern recognition; Sequence alignment; Genome; DNA sequencing; Metagenomics; Artificial intelligence; Ribosomal DNA
- Abstract
Motivation: Inferring properties of biological sequences, such as determining the species-of-origin of a DNA sequence or the function of an amino-acid sequence, is a core task in many bioinformatics applications. These tasks are often solved using string-matching to map query sequences to labeled database sequences or via Hidden Markov Model-like pattern matching. In the current work we describe and assess a deep learning approach which trains a deep neural network (DNN) to predict database-derived labels directly from query sequences.
Results: We demonstrate this DNN performs at state-of-the-art or above levels on a difficult, practically important problem: predicting species-of-origin from short reads of 16S ribosomal DNA. When trained on 16S sequences of over 13,000 distinct species, our DNN achieves read-level species classification accuracy within 2.0% of perfect memorization of training data, and produces more accurate genus-level assignments for reads from held-out species than k-mer, alignment, and taxonomic binning baselines. Moreover, our models exhibit greater robustness than these existing approaches to increasing noise in the query sequences. Finally, we show that these DNNs perform well on an experimental 16S mock community dataset. Overall, our results constitute a first step towards our long-term goal of developing a general-purpose deep learning approach to predicting meaningful labels from short biological sequences.
Availability: TensorFlow training code is available through GitHub (https://github.com/tensorflow/models/tree/master/research). Data in TensorFlow TFRecord format is available on Google Cloud Storage (gs://brain-genomics-public/research/seq2species/).
Contact: seq2species-interest@google.com
Supplementary information: Supplementary data are available in a separate document.
- Published
- 2018
16. Deep learning of genomic variation and regulatory network data
- Author
Mark A. DePristo, Pi-Chuan Chang, Christoph Lippert, and Amalio Telenti
- Subjects
0301 basic medicine; Population; Genomics; Computational biology; Biology; Genome; 03 medical and health sciences; Deep Learning; Genetics; Humans; Exome; Gene Regulatory Networks; Invited Reviews; Molecular Biology; Genetics (clinical); Genome, Human; High-Throughput Nucleotide Sequencing; General Medicine; Sequence Analysis, DNA; Human genetics; 030104 developmental biology; Human genome; Artificial intelligence; Software
- Abstract
The human genome is now investigated through high-throughput functional assays, and through the generation of population genomic data. These advances support the identification of functional genetic variants and the prediction of traits (e.g. deleterious variants and disease). This review summarizes lessons learned from the large-scale analyses of genome and exome data sets, modeling of population data and machine-learning strategies to solve complex genomic sequence regions. The review also portrays the rapid adoption of artificial intelligence/deep neural networks in genomics; in particular, deep learning approaches are well suited to model the complex dependencies in the regulatory landscape of the genome, and to provide predictors for genetic variant calling and interpretation.
- Published
- 2018
17. A universal SNP and small-indel variant caller using deep neural networks
- Author
Scott Schwartz, Ryan Poplin, David Alexander, Thomas Colthurst, Alexander Ku, Jojo Dijamco, Dan Newburger, Nam Nguyen, Pegah T Afshar, Pi-Chuan Chang, Mark A. DePristo, Sam S Gross, Cory Y. McLean, and Lizzie Dorfman
- Subjects
0301 basic medicine; Genotype; Sequence analysis; Computer science; DNA Mutational Analysis; Biomedical Engineering; Bioengineering; Genomics; Computational biology; Applied Microbiology and Biotechnology; Genome; Convolutional neural network; Polymorphism, Single Nucleotide; 03 medical and health sciences; INDEL Mutation; Animals; Humans; Indel; Exome sequencing; Mammals; Artificial neural network; Genome, Human; High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA; 030104 developmental biology; Molecular Medicine; Neural Networks, Computer; Software; Biotechnology
- Abstract
Despite rapid advances in sequencing technologies, accurately calling genetic variants present in an individual genome from billions of short, errorful sequence reads remains challenging. Here we show that a deep convolutional neural network can call genetic variation in aligned next-generation sequencing read data by learning statistical relationships between images of read pileups around putative variant and true genotype calls. The approach, called DeepVariant, outperforms existing state-of-the-art tools. The learned model generalizes across genome builds and mammalian species, allowing nonhuman sequencing projects to benefit from the wealth of human ground-truth data. We further show that DeepVariant can learn to call variants in a variety of sequencing technologies and experimental designs, including deep whole genomes from 10X Genomics and Ion Ampliseq exomes, highlighting the benefits of using more automated and generalizable techniques for variant calling.
- Published
- 2017
18. Scaling accurate genetic variant discovery to tens of thousands of samples
- Author
Christopher W. Whelan, David E. Kling, Stacey Gabriel, Ryan Poplin, Laura D. Gauthier, Ami Levy-Moonshine, Mauricio O. Carneiro, Mark A. DePristo, Monkol Lek, Timothy Fennell, S. Chandran, B. M. Neale, Khalid Shakir, Ruano-Rubio, Daniel G. MacArthur, J. Thibault, Mark J. Daly, Eric Banks, G. A. Van der Auwera, and David Roazen
- Subjects
Scalability; Genetic variation; Genetic variants; Population genetics; Genomics; Computational biology; Data mining; Biology; Indel; Scaling; Exome
- Abstract
Comprehensive disease gene discovery in both common and rare diseases will require the efficient and accurate detection of all classes of genetic variation across tens to hundreds of thousands of human samples. We describe here a novel assembly-based approach to variant calling, the GATK HaplotypeCaller (HC) and Reference Confidence Model (RCM), that determines genotype likelihoods independently per-sample but performs joint calling across all samples within a project simultaneously. We show by calling over 90,000 samples from the Exome Aggregation Consortium (ExAC) that, in contrast to other algorithms, the HC-RCM scales efficiently to very large sample sizes without loss in accuracy; and that the accuracy of indel variant calling is superior in comparison to other algorithms. More importantly, the HC-RCM produces a fully squared-off matrix of genotypes across all samples at every genomic position being investigated. The HC-RCM is a novel, scalable, assembly-based algorithm with abundant applications for population genetics and clinical studies.
- Published
- 2017
19. A framework for the interpretation of de novo mutation in human disease
- Author
Eric Boerwinkle, Elise B. Robinson, Shaun Purcell, Stephen Sanders, Joseph D. Buxbaum, Edwin H. Cook, Kaitlin E. Samocha, Dennis P. Wall, Stacey Gabriel, Bernie Devlin, Jack A. Kosmicki, Richard A. Gibbs, Christine Stevens, Mark A. DePristo, Daniel G. MacArthur, Mark J. Daly, Aniko Sabo, James S. Sutcliffe, Benjamin M. Neale, Andrew Kirby, Lauren M. McGrath, Aarno Palotie, Karola Rehnström, Swapan Mallick, Kathryn Roeder, and Gerard D. Schellenberg
- Subjects
Male; Genetics; Genetics, Medical; Genomics; Disease; Biology; Article; Human disease; Child Development Disorders, Pervasive; Genetic Code; mental disorders; Mutation; Humans; Medical genetics; Exome; Female; Genetic Predisposition to Disease; Gene
- Abstract
Spontaneously arising (de novo) mutations have an important role in medical genetics. For diseases with extensive locus heterogeneity, such as autism spectrum disorders (ASDs), the signal from de novo mutations is distributed across many genes, making it difficult to distinguish disease-relevant mutations from background variation. Here we provide a statistical framework for the analysis of excesses in de novo mutation per gene and gene set by calibrating a model of de novo mutation. We applied this framework to de novo mutations collected from 1,078 ASD family trios, and, whereas we affirmed a significant role for loss-of-function mutations, we found no excess of de novo loss-of-function mutations in cases with IQ above 100, suggesting that the role of de novo mutations in ASDs might reside in fundamental neurodevelopmental processes. We also used our model to identify ∼1,000 genes that are significantly lacking in functional coding variation in non-ASD samples and are enriched for de novo loss-of-function mutations identified in ASD cases.
- Published
- 2014
20. A framework for the detection of de novo mutations in family-based sequencing data
- Author
Mark A. DePristo, Kiran V. Garimella, Eric Banks, Mark J. Daly, Mircea Cretu-Stancu, Laurent C. Francioli, Paul I.W. de Bakker, Kaitlin E. Samocha, Benjamin M. Neale, Wigard P. Kloosterman, Menachem Fromer, and Genome of the Netherlands consortium
- Subjects
Adult; Male; 0301 basic medicine; Netherlands Twin Register (NTR); Mutation rate; Genotype; Posterior probability; Biology; Polymorphism, Single Nucleotide; Genome; Article; 03 medical and health sciences; Germline mutation; Genetics; Journal Article; Humans; Exome; Child; Indel; Germ-Line Mutation; Genetics (clinical); Chromosomes, Human, X; Models, Genetic; Sequence Analysis, DNA; Human genetics; Pedigree; 030104 developmental biology; Female; Software; Genome-Wide Association Study
- Abstract
Germline mutation detection from human DNA sequence data is challenging due to the rarity of such events relative to the intrinsic error rates of sequencing technologies and the uneven coverage across the genome. We developed PhaseByTransmission (PBT) to identify de novo single nucleotide variants and short insertions and deletions (indels) from sequence data collected in parent-offspring trios. We compute the joint probability of the data given the genotype likelihoods in the individual family members, the known familial relationships and a prior probability for the mutation rate. Candidate de novo mutations (DNMs) are reported along with their posterior probability, providing a systematic way to prioritize them for validation. Our tool is integrated in the Genome Analysis Toolkit and can be used together with the ReadBackedPhasing module to infer the parental origin of DNMs based on phase-informative reads. Using simulated data, we show that PBT outperforms existing tools, especially in low coverage data and on the X chromosome. We further show that PBT displays high validation rates on empirical parent-offspring sequencing data for whole-exome data from 104 trios and X-chromosome data from 249 parent-offspring families. Finally, we demonstrate an association between father’s age at conception and the number of DNMs in female offspring’s X chromosome, consistent with previous literature reports.
European Journal of Human Genetics advance online publication, 23 November 2016; doi:10.1038/ejhg.2016.147.
- Published
- 2017
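The trio posterior described in the PBT abstract can be illustrated with a toy model. This is a minimal sketch under stated assumptions and is not PBT's implementation. It assumes a single biallelic autosomal site, with genotypes coded as alternate-allele counts 0/1/2. Parent priors follow Hardy-Weinberg with a hypothetical alternate-allele frequency q, and a hypothetical per-transmitted-allele mutation probability is called mu. All function names are illustrative. The code sums the joint probability over every trio genotype configuration and reports the share carried by configurations that are impossible under Mendelian inheritance without a mutation.

```python
import itertools


def transmit_alt_prob(g, mu):
    """P(a parent with g alt alleles transmits an alt allele), where the
    transmitted allele flips (mutates) with probability mu."""
    p = g / 2.0
    return p * (1.0 - mu) + (1.0 - p) * mu


def child_prob(c, m, f, mu):
    """P(child genotype c | mother genotype m, father genotype f)."""
    pm = transmit_alt_prob(m, mu)
    pf = transmit_alt_prob(f, mu)
    return [
        (1 - pm) * (1 - pf),            # child hom-ref (0 alt alleles)
        pm * (1 - pf) + (1 - pm) * pf,  # child het (1 alt allele)
        pm * pf,                        # child hom-alt (2 alt alleles)
    ][c]


def hwe_prior(g, q):
    """Hardy-Weinberg genotype prior for a parent (assumption: q is known)."""
    return [(1 - q) ** 2, 2 * q * (1 - q), q ** 2][g]


def de_novo_posterior(lik_mother, lik_father, lik_child, mu=1e-8, q=1e-3):
    """Posterior probability of a de novo event, given per-genotype
    likelihoods P(reads | G) for each family member (lists of length 3)."""
    total = 0.0
    de_novo = 0.0
    for m, f, c in itertools.product(range(3), repeat=3):
        joint = (hwe_prior(m, q) * hwe_prior(f, q) * child_prob(c, m, f, mu)
                 * lik_mother[m] * lik_father[f] * lik_child[c])
        total += joint
        # A configuration with zero probability when mu = 0 needs a mutation.
        if child_prob(c, m, f, 0.0) == 0.0:
            de_novo += joint
    return de_novo / total
```

Suppose both parents are confidently homozygous reference (likelihoods [1.0, 1e-20, 1e-40]) and the child is confidently heterozygous ([1e-20, 1.0, 1e-20]). The posterior then approaches 1. If the child also looks homozygous reference, it is negligible. Enumerating all 27 configurations keeps the sketch transparent. The X-chromosome and indel handling mentioned in the abstract are outside its scope.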
21. A Low-Frequency Inactivating AKT2 Variant Enriched in the Finnish Population Is Associated With Fasting Insulin Levels and Type 2 Diabetes Risk
- Author
-
Sobha Puppala, Inga Prokopenko, Mark Walker, Mark O. Goodarzi, Manuel A. Rivas, Andrew D. Morris, Lars Lannfelt, Donald W. Bowden, Ann-Christine Syvänen, Ryan Poplin, Anubha Mahajan, Olov Rolandsson, Taesung Park, Yossi Farjoun, Mauricio O. Carneiro, Oluf Pedersen, James B. Meigs, Richard M. Watanabe, Joseph Trakalo, Jinyan Huang, Chiea Chuen Khor, Uzma Afzal, Seppo Koskinen, Kathleen Stirrups, Lori L. Bonnycastle, Tibor V. Varga, Ashok Kumar, Tim M. Strom, Andrew A. Brown, Dylan Hodgkiss, Benjamin Glaser, Martin Hrabé de Angelis, Robert A. Scott, Noël P. Burtt, Massimo Mangino, Marit E. Jørgensen, Richard N. Bergman, Andrew T. Hattersley, Anders Rosengren, Olle Melander, Timothy M. Frayling, Gil Atzmon, Ivan Brandslund, Weiping Jia, Edmund Chan, John R. B. Perry, Fredrik Karpe, Torsten Lauritzen, Tien Yin Wong, Shah Ebrahim, Jennifer Wessel, Alex S. F. Doney, Martijn van de Bunt, Hae Kyung Im, Samuli Ripatti, David Buck, Nicholette D. Palmer, Pamela J. Hicks, Inês Barroso, Juliana C.N. Chan, Johanna Kuusisto, George B. Grant, Benjamin M. Neale, Dorairaj Prabhakaran, Kee Seng Chia, Barbara Thorand, David Altshuler, Veikko Salomaa, Vidya S. Farook, Harald Grallert, Rainer Rauramaa, Anna L. Gloyn, Heikki Oksa, Christopher Hartl, Lars Lind, Sharon P. Fowler, Eleftheria Zeggini, Pierre Fontanillas, Katharine R. Owen, Markku Laakso, Nikhil Tandon, Lu Qi, Andres Metspalu, Michael Roden, João Fadista, Herman A. Taylor, Loukas Moutsianas, Bok-Ghee Han, Alisa K. Manning, David Aguilar, Dwaipayan Bharadwaj, Hyun Min Kang, Jaana Lindström, James Scott, Eric Banks, Craig L. Hanis, Stacey Gabriel, Jong-Young Lee, Alena Stančáková, Panos Deloukas, Torben Hansen, Mark Seielstad, Suzanne B.R. Jacobs, Kyle J. Gaulton, Michael Boehnke, John C. Chambers, Martina Müller-Nurasyid, Eric R. Gamazon, Ronald C.W. Ma, Graeme I. Bell, Jason Flannick, Torben Jørgensen, Peter M. Nilsson, Shaun Purcell, Lili Milani, Johanne Marie Justesen, Tin Aung, Timo A. Lakka, Antti Jula, Joshua D. Smith, Jason Carey, Jaspal S. Kooner, Jianjun Liu, Mark A. DePristo, Ravindranath Duggirala, Taru Tukiainen, Gabriela L. Surdulescu, Andrew R. Wood, Heather M. Highland, Marie Loh, Christa Meisinger, Peter S. Chines, Thomas Schwarzmayr, Tanya M. Teslovich, Annette Peters, Robert C. Onofrio, Ying Wu, Gonçalo R. Abecasis, Matti Uusitupa, Yuhui Chen, Thomas W. Blackwell, Jennifer Kriebel, Adam E. Locke, Christopher P. Jenkinson, Erik Ingelsson, Colin N. A. Palmer, Khalid Shakir, Heather M. Stringham, Jacquelyn Murphy, Teemu Kuulasmaa, Jasmina Kravic, Cramer Christensen, Audrey Y. Chu, Nancy J. Cox, Donna M. Lehman, Wei-Yen Lim, Terho Lehtimäki, Heikki A. Koistinen, Toni I. Pollin, Amy J. Swift, Niels Grarup, Anette P. Gjesing, Paul Elliott, Tune H. Pers, Pablo Cingolani, Narisu Narisu, Liming Liang, John Blangero, Jonathan C. Levy, Mark J. Daly, Yik Ying Teo, Richard D. Pearson, Manjinder S. Sandhu, Joanne E. Curran, Dorota Pasko, Peter Donnelly, Nir Barzilai, Andrew P. Morris, Jose C. Florez, Cheng Hu, Jeroen R. Huyghe, Gregory A Wilson, Leif Groop, Laura J. Scott, Mette Hollensted, Benjamin Lehne, Thomas Illig, Thomas Meitinger, Satish Kumar, Ana Viñuela, Barry I. Freedman, Christine Blancher, Nicholas J. Wareham, Tiinamaija Tuomi, Matt J. Neville, Goo Jun, Michael Griswold, Karen L. Mohlke, Konstantin Strauch, Marju Orho-Melander, Xueling Sim, Philippe Froguel, Sian-Tsung Tan, Jobanpreet Sehmi, Annemari Käräjämäki, Thomas Wieland, Christian Gieger, Tasha E. Fingerlin, Stephen O'Rahilly, Tõnu Esko, Bo Isomaa, Anne U. Jackson, Jette Bork-Jensen, Josée Dupuis, Maggie C.Y. Ng, Christopher J. Groves, Todd Green, Clement Ma, Aarno Palotie, Liisa Hakaste, Claes Ladenvall, Ching-Yu Cheng, Gemma Buck, Adolfo Correa, Jessica A. Gasser, Hanna E. Abboud, Farook Thameem, Claudia Langenberg, Yoon Shin Cho, Francis S. Collins, William R. Scott, Leena Kinnunen, Kerrin S. Small, Jacqueline I. Goldstein, Cecilia M. Lindgren, Jennifer E. Below, Jaakko Tuomilehto, Reedik Mägi, Solomon K. Musani, Neil R. Robertson, James G. Wilson, Gilean McVean, Daniel E. Hale, Robert Sladek, Rector Arya, Giriraj R. Chandak, Y. Antero Kesäniemi, Christian Fuchsberger, Evelin Mihailov, Tim D. Spector, Paul W. Franks, E. Shyong Tai, Qibin Qi, N. William Rayner, Johan G. Eriksson, Ralph A. DeFronzo, Andrew Farmer, Allan Linneberg, Olli T. Raitakari, Mark I. McCarthy, Frank B. Hu, Timothy Fennell, Jared Maguire, National Institute for Health Research, Imperial College Healthcare NHS Trust - BRC Funding, Medical Research Council (MRC), Public Health England, Perry, John [0000-0001-6483-3771], Wareham, Nicholas [0000-0003-1422-2993], Langenberg, Claudia [0000-0002-5017-7344], Johnson, Kathleen [0000-0002-6823-3252], O'Rahilly, Stephen [0000-0003-2199-4449], Barroso, Ines [0000-0001-5800-4520], Apollo - University of Cambridge Repository, Other departments, School of Medicine / Biomedicine, and School of Medicine / Clinical Nutrition, School of Medicine / Clinical Medicine
- Subjects
0301 basic medicine ,Gerontology ,Endocrinology, Diabetes and Metabolism ,Type 2 diabetes ,MELLITUS ,0302 clinical medicine ,Gene Frequency ,Odds Ratio ,Glucose homeostasis ,Insulin ,Exome ,Finland ,Genetics ,African Americans ,GDNF FAMILY ,Hispanic or Latino ,Fasting ,11 Medical And Health Sciences ,SECRETION ,Hispanic Americans ,Life Sciences & Biomedicine ,BETA-CELL MASS ,EXPRESSION ,Asian Continental Ancestry Group ,Genotype ,European Continental Ancestry Group ,030209 endocrinology & metabolism ,Single-nucleotide polymorphism ,White People ,03 medical and health sciences ,Endocrinology & Metabolism ,Insulin resistance ,Asian People ,FATTY RAT ,Diabetes mellitus ,Internal Medicine ,Journal Article ,Humans ,Genetic Predisposition to Disease ,Allele frequency ,Alleles ,Science & Technology ,ZDF RATS ,Black or African American ,030104 developmental biology ,Diabetes Mellitus, Type 2 ,Case-Control Studies ,GLUCOSE-HOMEOSTASIS ,Insulin Resistance ,NEURTURIN ,Proto-Oncogene Proteins c-akt ,NEUROTROPHIC FACTOR
- Abstract
To identify novel coding association signals and facilitate characterization of mechanisms influencing glycemic traits and type 2 diabetes risk, we analyzed 109,215 variants derived from exome array genotyping together with an additional 390,225 variants from exome sequencing in up to 39,339 normoglycemic individuals from five ancestry groups. We identified a novel association between fasting plasma insulin (FI) and the coding variant p.Pro50Thr in AKT2, a gene in which rare fully penetrant mutations cause monogenic glycemic disorders. The low-frequency allele is associated with a 12% increase in FI levels. This variant is present at 1.1% frequency in Finns but virtually absent in individuals from other ancestries. Carriers of the FI-increasing allele had increased 2-h insulin values, decreased insulin sensitivity, and increased risk of type 2 diabetes (odds ratio 1.05). In cellular studies, the AKT2-Thr50 protein exhibited a partial loss of function. We extend the allelic spectrum for coding variants in AKT2 associated with disorders of glucose homeostasis and demonstrate bidirectional effects of variants within the pleckstrin homology domain of AKT2.
- Published
- 2017
22. Rare Complete Knockouts in Humans: Population Distribution and Significant Role in Autism Spectrum Disorders
- Author
-
Richard A. Gibbs, Shaun Purcell, Stephen Sanders, Bernie Devlin, Li Liu, Andrew Kirby, Chiao-Feng Lin, Elaine T. Lim, Shannon Gross, Eric Boerwinkle, Huyen Dinh, Jeffrey G. Reid, Jason Flannick, Monkol Lek, Alicia Hawes, Matthew W. State, Li-San Wang, Lora Lewis, Joseph D. Buxbaum, Kathryn Roeder, Edwin H. Cook, Uma Nagaswamy, Mark A. DePristo, Gerard D. Schellenberg, Mark J. Daly, David Altshuler, Irene Newsham, Christine Stevens, Stephan Ripke, Menachem Fromer, Yuanqing Wu, Benjamin M. Neale, Soumya Raychaudhuri, Stacey Gabriel, Donna M. Muzny, Aniko Sabo, Daniel G. MacArthur, James S. Sutcliffe, Otto Valladares, and Douglas M. Ruderfer
- Subjects
Proband ,Genetics ,0303 health sciences ,Neuroscience(all) ,General Neuroscience ,Population ,Pseudoautosomal region ,Biology ,Compound heterozygosity ,Article ,03 medical and health sciences ,0302 clinical medicine ,Autism ,030217 neurology & neurosurgery ,Gene knockout ,X chromosome ,Exome sequencing ,030304 developmental biology
- Abstract
To characterize the role of rare complete human knockouts in autism spectrum disorders (ASDs), we identify genes with homozygous or compound heterozygous loss-of-function (LoF) variants (defined as nonsense and essential splice-site variants) from exome sequencing of 933 cases and 869 controls. We identify a 2-fold increase in complete knockouts of autosomal genes with low rates of LoF variation (≤5% frequency) in cases and estimate a 3% contribution to ASD risk by these events, confirming this observation in an independent set of 563 probands and 4,605 controls. Outside the pseudoautosomal regions on the X chromosome, we similarly observe a significant 1.5-fold increase in rare hemizygous knockouts in males, contributing to another 2% of ASDs in males. Taken together, these results provide compelling evidence that rare autosomal and X chromosome complete gene knockouts are important inherited risk factors for ASD.
- Published
- 2013
23. Creating a universal SNP and small indel variant caller with deep neural networks
- Author
-
Ryan Poplin, Cory Y. McLean, Dan Newburger, Thomas Colthurst, Lizzie Dorfman, Scott Schwartz, Nam Nguyen, Mark A. DePristo, Alexander Ku, Jojo Dijamco, Pi-Chuan Chang, David Alexander, Sam S Gross, and Pegah T Afshar
- Subjects
Deep learning ,Genomics ,Statistical model ,Biology ,Convolutional neural network ,Genome ,Data mining ,Artificial intelligence ,Exome sequencing
- Abstract
Next-generation sequencing (NGS) is a rapidly evolving set of technologies that can be used to determine the sequence of an individual’s genome [1] by calling genetic variants present in an individual using billions of short, errorful sequence reads [2]. Despite more than a decade of effort and thousands of dedicated researchers, the hand-crafted and parameterized statistical models used for variant calling still produce thousands of errors and missed variants in each genome [3,4]. Here we show that a deep convolutional neural network [5] can call genetic variation in aligned next-generation sequencing read data by learning statistical relationships (likelihoods) between images of read pileups around putative variant sites and ground-truth genotype calls. This approach, called DeepVariant, outperforms existing tools, even winning the “highest performance” award for SNPs in an FDA-administered variant calling challenge. The learned model generalizes across genome builds and even to other mammalian species, allowing non-human sequencing projects to benefit from the wealth of human ground-truth data. We further show that, unlike existing tools, which perform well on only a specific technology, DeepVariant can learn to call variants in a variety of sequencing technologies and experimental designs, from deep whole genomes from 10X Genomics to Ion Ampliseq exomes. DeepVariant represents a significant step from expert-driven statistical modeling towards more automatic deep learning approaches for developing software to interpret biological instrumentation data.
- Published
- 2016
24. Analysis of protein-coding genetic variation in 60,706 humans
- Author
-
Jack A. Kosmicki, Mark A. DePristo, Mark I. McCarthy, Patrick F. Sullivan, Laramie E. Duncan, Ryan Poplin, David Neil Cooper, Mitja I. Kurki, Aarno Palotie, Hong-Hee Won, Dermot P.B. McGovern, John Danesh, Jose C. Florez, Grace Tiao, Anne H. O’Donnell-Luria, Timothy Fennell, Gad Getz, Douglas M. Ruderfer, Joanne Berghout, Mark J. Daly, Monkol Lek, Daniel P. Howrigan, Stacey Gabriel, Daniel P. Birnbaum, Ami Levy Moonshine, Michael Boehnke, Ben Weisburd, Ruth McPherson, Christine Stevens, Dongmei Yu, Sekar Kathiresan, Andrew J. Hill, James G. Wilson, James S. Ware, Hugh Watkins, Benjamin M. Neale, Khalid Shakir, David Altshuler, María Teresa Tusié-Luna, Lorena Orozco, James Zou, Samuel A. Rose, Menachem Fromer, Jeremiah M. Scharf, Daniel G. MacArthur, Namrata Gupta, Pamela Sklar, Eric Vallabh Minikel, Steven A. McCarroll, Jaakko Tuomilehto, Jackie Goldstein, Ming T. Tsuang, Stacey Donnelly, Konrad J. Karczewski, Fengmei Zhao, Stephen J. Glatt, Ron Do, Nicole A. Deflaux, Adam Kiezun, Emma Pierce-Hoffman, Markku Laakso, Beryl B. Cummings, Pradeep Natarajan, Danish Saleheen, Karol Estrada, Peter D. Stenson, Manuel A. Rivas, Diego Ardissino, Kaitlin E. Samocha, Gina M. Peloso, Laura D. Gauthier, Eric Banks, Brett Thomas, Shaun Purcell, Taru Tukiainen, Valentin Ruano-Rubio, Christina M. Hultman, Jason Flannick, Roberto Elosua, Complex Trait Genetics, Amsterdam Neuroscience - Complex Trait Genetics, Institute for Molecular Medicine Finland, Aarno Palotie / Principal Investigator, Jaakko Tuomilehto Research Group, Department of Public Health, Clinicum, Genomics of Neurological and Neuropsychiatric Disorders, Danesh, John [0000-0003-1158-6791], Apollo - University of Cambridge Repository, Wellcome Trust, and The Academy of Medical Sciences
- Subjects
0301 basic medicine ,Proteome ,DNA Mutational Analysis ,Datasets as Topic ,Human genetic variation ,GUIDELINES ,0302 clinical medicine ,Exome Aggregation Consortium ,SEQUENCE VARIANTS ,Coding region ,2.1 Biological and endogenous factors ,Exome ,Aetiology ,MUTATION ,Genetics ,0303 health sciences ,Multidisciplinary ,HUMAN-DISEASE ,NETWORKS ,Multidisciplinary Sciences ,Phenotype ,Science & Technology - Other Topics ,Biotechnology ,General Science & Technology ,Genomics ,Computational biology ,Biology ,DNA sequencing ,03 medical and health sciences ,Rare Diseases ,Clinical Research ,Genetic variation ,Humans ,Genetic Testing ,Gene ,030304 developmental biology ,Science & Technology ,Human Genome ,HUMAN-POPULATION HISTORY ,Genetic Variation ,FRAMEWORK ,R1 ,EVOLUTION ,030104 developmental biology ,DISCOVERY ,Sample Size ,Generic health relevance ,3111 Biomedicine ,Human genetics -- Variation ,030217 neurology & neurosurgery
- Abstract
Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) sequence data for 60,706 individuals of diverse ethnicities generated as part of the Exome Aggregation Consortium (ExAC). The resulting catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We show that this catalogue can be used to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; we identify 3,230 genes with near-complete depletion of truncating variants, 72% of which have no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human “knockout” variants in protein-coding genes.
- Published
- 2016
25. Patterns and rates of exonic de novo mutations in autism spectrum disorders
- Author
-
Omar Jabado, Menachem Fromer, Chad M. Schafer, Braden E. Boone, Jack R. Wimbish, Benjamin M. Neale, Guiqing Cai, Kathryn Roeder, Gerard D. Schellenberg, Li-San Wang, Christine Stevens, Avi Ma'ayan, Bernie Devlin, Richard A. Gibbs, Zuleyma Peralta, Shawn Levy, Yan Kou, Yi Han, Eric Boerwinkle, Evan T. Geller, Kiran V. Garimella, Emily L. Crawford, Jeffrey G. Reid, Chiao-Feng Lin, Elizabeth J. Rossin, Timothy Fennell, Tuo Zhao, Jared Maguire, Mark A. DePristo, Eric Banks, Mark J. Daly, Irene Newsham, Edwin H. Cook, Paz Polak, Ryan Poplin, Kaitlin E. Samocha, Vladimir Makarov, Khalid Shakir, Han Liu, Yuanqing Wu, Benjamin F. Voight, James S. Sutcliffe, Donna M. Muzny, Shamil R. Sunyaev, Li Liu, Andrew Kirby, Seungtai Yoon, Joseph D. Buxbaum, Uma Nagaswamy, Ruth Dannenfelser, Stacey Gabriel, Aniko Sabo, Jayon Lihm, Lora Lewis, Elaine T. Lim, Jason Flannick, Nicholas G. Campbell, Otto Valladares, Catalina Betancur, Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Harvard Medical School [Boston] (HMS), Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Massachusetts Institute of Technology (MIT), Department of pharmacology and systems therapeutics [Mount Sinai], Icahn School of Medicine at Mount Sinai [New York] (MSSM), Seaver Autism Center for Research and Treatment, Department of Statistics, Carnegie Mellon University, Carnegie Mellon University [Pittsburgh] (CMU), Human Genome Sequencing Center, Baylor College of Medicine, Baylor College of Medicine (BCM), Baylor University-Baylor University, Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, University of Pennsylvania, Department of Psychiatry, Division of Genetics, Brigham and Women's Hospital [Boston], Department of Molecular Physiology & Biophysics and Psychiatry, Vanderbilt University [Nashville]-Centers for Human Genetics Research and Molecular Neuroscience, 
Biostatistics Department and Computer Science Department, Johns Hopkins University, Johns Hopkins University (JHU), Departments of Psychiatry, Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai [New York] (MSSM)-Seaver Autism Center-, The Mindich Child Health & Development Institute, Department of Pharmacology, University of Pennsylvania-Perelman School of Medicine, HudsonAlpha Institute for Biotechnology [Huntsville, AL], Physiopathologie des Maladies du Système Nerveux Central, Université Pierre et Marie Curie - Paris 6 (UPMC)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Centre National de la Recherche Scientifique (CNRS), Human Genetics Center, The University of Texas Health Science Center at Houston (UTHealth), Friedman Brain Institute, Institute for Juvenile Research-University of Illinois [Chicago] (UIC), University of Illinois System-University of Illinois System, Department of Psychiatry [Pittsburgh], University of Pittsburgh School of Medicine, Pennsylvania Commonwealth System of Higher Education (PCSHE)-Pennsylvania Commonwealth System of Higher Education (PCSHE), Betancur, Catalina, University of Pennsylvania [Philadelphia], and University of Pennsylvania [Philadelphia]-Perelman School of Medicine
- Subjects
MESH: Mutation ,Nonsense mutation ,MESH: Autistic Disorder ,Epigenetics of autism ,[SDV.GEN] Life Sciences [q-bio]/Genetics ,Biology ,MESH: Phenotype ,MESH: Poisson Distribution ,03 medical and health sciences ,0302 clinical medicine ,mental disorders ,Genetic model ,Missense mutation ,MESH: Models, Genetic ,Copy-number variation ,Exome ,MESH: Protein Interaction Maps ,Exome sequencing ,030304 developmental biology ,Genetics ,MESH: Exome ,0303 health sciences ,MESH: Humans ,Multidisciplinary ,Point mutation ,MESH: Genetic Predisposition to Disease ,MESH: Transcription Factors ,MESH: Case-Control Studies ,MESH: Family Health ,MESH: Multifactorial Inheritance ,MESH: Exons ,MESH: DNA-Binding Proteins ,030217 neurology & neurosurgery
- Abstract
Autism spectrum disorders (ASD) are believed to have genetic and environmental origins, yet in only a modest fraction of individuals can specific causes be identified. To identify further genetic risk factors, here we assess the role of de novo mutations in ASD by sequencing the exomes of ASD cases and their parents (n = 175 trios). Fewer than half of the cases (46.3%) carry a missense or nonsense de novo variant, and the overall rate of mutation is only modestly higher than the expected rate. In contrast, the proteins encoded by genes that harboured de novo missense or nonsense mutations showed a higher degree of connectivity among themselves and to previous ASD genes as indexed by protein-protein interaction screens. The small increase in the rate of de novo events, when taken together with the protein interaction results, is consistent with an important but limited role for de novo point mutations in ASD, similar to that documented for de novo copy number variants. Genetic models incorporating these data indicate that most of the observed de novo events are unconnected to ASD; those that do confer risk are distributed across many genes and are incompletely penetrant (that is, not necessarily sufficient for disease). Our results support polygenic models in which spontaneous coding mutations in any of a large number of genes increase risk by 5- to 20-fold. Despite the challenge posed by such models, results from de novo events and a large parallel case-control study provide strong evidence in favour of CHD8 and KATNAL2 as genuine autism risk factors.
- Published
- 2012
26. A systematic survey of loss-of-function variants in human protein-coding genes
- Author
-
Klaudia Walter, Yali Xue, Jeffrey C. Barrett, Jennifer Harrow, Catherine E. Snow, Mark Gerstein, Ni Huang, Steven A. McCarroll, Jonathan K. Pritchard, Jeffrey A. Rosenfeld, Zhengdong D. Zhang, Hancheng Zheng, Menachem Fromer, Lukas Habegger, Yingrui Li, Mark A. DePristo, If H. A. Barnes, Bryndis Yngvadottir, James Morris, Alexandra Bignell, David Neil Cooper, Gerton Lunter, Ekta Khurana, Stephen B. Montgomery, Richard A. Gibbs, Donald F. Conrad, Emmanouil T. Dermitzakis, Daniel G. MacArthur, Suzannah Bumpstead, Gary Saunders, Kai Ye, Clara Amid, Marie-Marthe Suner, M. Kay, Joseph K. Pickrell, Adam Frankish, Robert E. Handsaker, Suganthi Balasubramanian, Eric Banks, Toby Hunt, Irene Gallego Romero, Cornelis A. Albers, Chris Tyler-Smith, Qasim Ayub, Denise Carvalho-Silva, Matthew E. Hurles, Min Hu, Luke Jostins, Jun Wang, Mike Jin, and Xinmeng Jasmine Mu
- Subjects
Candidate gene ,Gene Expression ,Biology ,Genome ,Polymorphism, Single Nucleotide ,Article ,Genomic disorders and inherited multi-system disorders DCN MP - Plasticity and memory [IGMD 3] ,Gene Frequency ,Genetic variation ,Humans ,ddc:576.5 ,Disease ,Allele ,Selection, Genetic ,Gene ,Genetics ,Multidisciplinary ,Genome, Human ,Genetic Variation ,Proteins ,Phenotype ,Disease/genetics ,Proteins/genetics ,Human genome
- Abstract
Defective Gene Detective: Identifying genes that give rise to diseases is one of the major goals of sequencing human genomes. However, putative loss-of-function variants, which are often among the first targets identified by genome and exome sequencing, have often turned out to be sequencing errors rather than true genetic variants. To determine the true scope of loss-of-function variation within the human genome, MacArthur et al. (p. 823; see the Perspective by Quintana-Murci) extensively validated genomes from the 1000 Genomes Project, as well as an additional European individual, and found that the average person carries about 100 true loss-of-function alleles, of which approximately 20 are present in two copies. Because variants in many known disease-causing genes were found in “normal” individuals, clinical sequencing needs to reassess how likely causative alleles are identified.
- Published
- 2012
27. A framework for variation discovery and genotyping using next-generation DNA sequencing data
- Author
-
Guillermo del Angel, Manuel A. Rivas, Ryan Poplin, Kiran V. Garimella, Matt Hanna, Aaron McKenna, Andrey Sivachenko, David Altshuler, Mark A. DePristo, Christopher Hartl, Kristian Cibulskis, Timothy Fennell, Jared Maguire, Stacey Gabriel, Anthony A. Philippakis, Eric Banks, Andrew Kernytsky, and Mark J. Daly
- Subjects
Genotype ,Population ,Computational biology ,Biology ,Polymorphism, Single Nucleotide ,Genome ,Article ,DNA sequencing ,03 medical and health sciences ,0302 clinical medicine ,Genetics ,Humans ,1000 Genomes Project ,Genotyping ,030304 developmental biology ,0303 health sciences ,Variant Call Format ,Genome, Human ,Genetic Variation ,DNA sequencing theory ,Exons ,Sequence Analysis, DNA ,Genetics, Population ,Data Interpretation, Statistical ,030220 oncology & carcinogenesis ,Quality Score ,Databases, Nucleic Acid ,Sequence Alignment ,Software
- Abstract
Recent advances in sequencing technology make it possible to comprehensively catalog genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious, and many computational steps are required to translate this output into high-quality variant calls. We present a unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs. Our process includes (i) initial read mapping; (ii) local realignment around indels; (iii) base quality score recalibration; (iv) SNP discovery and genotyping to find all potential variants; and (v) machine learning to separate true segregating variation from machine artifacts common to next-generation sequencing technologies. We here discuss the application of these tools, instantiated in the Genome Analysis Toolkit, to deep whole-genome, whole-exome capture and multi-sample low-pass (∼4×) 1000 Genomes Project datasets.
- Published
- 2011
28. Exome sequencing reveals a novel mutation for autosomal recessive non-syndromic mental retardation in the TECR gene on chromosome 19p13
- Author
-
Kiran V. Garimella, Jessica X. Chong, Soma Das, Dan L. Nicolae, Darrel Waggoner, Stacey Gabriel, Mark A. DePristo, Rebecca Anderson, Lawrence H. Uricchio, Carole Ober, Peixian Chen, Carrie Sougnez, Khalid Shakir, Dietrich Matern, and Minal Çalışkan
- Subjects
Male ,Synaptic Membranes ,Locus (genetics) ,Biology ,Intellectual Disability ,Genetics ,Humans ,Molecular Biology ,Gene ,Genetics (clinical) ,Exome sequencing ,Membrane Glycoproteins ,Autosome ,Massive parallel sequencing ,Genetic Diseases, Inborn ,Chromosome ,Articles ,General Medicine ,Disease gene identification ,Pedigree ,Developmental disorder ,Mutation ,Female ,Oxidoreductases ,Chromosomes, Human, Pair 19
- Abstract
Exome sequencing is a powerful tool for discovering Mendelian disease genes. Previously, we reported a novel locus for autosomal recessive non-syndromic mental retardation (NSMR) in a consanguineous family [Nolan, D.K., Chen, P., Das, S., Ober, C. and Waggoner, D. (2008) Fine mapping of a locus for nonsyndromic mental retardation on chromosome 19p13. Am. J. Med. Genet. A, 146A, 1414–1422]. Using linkage and homozygosity mapping, we previously localized the gene to chromosome 19p13. The parents of this sibship were recently included in an exome sequencing project. Using a series of filters, we narrowed the putative causal mutation to a single variant site that segregated with NSMR: the mutation was homozygous in five affected siblings but in none of eight unaffected siblings. This mutation causes a substitution of a leucine for a highly conserved proline at amino acid 182 in TECR (trans-2,3-enoyl-CoA reductase), a synaptic glycoprotein. Our results reveal the value of massively parallel sequencing for identifying novel disease genes that could not be found using traditional approaches, and identify only the seventh causal mutation for autosomal recessive NSMR.
- Published
- 2011
29. Exome Sequencing, ANGPTL3 Mutations, and Familial Combined Hypolipidemia
- Author
-
Elena Gonzalez, Helen H. Hobbs, Sheila Fisher, Nicholas Rudzicz, Mark A. DePristo, James P. Pirruccello, Pin Yue, Timothy Fennell, Mark J. Daly, Lauren Ambrogio, Jonathan C. Cohen, Candace Guiducci, Kristian Cibulskis, Ron Do, Stacey Gabriel, Sekar Kathiresan, Andrew Kernytsky, Justin Abreu, Andrew Barry, Kiran Musunuru, James C. Engert, Gina M. Peloso, Kiran V. Garimella, Eric Banks, David Altshuler, Gustav Schonfeld, and Carrie Sougnez
- Subjects
Male ,Endothelial lipase ,Genetic Linkage ,DNA Mutational Analysis ,Biology ,Compound heterozygosity ,Article ,Hypobetalipoproteinemias ,ANGPTL3 ,Humans ,Exome ,Exome sequencing ,Angiopoietin-Like Protein 3 ,Genetics ,Lipoprotein lipase ,Cholesterol ,Cholesterol, HDL ,Cholesterol, LDL ,General Medicine ,Pedigree ,Angiopoietin-like Proteins ,Codon, Nonsense ,Female ,lipids (amino acids, peptides, and proteins) ,Angiopoietins ,Lipoprotein
- Abstract
We sequenced all protein-coding regions of the genome (the "exome") in two family members with combined hypolipidemia, marked by extremely low plasma levels of low-density lipoprotein (LDL) cholesterol, high-density lipoprotein (HDL) cholesterol, and triglycerides. These two participants were compound heterozygotes for two distinct nonsense mutations in ANGPTL3 (encoding the angiopoietin-like 3 protein). ANGPTL3 has been reported to inhibit lipoprotein lipase and endothelial lipase, thereby increasing plasma triglyceride and HDL cholesterol levels in rodents. Our finding of ANGPTL3 mutations highlights a role for the gene in LDL cholesterol metabolism in humans and shows the usefulness of exome sequencing for identification of novel genetic causes of inherited disorders. (Funded by the National Human Genome Research Institute and others.)
- Published
- 2010
30. Low-Complexity Regions in Plasmodium falciparum: Missing Links in the Evolution of an Extreme Genome
- Author
-
Mark A. DePristo, Martine Zilversmit, Philip Awadalla, Dyann F. Wirth, Daniel L. Hartl, and Sarah K. Volkman
- Subjects
Genetics ,Genome evolution ,Plasmodium falciparum ,Population genetics ,Biology ,Genome ,Evolution, Molecular ,Negative selection ,Evolutionary biology ,Animals ,Genome, Protozoan ,Molecular Biology ,Gene ,Research Articles ,Ecology, Evolution, Behavior and Systematics ,Malaria
- Abstract
Over the past decade, attempts to explain the unusual size and prevalence of low-complexity regions (LCRs) in the proteins of the human malaria parasite Plasmodium falciparum have used both neutral and adaptive models. This past research has offered conflicting explanations for LCR characteristics and their role in, and influence on, the evolution of genome structure. Here we show that P. falciparum LCRs (PfLCRs) are not a single phenomenon, but rather consist of at least three distinct types of sequence, and this heterogeneity is the source of the conflict in the literature. Using molecular and population genetics, we show that these families of PfLCRs are evolving by different mechanisms. One of these families, named here the HighGC family, is of particular interest because these LCRs act as recombination hotspots, both in genes under positive selection for high levels of diversity, which recombination can create (antigens), and in genes likely to be evolving neutrally or under negative selection (metabolic enzymes). We discuss how the discovery of these distinct species of PfLCRs helps to resolve previous contradictory studies on LCRs in malaria and contributes to our understanding of the evolution of the parasite's unusual genome.
- Published
- 2010
31. The subtle benefits of being promiscuous: Adaptive evolution potentiated by enzyme promiscuity
- Author
Mark A. DePristo
- Subjects
Genetics; Protein function; Natural selection; General Neuroscience; Computational biology; Biology; General Biochemistry, Genetics and Molecular Biology; Protein evolution; Genetic drift; Commentaries; Gene duplication; Enzyme promiscuity; Adaptation; Adaptive evolution
- Abstract
In this commentary we discuss recent progress in our understanding of adaptive protein evolution. We begin with a general introduction to proteins and their evolution, quickly focusing on the question of how natural selection produces proteins with novel functions. We then summarize the theory of latent protein adaptation advanced in the joint articles by Amitai et al. (2007), HFSP J. 1, 67–78, and Wroe et al. (2007), HFSP J. 1, 79–87, published in the first issue of the HFSP Journal. This theory provides a biophysical framework linking the effects of individual mutations on promiscuous protein function, neutral genetic drift, and gene duplication to the process of adaptive protein evolution.
- Published
- 2007
32. On the abundance, amino acid composition, and evolutionary dynamics of low-complexity regions in proteins
- Author
Martine Zilversmit, Daniel L. Hartl, and Mark A. DePristo
- Subjects
Plasmodium falciparum; Protozoan Proteins; Antigens, Protozoan; Biology; Evolution, Molecular; Antigenic Diversity; Genetics; Animals; Amino Acids; Evolutionary dynamics; Gene; Recombination, Genetic; DNA replication; General Medicine; DNA, Protozoan; Antigenic Variation; Amino acid; chemistry; Genome, Protozoan; Function (biology); Recombination; Microsatellite Repeats
- Abstract
Protein sequences frequently contain regions composed of a reduced number of amino acids. Despite their presence in about half of all proteins and their unusual prevalence in the malaria parasite Plasmodium falciparum, the function and evolution of such low-complexity regions (LCRs) remain unclear. Here we show that LCR abundance and amino acid composition depend largely, but not exclusively, on genomic A+T content and obey power-law growth dynamics. Further, our results indicate that LCRs are analogous to microsatellites in that DNA replication slippage and unequal crossover recombination are important molecular mechanisms for LCR expansion. We support this hypothesis by demonstrating that the size of LCR insertions/deletions among orthologous genes depends upon length. Moreover, we show that LCRs enable intra-exonic recombination in a key family of cell-surface antigens in P. falciparum and thus likely facilitate the generation of antigenic diversity. We conclude with a mechanistic model for LCR evolution that links the pattern of LCRs within P. falciparum to its high genomic A+T content and recombination rate.
- Published
- 2006
33. Relation between native ensembles and experimental structures of proteins
- Author
Kresten Lindorff-Larsen, Robert B. Best, Mark A. DePristo, and Michele Vendruscolo
- Subjects
Models, Molecular; Quantitative Biology::Biomolecules; Multidisciplinary; Chemistry; Protein dynamics; Proteins; Experimental data; Biological Sciences; Residual; Protein Data Bank; Methylation; Structural heterogeneity; Protein Structure, Tertiary; Crystallography; Native state; Animals; Threading (protein sequence); Databases, Protein; Biological system; Chickens; Nuclear Magnetic Resonance, Biomolecular; computer; Statistical potential
- Abstract
Different experimental structures of the same protein or of proteins with high sequence similarity contain many small variations. Here we construct ensembles of “high-sequence similarity Protein Data Bank” (HSP) structures and consider the extent to which such ensembles represent the structural heterogeneity of the native state in solution. We find that different NMR measurements probing the structure and dynamics of given proteins in solution, including order parameters, scalar couplings, and residual dipolar couplings, are remarkably well reproduced by their respective HSP ensembles; moreover, we show that the effects of uncertainties in structure determination are insufficient to explain the results. These results highlight the importance of accounting for native-state protein dynamics in making comparisons with ensemble-averaged experimental data and suggest that even a modest number of structures of a protein determined under different conditions, or with small variations in sequence, capture a representative subset of the true native-state ensemble.
- Published
- 2006
34. Simultaneous determination of protein structure and dynamics
- Author
Christopher M. Dobson, Michele Vendruscolo, Robert B. Best, Mark A. DePristo, and Kresten Lindorff-Larsen
- Subjects
Models, Molecular; Quantitative Biology::Biomolecules; Magnetic Resonance Spectroscopy; Multidisciplinary; Protein Conformation; Ubiquitin; Stereochemistry; Chemistry; Dynamics (mechanics); Reproducibility of Results; Cell Biology; Nuclear magnetic resonance spectroscopy; Solutions; Molecular dynamics; Protein structure; Signaling proteins; Humans; Computer Simulation; Biological system; Native structure
- Abstract
We present a protocol for the experimental determination of ensembles of protein conformations that represent simultaneously the native structure and its associated dynamics. The procedure combines the strengths of nuclear magnetic resonance spectroscopy, which provides experimental information at the atomic level about the structural and dynamical features of proteins, with the ability of molecular dynamics simulations to explore a wide range of protein conformations. We illustrate the method for human ubiquitin in solution and find that there is considerable conformational heterogeneity throughout the protein structure. The interior atoms of the protein are tightly packed in each individual conformation that contributes to the ensemble, but their overall behaviour can be described as having a significant degree of liquid-like character. The protocol is completely general and should lead to significant advances in our ability to understand and utilize the structures of native proteins.
- Published
- 2005
35. Heterogeneity and Inaccuracy in Protein Structures Solved by X-Ray Crystallography
- Author
Paul I.W. de Bakker, Mark A. DePristo, and Tom L. Blundell
- Subjects
Models, Molecular; Diffraction; Scattering; Chemistry; Isotropy; HIV; Crystal structure; Crystallography, X-Ray; Protein Structure, Tertiary; Crystallography; Protein structure; HIV Protease; Chemical physics; Structural Biology; X-ray crystallography; Degeneracy (biology); Anisotropy; Molecular Biology
- Abstract
Proteins are dynamic molecules, exhibiting structural heterogeneity in the form of anisotropic motion and discrete conformational substates, often of functional importance. In protein structure determination by X-ray crystallography, the observed diffraction pattern results from the scattering of X-rays by an ensemble of heterogeneous molecules, ordered and oriented by packing in a crystal lattice. The majority of proteins diffract to resolutions where heterogeneity is difficult to identify and model, and their structures are therefore approximated by a single, average conformation with isotropic variance. Here we show that disregarding structural heterogeneity introduces degeneracy into the structure determination process, as many single, isotropic models exist that explain the diffraction data equally well. The large differences among these models imply that the accuracy of crystallographic structures has been widely overestimated. Further, this degeneracy suggests that analyses that depend on small differences in the relative positions of atoms may be flawed.
- Published
- 2004
36. Advantages of fine-grained side chain conformer libraries
- Author
Tom L. Blundell, Paul I.W. de Bakker, Reshma P. Shetty, and Mark A. DePristo
- Subjects
Physics; Quantitative Biology::Biomolecules; Protein Conformation; Ab initio; Computational Biology; Bioengineering; Protein Data Bank; Biochemistry; Bond length; Root mean square; Crystallography; Protein structure; Chain (algebraic topology); Side chain; Amino Acids; Molecular Biology; computer; Conformational isomerism; Biotechnology
- Abstract
We compare the modelling accuracy of two common rotamer libraries, the Dunbrack-Cohen and the 'Penultimate' rotamer libraries, with that of a novel library of discrete side chain conformations extracted from the Protein Data Bank. These side chain conformer libraries are extracted automatically from high-quality protein structures using stringent filters and maintain crystallographic bond lengths and angles. This contrasts with traditional rotamer libraries, which are defined in terms of chi angles under the assumption of idealized covalent geometry. We demonstrate that side chain modelling onto native and near-native main chain conformations is significantly more successful with the conformer libraries than with the rotamer libraries when only excluded-volume interactions are considered. With the backbone held fixed in the native conformation, the rotamer libraries cannot model side chains without atomic clashes on over 20% of targets. An algorithm is described for simultaneously modelling both main chain and side chain atoms during discrete ab initio sampling. The resulting models have root mean square deviations from the experimentally determined protein loops equivalent to those of models from backbone-only ensembles, indicating that all-atom modelling does not detract from the accuracy of conformational sampling.
- Published
- 2003
37. Discrete restraint-based protein modeling and the Cα-trace problem
- Author
Paul I.W. de Bakker, Reshma P. Shetty, Tom L. Blundell, and Mark A. DePristo
- Subjects
Proteomics; Physics; Quantitative Biology::Biomolecules; Models, Statistical; Protein Conformation; Computational Biology; Centroid; Models, Theoretical; Dihedral angle; Protein structure prediction; Ligands; Biochemistry; Article; Crystallography; Protein structure; Chain (algebraic topology); Search algorithm; Side chain; Homology modeling; Statistical physics; Amino Acids; Molecular Biology; Algorithms; Software; Protein Binding
- Abstract
We present a novel de novo method to generate protein models from sparse, discretized restraints on the conformation of the main chain and side chain atoms. We focus on Cα-trace generation, the problem of constructing an accurate and complete model from approximate knowledge of the positions of the Cα atoms and, in some cases, the side chain centroids. Spatial restraints on the Cα atoms and side chain centroids are supplemented by constraints on main chain geometry, phi/psi angles, rotameric side chain conformations, and inter-atomic separations derived from analyses of known protein structures. A novel conformational search algorithm, combining features of tree-search and genetic algorithms, generates models consistent with these restraints by propensity-weighted dihedral angle sampling. Models with ideal geometry, good phi/psi angles, and no inter-atomic overlaps are produced with 0.8 Å main chain root-mean-square deviation (RMSD) from the crystal structure and, with side chain centroid restraints, 1.0 Å all-atom RMSD, over a diverse set of target proteins. The mean model derived from 50 independently generated models is closer to the crystal structure than any individual model, with 0.5 Å main chain RMSD under only Cα restraints and 0.7 Å all-atom RMSD under both Cα and centroid restraints. The method is insensitive to randomly distributed errors of up to 4 Å in the Cα restraints. The conformational search algorithm is efficient, with computational cost increasing linearly with protein size. Issues relating to decoy set generation, experimental structure determination, efficiency of conformational sampling, and homology modeling are discussed.
- Published
- 2003
38. Ab initio construction of polypeptide fragments: Efficient generation of accurate, representative ensembles
- Author
Simon C. Lovell, Paul I.W. de Bakker, Mark A. DePristo, and Tom L. Blundell
- Subjects
Models, Molecular; Physics; Quantitative Biology::Biomolecules; Sequence; Molecular Structure; Protein Conformation; Ab initio; Proteins; Reproducibility of Results; Context (language use); Dihedral angle; Biochemistry; Peptide Fragments; Loop (topology); Crystallography; Protein structure; Structural Biology; Excluded volume; Amino Acid Sequence; Loop modeling; Statistical physics; Molecular Biology; Algorithms
- Abstract
We describe a novel method to generate ensembles of conformations of the main-chain atoms [N, Cα, C, O, Cβ] for a sequence of amino acids within the context of a fixed protein framework. Each conformation satisfies fundamental stereochemical restraints such as idealized geometry, favorable phi/psi angles, and excluded volume. The ensembles include conformations both near and far from the native structure. Algorithms for effective conformational sampling and constant-time overlap detection permit the generation of thousands of distinct conformations in minutes. Unlike previous approaches, our method samples dihedral angles from fine-grained phi/psi state sets, which we demonstrate is superior to exhaustive enumeration from coarse phi/psi sets. Applied to a large set of loop structures, our method consistently samples near-native conformations, averaging 0.4, 1.1, and 2.2 Å main-chain root-mean-square deviations for four-, eight-, and twelve-residue loops, respectively. The ensembles make ideal decoy sets to assess the discriminatory power of a selection method. Using these decoy sets, we conclude that the quality of anchor geometry cannot reliably identify near-native conformations, though the selection results are comparable to those of previous loop prediction methods. In a subsequent study (de Bakker et al.: Proteins 2003;51:21-40), we demonstrate that the AMBER force field with the Generalized Born solvation model identifies near-native conformations significantly better than previous methods.
- Published
- 2003
39. A global reference for human genetic variation
- Author
Colonna V. (1000 Genomes Project Consortium) Adam Auton, Gonçalo R Abecasis, David M Altshuler, Richard M Durbin, David R Bentley, Aravinda Chakravarti, Andrew G Clark, Peter Donnelly, Evan E Eichler, Paul Flicek, Stacey B Gabriel, Richard A Gibbs, Eric D Green, Matthew E Hurles, Bartha M Knoppers, Jan O Korbel, Eric S Lander, Charles Lee, Hans Lehrach, Elaine R Mardis, Gabor T Marth, Gil A McVean, Deborah A Nickerson, Jeanette P Schmidt, Stephen T Sherry, Jun Wang, Richard K Wilson, Eric Boerwinkle, Harsha Doddapaneni, Yi Han, Viktoriya Korchina, Christie Kovar, Sandra Lee, Donna Muzny, Jeffrey G Reid, Yiming Zhu, Yuqi Chang, Qiang Feng, Xiaodong Fang, Xiaosen Guo, Min Jian, Hui Jiang, Xin Jin, Tianming Lan, Guoqing Li, Jingxiang Li, Yingrui Li, Shengmao Liu, Xiao Liu, Yao Lu, Xuedi Ma, Meifang Tang, Bo Wang, Guangbiao Wang, Honglong Wu, Renhua Wu, Xun Xu, Ye Yin, Dandan Zhang, Wenwei Zhang, Jiao Zhao, Meiru Zhao, Xiaole Zheng, Namrata Gupta, Neda Gharani, Lorraine H Toji, Norman P Gerry, Alissa M Resch, Jonathan Barker, Laura Clarke, Laurent Gil, Sarah E Hunt, Gavin Kelman, Eugene Kulesha, Rasko Leinonen, William M McLaren, Rajesh Radhakrishnan, Asier Roa, Dmitriy Smirnov, Richard E Smith, Ian Streeter, Anja Thormann, Iliana Toneva, Brendan Vaughan, Xiangqun Zheng-Bradley, Russell Grocock, Sean Humphray, Terena James, Zoya Kingsbury, Ralf Sudbrak, Marcus W Albrecht, Vyacheslav S Amstislavskiy, Tatiana A Borodina, Matthias Lienhard, Florian Mertes, Marc Sultan, Bernd Timmermann, Marie-Laure Yaspo, Lucinda Fulton, Robert Fulton, Victor Ananiev, Zinaida Belaia, Dimitriy Beloslyudtsev, Nathan Bouk, Chao Chen, Deanna Church, Robert Cohen, Charles Cook, John Garner, Timothy Hefferon, Mikhail Kimelman, Chunlei Liu, John Lopez, Peter Meric, Chris O'Sullivan, Yuri Ostapchuk, Lon Phan, Sergiy Ponomarov, Valerie Schneider, Eugene Shekhtman, Karl Sirotkin, Douglas Slotta, Hua Zhang, Senduran Balasubramaniam, John Burton, Petr Danecek, Thomas M Keane, Anja Kolb-Kokocinski, 
Shane McCarthy, James Stalker, Michael Quail, Christopher J Davies, Jeremy Gollub, Teresa Webster, Brant Wong, Yiping Zhan, Adam Auton, Christopher L Campbell, Yu Kong, Anthony Marcketta, Fuli Yu, Lilian Antunes, Matthew Bainbridge, Aniko Sabo, Zhuoyi Huang, Lachlan J M Coin, Lin Fang, Qibin Li, Zhenyu Li, Haoxiang Lin, Binghang Liu, Ruibang Luo, Haojing Shao, Yinlong Xie, Chen Ye, Chang Yu, Fan Zhang, Hancheng Zheng, Hongmei Zhu, Can Alkan, Elif Dal, Fatma Kahveci, Erik P Garrison, Deniz Kural, Wan-Ping Lee, Wen Fung Leong, Michael Stromberg, Alistair N Ward, Jiantao Wu, Mengyao Zhang, Mark J Daly, Mark A DePristo, Robert E Handsaker, Eric Banks, Gaurav Bhatia, Guillermo Del Angel, Giulio Genovese, Heng Li, Seva Kashin, Steven A McCarroll, James C Nemesh, Ryan E Poplin, Seungtai C Yoon, Jayon Lihm, Vladimir Makarov, Srikanth Gottipati, Alon Keinan, Juan L Rodriguez-Flores, Tobias Rausch, Markus H Fritz, Adrian M Stütz, Kathryn Beal, Avik Datta, Javier Herrero, Graham R S Ritchie, Daniel Zerbino, Pardis C Sabeti, Ilya Shlyakhter, Stephen F Schaffner, Joseph Vitti, David N Cooper, Edward V Ball, Peter D Stenson, Bret Barnes, Markus Bauer, R Keira Cheetham, Anthony Cox, Michael Eberle, Scott Kahn, Lisa Murray, John Peden, Richard Shaw, Eimear E Kenny, Mark A Batzer, Miriam K Konkel, Jerilyn A Walker, Daniel G MacArthur, Monkol Lek, Ralf Herwig, Li Ding, Daniel C Koboldt, David Larson, Kai Ye, Simon Gravel, Anand Swaroop, Emily Chew, Tuuli Lappalainen, Yaniv Erlich, Melissa Gymrek, Thomas Frederick Willems, Jared T Simpson, Mark D Shriver, Jeffrey A Rosenfeld, Carlos D Bustamante, Stephen B Montgomery, Francisco M De La Vega, Jake K Byrnes, Andrew W Carroll, Marianne K DeGorter, Phil Lacroute, Brian K Maples, Alicia R Martin, Andres Moreno-Estrada, Suyash S Shringarpure, Fouad Zakharia, Eran Halperin, Yael Baran, Eliza Cerveira, Jaeho Hwang, Ankit Malhotra, Dariusz Plewczynski, Kamen Radew, Mallory Romanovitch, Chengsheng Zhang, Fiona C L Hyland, David W Craig, Alexis 
Christoforides, Nils Homer, Tyler Izatt, Ahmet A Kurdoglu, Shripad A Sinari, Kevin Squire, Chunlin Xiao, Jonathan Sebat, Danny Antaki, Madhusudan Gujral, Amina Noor, Kenny Ye, Esteban G Burchard, Ryan D Hernandez, Christopher R Gignoux, David Haussler, Sol J Katzman, W James Kent, Bryan Howie, Andres Ruiz-Linares, Emmanouil T Dermitzakis, Scott E Devine, Hyun Min Kang, Jeffrey M Kidd, Tom Blackwell, Sean Caron, Wei Chen, Sarah Emery, Lars Fritsche, Christian Fuchsberger, Goo Jun, Bingshan Li, Robert Lyons, Chris Scheller, Carlo Sidore, Shiya Song, Elzbieta Sliwerska, Daniel Taliun, Adrian Tan, Ryan Welch, Mary Kate Wing, Xiaowei Zhan, Philip Awadalla, Alan Hodgkinson, Yun Li, Xinghua Shi, Andrew Quitadamo, Gerton Lunter, Jonathan L Marchini, Simon Myers, Claire Churchhouse, Olivier Delaneau, Anjali Gupta-Hinch, Warren Kretzschmar, Zamin Iqbal, Iain Mathieson, Androniki Menelaou, Andy Rimmer, Dionysia K Xifara, Taras K Oleksyk, Yunxin Fu, Xiaoming Liu, Momiao Xiong, Lynn Jorde, David Witherspoon, Jinchuan Xing, Brian L Browning, Sharon R Browning, Fereydoun Hormozdiari, Peter H Sudmant, Ekta Khurana, Chris Tyler-Smith, Cornelis A Albers, Qasim Ayub, Yuan Chen, Vincenza Colonna, Luke Jostins, Klaudia Walter, Yali Xue, Mark B Gerstein, Alexej Abyzov, Suganthi Balasubramanian, Jieming Chen, Declan Clarke, Yao Fu, Arif O Harmanci, Mike Jin, Donghoon Lee, Jeremy Liu, Xinmeng Jasmine Mu, Jing Zhang, Yan Zhang, Chris Hartl, Khalid Shakir, Jeremiah Degenhardt, Sascha Meiers, Benjamin Raeder, Francesco Paolo Casale, Oliver Stegle, Eric-Wubbo Lameijer, Ira Hall, Vineet Bafna, Jacob Michaelson, Eugene J Gardner, Ryan E Mills, Gargi Dayama, Ken Chen, Xian Fan, Zechen Chong, Tenghui Chen, Mark J Chaisson, John Huddleston, Maika Malig, Bradley J Nelson, Nicholas F Parrish, Ben Blackburne, Sarah J Lindsay, Zemin Ning, Yujun Zhang, Hugo Lam, Cristina Sisu, Danny Challis, Uday S Evani, James Lu, Uma Nagaswamy, Jin Yu, Wangshen Li, Lukas Habegger, Haiyuan Yu, Fiona Cunningham, Ian 
Dunham, Kasper Lage, Jakob Berg Jespersen, Heiko Horn, Donghoon Kim, Rob Desalle, Apurva Narechania, Melissa A Wilson Sayres, Fernando L Mendez, G David Poznik, Peter A Underhill, Lachlan Coin, David Mittelman, Ruby Banerjee, Maria Cerezo, Thomas W Fitzgerald, Sandra Louzada, Andrea Massaia, Graham R Ritchie, Fengtang Yang, Divya Kalra, Walker Hale, Xu Dan, Kathleen C Barnes, Christine Beiswanger, Hongyu Cai, Hongzhi Cao, Brenna Henn, Danielle Jones, Jane S Kaye, Alastair Kent, Angeliki Kerasidou, Rasika Mathias, Pilar N Ossorio, Michael Parker, Charles N Rotimi, Charmaine D Royal, Karla Sandoval, Yeyang Su, Zhongming Tian, Sarah Tishkoff, Marc Via, Yuhong Wang, Huanming Yang, Ling Yang, Jiayong Zhu, Walter Bodmer, Gabriel Bedoya, Zhiming Cai, Yang Gao, Jiayou Chu, Leena Peltonen, Andres Garcia-Montero, Alberto Orfao, Julie Dutil, Juan C Martinez-Cruzado, Rasika A Mathias, Anselm Hennis, Harold Watson, Colin McKenzie, Firdausi Qadri, Regina LaRocque, Xiaoyan Deng, Danny Asogun, Onikepe Folarin, Christian Happi, Omonwunmi Omoniwa, Matt Stremlau, Ridhi Tariyal, Muminatou Jallow, Fatoumatta Sisay Joof, Tumani Corrah, Kirk Rockett, Dominic Kwiatkowski, Jaspal Kooner, Trân Tịnh Hiên, Sarah J Dunstan, Nguyen Thuy Hang, Richard Fonnie, Robert Garry, Lansana Kanneh, Lina Moses, John Schieffelin, Donald S Grant, Carla Gallo, Giovanni Poletti, Danish Saleheen, Asif Rasheed, Lisa D Brooks, Adam L Felsenfeld, Jean E McEwen, Yekaterina Vaydylevich, Audrey Duncanson, Michael Dunn, Jeffery A Schloss, 1000 Genomes Project Consortium, Institute for Medical Engineering and Science, Broad Institute of MIT and Harvard, Lincoln Laboratory, Massachusetts Institute of Technology. Department of Biology, Gabriel, Stacey, Lander, Eric Steven, Daly, Mark J, Banks, Eric, Bhatia, Gaurav, Kashin, Seva, McCarroll, Steven A, Nemesh, James, Poplin, Ryan E., Sabeti, Pardis, Shlyakhter, Ilya, Schaffner, Stephen F, Vitti, Joseph, Gymrek, Melissa A, Hartler, Christina M., and Tariyal, Ridhi
- Subjects
demography; genetic association; genotype; Human genomics; Genome-wide association study; Review; SUSCEPTIBILITY; DISEASE; polymorphism; 0302 clinical medicine; quantitative trait locus; INDEL Mutation; genetics; MUTATION; Exome sequencing; 0303 health sciences; public health; Sequence analysis; High-Throughput Nucleotide Sequencing; standard; Genomics; Reference Standards; Physical Chromosome Mapping; 3. Good health; priority journal; Science & Technology - Other Topics; BAYES FACTORS; Molecular Developmental Biology; Genotype; Genetics, Medical; Quantitative Trait Loci; DNA sequence; rare disease; human genetics; information processing; Article; 03 medical and health sciences; SDG 3 - Good Health and Well-being; POPULATION HISTORY; human genome; Humans; retroposon; Genetic variability; human; GENOME-WIDE ASSOCIATION; 1000 Genomes Project; Demography; Science & Technology; ancestry; disease predisposition; Genetic Variation; MACULAR DEGENERATION; major clinical study; gene linkage disequilibrium; Genetics, Population; 030217 neurology & neurosurgery; haplotype; Internationality; VARIANT; Datasets as Topic; Human genetic variation; COMPLEMENT FACTOR-H; single nucleotide polymorphism; genetic variability; Exome; chromosome map; Genetics; Variant Call Format; Genome; Multidisciplinary; 1000 Genomes Project Consortium; international cooperation; Multidisciplinary Sciences; standards; Disease Susceptibility; medical genetics; General Science & Technology; Population; Computational biology; Biology; gene frequency; Polymorphism, Single Nucleotide; high throughput sequencing; Rare Diseases; promoter region; MD Multidisciplinary; Genetic variation; QH426; 030304 developmental biology; Neurodevelopmental disorders Donders Center for Medical Neuroscience [Radboudumc 7]; Genome, Human; population genetics; population structure; Sequence Analysis, DNA; gene structure; INDIVIDUALS; Haplotypes; Genome-Wide Association Study
- Abstract
The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
Funding: Wellcome Trust (London, England) (Core Award 090532/Z/09/Z), Wellcome Trust (London, England) (Senior Investigator Award 095552/Z/11/Z), Wellcome Trust (London, England) (WT095908), Wellcome Trust (London, England) (WT109497), Wellcome Trust (London, England) (WT098051), Wellcome Trust (London, England) (WT086084/Z/08/Z), Wellcome Trust (London, England) (WT100956/Z/13/Z), Wellcome Trust (London, England) (WT097307), Wellcome Trust (London, England) (WT0855322/Z/08/Z), Wellcome Trust (London, England) (WT090770/Z/09/Z), Wellcome Trust (London, England) (Major Overseas program in Vietnam grant 089276/Z.09/Z), Medical Research Council (Great Britain) (grant G0801823), Biotechnology and Biological Sciences Research Council (Great Britain) (grant BB/I02593X/1), Biotechnology and Biological Sciences Research Council (Great Britain) (grant BB/I021213/1), Zhongguo ke xue ji shu qing bao yan jiu suo. Office of 863 Programme of China (2012AA02A201), National Basic Research Program of China (2011CB809201), National Basic Research Program of China (2011CB809202), National Basic Research Program of China (2011CB809203), National Natural Science Foundation of China (31161130357), Shenzhen Municipal Government of China (grant ZYC201105170397A), Canadian Institutes of Health Research (grant 136855), Quebec Ministry of Economic Development, Innovation, and Exports (PSR-SIIRI-195), Germany. Bundesministerium für Bildung und Forschung (0315428A), Germany. Bundesministerium für Bildung und Forschung (01GS08201), Germany. Bundesministerium für Bildung und Forschung (BMBF-EPITREAT grant 0316190A), Deutsche Forschungsgemeinschaft (Emmy Noether Grant KO4037/1-1), Beatriu de Pinos Program (2006 BP-A 10144), Beatriu de Pinos Program (2009 BP-B 00274), Spanish National Institute for Health (grant PRB2 IPT13/0001-ISCIII-SGEFI/FEDER), Japan Society for the Promotion of Science (fellowship number PE13075), Marie Curie Actions Career Integration (grant 303772), Fonds National Suisse de la Recherche Scientifique (SNSF) (31003A_130342), National Center for Biotechnology Information (U.S.) (U54HG3067), National Center for Biotechnology Information (U.S.) (U54HG3273), National Center for Biotechnology Information (U.S.) (U01HG5211), National Center for Biotechnology Information (U.S.) (U54HG3079), National Center for Biotechnology Information (U.S.) (R01HG2898), National Center for Biotechnology Information (U.S.) (R01HG2385), National Center for Biotechnology Information (U.S.) (RC2HG5552), National Center for Biotechnology Information (U.S.) (U01HG6513), National Center for Biotechnology Information (U.S.) (U01HG5214), National Center for Biotechnology Information (U.S.) (U01HG5715), National Center for Biotechnology Information (U.S.) (U01HG5718), National Center for Biotechnology Information (U.S.) (U01HG5728), National Center for Biotechnology Information (U.S.) (U41HG7635), National Center for Biotechnology Information (U.S.) (U41HG7497), National Center for Biotechnology Information (U.S.) (R01HG4960), National Center for Biotechnology Information (U.S.) (R01HG5701), National Center for Biotechnology Information (U.S.) (R01HG5214), National Center for Biotechnology Information (U.S.) (R01HG6855), National Center for Biotechnology Information (U.S.) (R01HG7068), National Center for Biotechnology Information (U.S.) (R01HG7644), National Center for Biotechnology Information (U.S.) (DP2OD6514), National Center for Biotechnology Information (U.S.) (DP5OD9154), National Center for Biotechnology Information (U.S.) (R01CA166661), National Center for Biotechnology Information (U.S.) (R01CA172652), National Center for Biotechnology Information (U.S.) (P01GM99568), National Center for Biotechnology Information (U.S.) (R01GM59290), National Center for Biotechnology Information (U.S.) (R01GM104390), National Center for Biotechnology Information (U.S.) (T32GM7790), National Center for Biotechnology Information (U.S.) (R01HL87699), National Center for Biotechnology Information (U.S.) (R01HL104608), National Center for Biotechnology Information (U.S.) (T32HL94284), National Center for Biotechnology Information (U.S.) (HHSN268201100040C), National Center for Biotechnology Information (U.S.) (HHSN272201000025C), Lundbeck Foundation (grant R170-2014-1039), Simons Foundation (SFARI award SF51), National Science Foundation (U.S.) (Research Fellowship DGE-1147470)
- Published
- 2015
40. Mutational Reversions During Adaptive Protein Evolution
- Author
Daniel M. Weinreich, Mark A. DePristo, and Daniel L. Hartl
- Subjects
Genetics; Natural selection; Models, Genetic; Adaptation, Biological; Biology; beta-Lactamases; Protein evolution; Evolution, Molecular; Fixation (population genetics); Evolutionary biology; Mutation; Humans; Selection, Genetic; Molecular Biology; Ecology, Evolution, Behavior and Systematics
- Abstract
Adaptation is often regarded as the sequential fixation of individually, intrinsically beneficial mutations. Contrary to this expectation, we find a surprisingly large number of evolutionary trajectories on which natural selection first favors a mutation, then favors its removal, and later still favors its ultimate restoration during the course of antibiotic resistance evolution. The existence of reversion trajectories implies that natural selection may not follow the most parsimonious path separating two alleles, even during adaptation. Altogether, this discovery highlights the unusual and potentially circuitous routes natural selection can follow during adaptation.
- Published
- 2007
41. Whole-exome sequencing identifies rare and low-frequency coding variants associated with LDL cholesterol
- Author
Leslie A. Lange, Youna Hu, He Zhang, Chenyi Xue, Ellen M. Schmidt, Zheng-Zheng Tang, Chris Bizon, Ethan M. Lange, Joshua D. Smith, Emily H. Turner, Goo Jun, Hyun Min Kang, Gina Peloso, Paul Auer, Kuo-ping Li, Jason Flannick, Ji Zhang, Christian Fuchsberger, Kyle Gaulton, Cecilia Lindgren, Adam Locke, Alisa Manning, Xueling Sim, Manuel A. Rivas, Oddgeir L. Holmen, Omri Gottesman, Yingchang Lu, Douglas Ruderfer, Eli A. Stahl, Qing Duan, Yun Li, Peter Durda, Shuo Jiao, Aaron Isaacs, Albert Hofman, Joshua C. Bis, Adolfo Correa, Michael E. Griswold, Johanna Jakobsdottir, Albert V. Smith, Pamela J. Schreiner, Mary F. Feitosa, Qunyuan Zhang, Jennifer E. Huffman, Jacy Crosby, Christina L. Wassel, Ron Do, Nora Franceschini, Lisa W. Martin, Jennifer G. Robinson, Themistocles L. Assimes, David R. Crosslin, Elisabeth A. Rosenthal, Michael Tsai, Mark J. Rieder, Deborah N. Farlow, Aaron R. Folsom, Thomas Lumley, Ervin R. Fox, Christopher S. Carlson, Ulrike Peters, Rebecca D. Jackson, Cornelia M. van Duijn, André G. Uitterlinden, Daniel Levy, Jerome I. Rotter, Herman A. Taylor, Vilmundur Gudnason, David S. Siscovick, Myriam Fornage, Ingrid B. Borecki, Caroline Hayward, Igor Rudan, Y. Eugene Chen, Erwin P. Bottinger, Ruth J.F. Loos, Pål Sætrom, Kristian Hveem, Michael Boehnke, Leif Groop, Mark McCarthy, Thomas Meitinger, Christie M. Ballantyne, Stacey B. Gabriel, Christopher J. O’Donnell, Wendy S. Post, Kari E. North, Alexander P. Reiner, Eric Boerwinkle, Bruce M. Psaty, David Altshuler, Sekar Kathiresan, Dan-Yu Lin, Gail P. Jarvik, L. Adrienne Cupples, Charles Kooperberg, James G. Wilson, Deborah A. Nickerson, Goncalo R. Abecasis, Stephen S. Rich, Russell P. Tracy, Cristen J. Willer, David M. Altshuler, Gonçalo R. Abecasis, Hooman Allayee, Sharon Cresci, Mark J. Daly, Paul I.W. de Bakker, Mark A. DePristo, Peter Donnelly, Tim Fennell, Kiran Garimella, Stanley L. Hazen, Daniel M. Jordan, Adam Kiezun, Guillaume Lettre, Bingshan Li, Mingyao Li, Christopher H. Newton-Cheh, Sandosh Padmanabhan, Sara Pulit, Daniel J. Rader, David Reich, Muredach P. Reilly, Steve Schwartz, Laura Scott, John A. Spertus, Nathaniel O. Stitziel, Nina Stoletzki, Shamil R. Sunyaev, Benjamin F. Voight, Ermeg Akylbekova, Larry D. Atwood, Maja Barbalic, R. Graham Barr, Emelia J. Benjamin, Joshua Bis, Donald W. Bowden, Jennifer Brody, Matthew Budoff, Greg Burke, Sarah Buxbaum, Jeff Carr, Donna T. Chen, Ida Y. Chen, Wei-Min Chen, Pat Concannon, Ralph D’Agostino, Anita L. DeStefano, Albert Dreisbach, Josée Dupuis, J. Peter Durda, Jaclyn Ellis, Caroline S. Fox, Ervin Fox, Vincent Funari, Santhi K. Ganesh, Julius Gardin, David Goff, Ora Gordon, Wayne Grody, Myron Gross, Xiuqing Guo, Ira M. Hall, Nancy L. Heard-Costa, Susan R. Heckbert, Nicholas Heintz, David M. Herrington, DeMarc Hickson, Jie Huang, Shih-Jen Hwang, David R. Jacobs, Nancy S. Jenny, Andrew D. Johnson, Craig W. Johnson, Steven Kawut, Richard Kronmal, Raluca Kurz, Martin G. Larson, Mark Lawson, Cora E. Lewis, Dalin Li, Honghuang Lin, Chunyu Liu, Jiankang Liu, Kiang Liu, Xiaoming Liu, Yongmei Liu, William T. Longstreth, Cay Loria, Kathryn Lunetta, Aaron J. Mackey, Rachel Mackey, Ani Manichaikul, Taylor Maxwell, Barbara McKnight, James B. Meigs, Alanna C. Morrison, Solomon K. Musani, Josyf C. Mychaleckyj, Jennifer A. Nettleton, Kari North, Daniel O’Leary, Frank Ong, Walter Palmas, James S. Pankow, Nathan D. Pankratz, Shom Paul, Marco Perez, Sharina D. Person, Joseph Polak, Aaron R. Quinlan, Leslie J. Raffel, Vasan S. Ramachandran, Kenneth Rice, Jill P. Sanders, Pamela Schreiner, Sudha Seshadri, Steve Shea, Stephen Sidney, Kevin Silverstein, Nicholas L. Smith, Nona Sotoodehnia, Asoke Srinivasan, Kent Taylor, Fridtjof Thomas, Michael Y. Tsai, Kelly A. Volcik, Christina L. Wassel, Karol Watson, Gina Wei, Wendy White, Kerri L. Wiggins, Jemma B. Wilk, O. Dale Williams, Gregory Wilson, Phillip Wolf, Neil A. Zakai, John Hardy, James F. Meschia, Michael Nalls, Andrew Singleton, Brad Worrall, Michael J. Bamshad, Kathleen C. Barnes, Ibrahim Abdulhamid, Frank Accurso, Ran Anbar, Terri Beaty, Abigail Bigham, Phillip Black, Eugene Bleecker, Kati Buckingham, Anne Marie Cairns, Daniel Caplan, Barbara Chatfield, Aaron Chidekel, Michael Cho, David C. Christiani, James D. Crapo, Julia Crouch, Denise Daley, Anthony Dang, Hong Dang, Alicia De Paula, Joan DeCelie-Germana, Allen Dozor, Mitch Drumm, Maynard Dyson, Julia Emerson, Mary J. Emond, Thomas Ferkol, Robert Fink, Cassandra Foster, Deborah Froh, Li Gao, William Gershan, Ronald L. Gibson, Elizabeth Godwin, Magdalen Gondor, Hector Gutierrez, Nadia N. Hansel, Paul M. Hassoun, Peter Hiatt, John E. Hokanson, Michelle Howenstine, Laura K. Hummer, Jamshed Kanga, Yoonhee Kim, Michael R. Knowles, Michael Konstan, Thomas Lahiri, Nan Laird, Christoph Lange, Lin Lin, Xihong Lin, Tin L. Louie, David Lynch, Barry Make, Thomas R. Martin, Steve C. Mathai, Rasika A. Mathias, John McNamara, Sharon McNamara, Deborah Meyers, Susan Millard, Peter Mogayzel, Richard Moss, Tanda Murray, Dennis Nielson, Blakeslee Noyes, Wanda O’Neal, David Orenstein, Brian O’Sullivan, Rhonda Pace, Peter Pare, H. Worth Parker, Mary Ann Passero, Elizabeth Perkett, Adrienne Prestridge, Nicholas M. Rafaels, Bonnie Ramsey, Elizabeth Regan, Clement Ren, George Retsch-Bogart, Michael Rock, Antony Rosen, Margaret Rosenfeld, Ingo Ruczinski, Andrew Sanford, David Schaeffer, Cindy Sell, Daniel Sheehan, Edwin K. Silverman, Don Sin, Terry Spencer, Jackie Stonebraker, Holly K. Tabor, Laurie Varlotta, Candelaria I. Vergara, Robert Weiss, Fred Wigley, Robert A. Wise, Fred A. Wright, Mark M. Wurfel, Robert Zanni, Fei Zou, Phil Green, Jay Shendure, Joshua M. Akey, Carlos D. Bustamante, Evan E. Eichler, P. Keolu Fox, Wenqing Fu, Adam Gordon, Simon Gravel, Jill M. Johnsen, Mengyuan Kan, Eimear E. Kenny, Jeffrey M. Kidd, Fremiet Lara-Garduno, Suzanne M. Leal, Dajiang J. Liu, Sean McGee, Timothy D. O’Connor, Bryan Paeper, Peggy D. Robertson, Jeffrey C. Staples, Jacob A. Tennessen, Gao Wang, Qian Yi, Rebecca Jackson, Garnet Anderson, Hoda Anton-Culver, Paul L. Auer, Shirley Beresford, Henry Black, Robert Brunner, Robert Brzyski, Dale Burwen, Bette Caan, Cara L. Carty, Rowan Chlebowski, Steven Cummings, J. David Curb, Charles B. Eaton, Leslie Ford, Stephanie M. Fullerton, Margery Gass, Nancy Geller, Gerardo Heiss, Barbara V. Howard, Li Hsu, Carolyn M. Hutter, John Ioannidis, Karen C. Johnson, Lewis Kuller, Andrea LaCroix, Kamakshi Lakshminarayan, Dorothy Lane, Norman Lasser, Erin LeBlanc, Kuo-Ping Li, Marian Limacher, Benjamin A. Logsdon, Shari Ludlam, JoAnn E. Manson, Karen Margolis, Lisa Martin, Joan McGowan, Keri L. Monda, Jane Morley Kotchen, Lauren Nathan, Judith Ockene, Mary Jo O’Sullivan, Lawrence S. Phillips, Ross L. Prentice, John Robbins, Jacques E. Rossouw, Haleh Sangi-Haghpeykar, Gloria E. Sarto, Sally Shumaker, Michael S. Simon, Marcia L. Stefanick, Evan Stein, Hua Tang, Kira C. Taylor, Cynthia A. Thomson, Timothy A. Thornton, Linda Van Horn, Mara Vitolins, Jean Wactawski-Wende, Robert Wallace, Sylvia Wassertheil-Smoller, Donglin Zeng, Deborah Applebaum-Bowden, Michael Feolo, Weiniu Gan, Dina N. Paltoo, Phyliss Sholinsky, Anne Sturcke, Epidemiology, and Internal Medicine
- Subjects
Male ,Genome-wide association study ,030204 cardiovascular system & hematology ,Cohort Studies ,0302 clinical medicine ,Gene Frequency ,Receptors ,Genotype ,Dyslipidemias/blood ,Receptors, LDL/genetics ,Genetics(clinical) ,Exome ,Genetics (clinical) ,Exome sequencing ,Genetics ,0303 health sciences ,Serine Endopeptidases ,Single Nucleotide ,Middle Aged ,3. Good health ,Cholesterol ,Phenotype ,Genetic Code ,Cholesterol, LDL/genetics ,Female ,lipids (amino acids, peptides, and proteins) ,Proprotein Convertases ,Proprotein Convertase 9 ,Sequence Analysis ,Adult ,Apolipoproteins E/blood ,LDL/genetics ,Serine Endopeptidases/genetics ,Single-nucleotide polymorphism ,Biology ,Polymorphism, Single Nucleotide ,Article ,03 medical and health sciences ,Apolipoproteins E ,SDG 3 - Good Health and Well-being ,Humans ,Polymorphism ,Allele frequency ,030304 developmental biology ,Genetic association ,Aged ,Dyslipidemias ,PCSK9 ,DNA ,Cholesterol, LDL ,Lipase ,Sequence Analysis, DNA ,Receptors, LDL ,Lipase/genetics ,Proprotein Convertases/genetics ,Follow-Up Studies ,Genome-Wide Association Study - Abstract
Elevated low-density lipoprotein cholesterol (LDL-C) is a treatable, heritable risk factor for cardiovascular disease. Genome-wide association studies (GWASs) have identified 157 variants associated with lipid levels but are not well suited to assess the impact of rare and low-frequency variants. To determine whether rare or low-frequency coding variants are associated with LDL-C, we exome sequenced 2,005 individuals, including 554 individuals selected for extreme LDL-C (>98th or
- Published
- 2014
42. A polygenic burden of rare disruptive mutations in schizophrenia
- Author
-
Panos Roussos, Steven A. McCarroll, Pamela Sklar, Stephen J. Haggarty, Eric Banks, Stacey Gabriel, Colm O'Dushlaine, Edward M. Scolnick, Mark O. Collins, Jyoti S. Choudhary, Menachem Fromer, Anna K. Kähler, Christina M. Hultman, Sarah E. Bergen, Shaun Purcell, Patrik K. E. Magnusson, Kiran V. Garimella, Esperanza Fernández, Seth G. N. Grant, Kimberly Chambert, Nadia Solovieff, Khalid Shakir, Laramie E. Duncan, Noboru H. Komiyama, Eric S. Lander, Eli A. Stahl, Timothy Fennell, Giulio Genovese, Douglas M. Ruderfer, Patrick F. Sullivan, Mark A. DePristo, Jennifer L. Moran, Massachusetts Institute of Technology. Department of Biology, and Lander, Eric S.
- Subjects
Male ,Multifactorial Inheritance ,DNA Copy Number Variations ,Population ,Neurogenetics ,Genome-wide association study ,Nerve Tissue Proteins ,Biology ,Receptors, N-Methyl-D-Aspartate ,Article ,03 medical and health sciences ,Fragile X Mental Retardation Protein ,0302 clinical medicine ,Intellectual Disability ,Humans ,Allele ,Autistic Disorder ,education ,Exome ,Exome sequencing ,030304 developmental biology ,Genetic association ,Genetics ,0303 health sciences ,education.field_of_study ,Multidisciplinary ,Intracellular Signaling Peptides and Proteins ,Membrane Proteins ,FMR1 ,3. Good health ,Cytoskeletal Proteins ,Mutation ,Schizophrenia ,Female ,Calcium Channels ,Disks Large Homolog 4 Protein ,030217 neurology & neurosurgery ,Genome-Wide Association Study - Abstract
Schizophrenia is a common disease with a complex aetiology, probably involving multiple and heterogeneous genetic factors. Here, by analysing the exome sequences of 2,536 schizophrenia cases and 2,543 controls, we demonstrate a polygenic burden primarily arising from rare (less than 1 in 10,000), disruptive mutations distributed across many genes. Particularly enriched gene sets include the voltage-gated calcium ion channel and the signalling complex formed by the activity-regulated cytoskeleton-associated scaffold protein (ARC) of the postsynaptic density, sets previously implicated by genome-wide association and copy-number variation studies. Similar to reports in autism, targets of the fragile X mental retardation protein (FMRP, product of FMR1) are enriched for case mutations. No individual gene-based test achieves significance after correction for multiple testing and we do not detect any alleles of moderately low frequency (approximately 0.5 to 1 per cent) and moderately large effect. Taken together, these data suggest that population-based exome sequencing can discover risk alleles and complements established gene-mapping paradigms in neuropsychiatric disease. Funding: National Human Genome Research Institute (U.S.) (Grant U54HG003067)
- Published
- 2014
43. Loss-of-Function Mutations in APOC3, Triglycerides, and Coronary Disease
- Author
-
Stavroula Kanoni, Olle Melander, He Zhang, Omri Gottesman, Yingchang Lu, Nathan O. Stitziel, Caroline S. Fox, Stephen S. Rich, Ruth J. F. Loos, Jochen Kruppa, Bruce M. Psaty, Kathleen Stirrups, Jose M. Ordovas, Nora Franceschini, Majid Nikpay, Christie M. Ballantyne, Gonçalo R. Abecasis, Natalie R. van Zuydam, Dermot F. Reilly, Werner Koch, Kari E. North, George Hindy, Jacy R Crosby, Deborah N. Farlow, Nicholas G. D. Masca, Chenyi Xue, Leslie A. Lange, David R. Crosslin, Lisa W. Martin, Paul L. Auer, Oliviero Olivieri, Domenico Girelli, Russell P. Tracy, David Altshuler, Nicola Martinelli, Cristen J. Willer, Ron Do, Heribert Schunkert, Oddgeir L. Holmen, Gina M. Peloso, Sekar Kathiresan, James G. Wilson, Hyun Min Kang, Pier Angelica Merlini, Stefano Duga, Charles Kooperberg, Zheng-Zheng Tang, Gail P. Jarvik, Anuj Goel, Kristian Hveem, Alistair S. Hall, Thomas Meitinger, Marju Orho-Melander, Diego Ardissino, Themistocles L. Assimes, Rosanna Asselta, Elizabeth K. Speliotes, Franziska Degenhardt, Colin N. A. Palmer, Goo Jun, Youna Hu, Martin Farrall, Ruth McPherson, Stefan A. Escher, Mark A. DePristo, Namrata Gupta, Nicholas J. Wareham, L. Adrienne Cupples, Annette Peters, Danyu Lin, Raimund Erbel, Wu Yin, Jan-Håkan Jansson, Wolfgang Lieb, Hugh Watkins, Stacey Gabriel, Nilesh J. Samani, Inke R. König, Jennifer G. Robinson, Erwin P. Bottinger, Deborah A. Nickerson, Jeanette Erdmann, Christopher J. O'Donnell, Panos Deloukas, Eric Boerwinkle, Alexander P. Reiner, and Erbel, Raimund (contributor)
- Subjects
medicine.medical_specialty ,Nonsense mutation ,Medicine ,Endocrinology and Diabetes ,medicine.disease_cause ,Article ,Internal medicine ,Genotype ,Medicine ,Missense mutation ,Cardiac and Cardiovascular Systems ,Exome ,triglycerides ,Exome sequencing ,Genetics ,Mutation ,Apolipoprotein C-III ,apolipoproteins ,apo C3 ,business.industry ,General Medicine ,Odds ratio ,Endocrinology ,Apolipoprotein C3 ,coronary artery disease ,business - Abstract
Background: Plasma triglyceride levels are heritable and are correlated with the risk of coronary heart disease. Sequencing of the protein-coding regions of the human genome (the exome) has the potential to identify rare mutations that have a large effect on phenotype. Methods: We sequenced the protein-coding regions of 18,666 genes in each of 3734 participants of European or African ancestry in the Exome Sequencing Project. We conducted tests to determine whether rare mutations in coding sequence, individually or in aggregate within a gene, were associated with plasma triglyceride levels. For mutations associated with triglyceride levels, we subsequently evaluated their association with the risk of coronary heart disease in 110,970 persons. Results: An aggregate of rare mutations in the gene encoding apolipoprotein C3 (APOC3) was associated with lower plasma triglyceride levels. Among the four mutations that drove this result, three were loss-of-function mutations: a nonsense mutation (R19X) and two splice-site mutations (IVS2+1G→A and IVS3+1G→T). The fourth was a missense mutation (A43T). Approximately 1 in 150 persons in the study was a heterozygous carrier of at least one of these four mutations. Triglyceride levels in the carriers were 39% lower than levels in noncarriers (P
- Published
- 2014
44. Simulation of Finnish population history, guided by empirical genetic data, to assess power of rare-variant tests in Finland
- Author
-
Mark A. DePristo, Gil McVean, Jacquelyn Murphy, Kyle J. Gaulton, Jasmina Kravic, Martin Hrabé de Angelis, Noël P. Burtt, Ryan Poplin, Jeff Chen, Peter S. Chines, Tiinamaija Toumi, David Buck, Khalid Shakir, Pierre Fontanillas, Richard N. Bergman, Neil Robertson, Leif Groop, Tim M. Strom, Martina Müller-Nurasyid, Loukas Moutsianas, Christian Fuchsberger, Charleston W. K. Chiang, Thomas Meitinger, Jennifer Kriebel, Richard D. Pearson, Christa Meisinger, Gonçalo R. Abecasis, Jaakko Tuomilehto, Ashok Kumar, Janina S. Ried, Anubha Mahajan, Andrew T. Hattersley, Adam E. Locke, Wolfgang Rathmann, Inga Prokopenko, Adrian Tan, Annette Peters, Lori L. Bonnycastle, Andrew D. Morris, Christopher Hartl, Peter Donnelly, Vineeta Agarwala, Will Rayner, Jason Flannick, Timothy M. Frayling, Cornelia Huth, Martijn van de Bunt, David Altshuler, Harald Grallert, Stacey Gabriel, Michael Boehnke, Laura J. Scott, Tim D. Spector, Thomas W. Blackwell, Heather M. Stringham, Phoenix Kwan, Alisa Manning, Tanya M. Teslovich, Sophie R. Wang, Jeroen R. Huyghe, Francis S. Collins, Kerrin S. Small, Cecilia M. Lindgren, Gerton Lunter, Mark I. McCarthy, Joel N. Hirschhorn, Timothy Fennell, Todd Green, Clement Ma, Manny Rivas, Claes Ladenvall, Bo Isomaa, Anne U. Jackson, Hyun Min Kang, Eric Banks, Michael L. Stitzel, Xueling Sim, John R. B. Perry, Bryan Howie, Konstantin Strauch, Goo Jun, Karen L. Mohlke, and Christian Gieger
- Subjects
Genetics ,Multifactorial Inheritance ,Models, Genetic ,Population ,Population genetics ,Biology ,Genetic architecture ,Founder Effect ,White People ,Article ,Diabetes Mellitus, Type 2 ,Evolutionary biology ,Genetic model ,Humans ,Genetics(clinical) ,Computer Simulation ,Exome ,Genotyping ,Genetics (clinical) ,Exome sequencing ,Finland ,Genetic association ,Founder effect - Abstract
Finnish samples have been extensively utilized in studying single-gene disorders, where the founder effect has clearly aided in discovery, and more recently in genome-wide association studies of complex traits, where the founder effect has had less obvious impacts. As the field starts to explore rare variants' contribution to polygenic traits, it is of great importance to characterize and confirm the Finnish founder effect in sequencing data and to assess its implications for rare-variant association studies. Here, we employ forward simulation, guided by empirical deep resequencing data, to model the genetic architecture of quantitative polygenic traits in both the general European and the Finnish populations simultaneously. We demonstrate that power of rare-variant association tests is higher in the Finnish population, especially when variants' phenotypic effects are tightly coupled with fitness effects and therefore reflect a greater contribution of rarer variants. SKAT-O, variable-threshold tests, and single-variant tests are more powerful than other rare-variant methods in the Finnish population across a range of genetic models. We also compare the relative power and efficiency of exome array genotyping to those of high-coverage exome sequencing. At a fixed cost, less expensive genotyping strategies have far greater power than sequencing; in a fixed number of samples, however, genotyping arrays miss a substantial portion of genetic signals detected in sequencing, even in the Finnish founder population. As genetic studies probe sequence variation at greater depth in more diverse populations, our simulation approach provides a framework for evaluating various study designs for gene discovery.
- Published
- 2013
45. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline
- Author
-
Khalid Shakir, Guillermo del Angel, Tadeusz Jordan, David Roazen, Christopher Hartl, David Altshuler, Eric Banks, Mark A. DePristo, Ami Levy-Moonshine, Kiran V. Garimella, Stacey Gabriel, Géraldine A. Van der Auwera, Ryan Poplin, Mauricio O. Carneiro, and Joel Thibault
- Subjects
0301 basic medicine ,FASTQ format ,Biology ,Haploidy ,computer.software_genre ,Biochemistry ,Genome ,Polymorphism, Single Nucleotide ,DNA sequencing ,Article ,03 medical and health sciences ,0302 clinical medicine ,Structural Biology ,Databases, Genetic ,Humans ,Exome ,Genetics ,Data processing ,Genome, Human ,Genetic Variation ,Molecular Sequence Annotation ,Pipeline (software) ,030104 developmental biology ,Workflow ,Haplotypes ,Calibration ,Data mining ,Raw data ,computer ,Sequence Alignment ,030217 neurology & neurosurgery ,Software - Abstract
This unit describes how to use BWA and the Genome Analysis Toolkit (GATK) to map genome sequencing data to a reference and produce high-quality variant calls that can be used in downstream analyses. The complete workflow includes the core NGS data processing steps that are necessary to make the raw data suitable for analysis by the GATK, as well as the key methods involved in variant discovery using the GATK.
- Published
- 2013
46. Analysis of Rare, Exonic Variation amongst Subjects with Autism Spectrum Disorders and Population Controls
- Author
-
Mark J. Daly, Eric Banks, Elaine T. Lim, Kathryn Roeder, Gerard D. Schellenberg, Richard A. Gibbs, Li Liu, Jason Flannick, Kiran V. Garimella, Corneliu A. Bodea, Alicia Hawes, Bernie Devlin, Irene Newsham, Lora Lewis, Hillary Coon, Edwin H. Cook, Mark A. DePristo, Tim Fennel, Kaitlin E. Samocha, Stephan Ripke, Shannon Gross, Joseph D. Buxbaum, Uma Nagaswamy, Jared Maguire, Eric Boerwinkle, Huyen Dinh, Vladimir Makarov, Stacey Gabriel, Benjamin M. Neale, Khalid Shakir, Jeffrey G. Reid, Christine Stevens, Aniko Sabo, Ryan Poplin, Yuanqing Wu, Donna M. Muzny, and James S. Sutcliffe
- Subjects
Cancer Research ,lcsh:QH426-470 ,Population ,Genome-wide association study ,Computational biology ,Biology ,03 medical and health sciences ,0302 clinical medicine ,Genetic variation ,medicine ,Genetics ,Humans ,Exome ,Genetic Predisposition to Disease ,education ,Child ,Molecular Biology ,Genetics (clinical) ,Ecology, Evolution, Behavior and Systematics ,Genetic Association Studies ,030304 developmental biology ,0303 health sciences ,education.field_of_study ,Statistics ,Genetic Variation ,Human Genetics ,Sequence Analysis, DNA ,medicine.disease ,Missing data ,lcsh:Genetics ,Autism spectrum disorder ,Child Development Disorders, Pervasive ,Meta-analysis ,Case-Control Studies ,Genetics of Disease ,Autism ,Population Control ,030217 neurology & neurosurgery ,Mathematics ,Software ,Research Article ,Genome-Wide Association Study - Abstract
We report on results from whole-exome sequencing (WES) of 1,039 subjects diagnosed with autism spectrum disorders (ASD) and 870 controls selected from the NIMH repository to be of similar ancestry to cases. The WES data came from two centers using different methods to produce sequence and to call variants from it. Therefore, an initial goal was to ensure the distribution of rare variation was similar for data from different centers. This proved straightforward by filtering called variants by fraction of missing data, read depth, and balance of alternative to reference reads. Results were evaluated using seven samples sequenced at both centers and by results from the association study. Next we addressed how the data and/or results from the centers should be combined. Gene-based analysis of association was an obvious choice, but should statistics for association be combined across centers (meta-analysis) or should data be combined and then analyzed (mega-analysis)? Because of the nature of many gene-based tests, we showed by theory and simulations that mega-analysis has better power than meta-analysis. Finally, before analyzing the data for association, we explored the impact of population structure on rare variant analysis in these data. Like other recent studies, we found evidence that population structure can confound case-control studies by the clustering of rare variants in ancestry space; yet, unlike some recent studies, for these data we found that principal component-based analyses were sufficient to control for ancestry and produce test statistics with appropriate distributions. After using a variety of gene-based tests and both meta- and mega-analysis, we found no new risk genes for ASD in this sample.
Our results suggest that standard gene-based tests will require much larger samples of cases and controls before being effective for gene discovery, even for a disorder like ASD. Author Summary: This study evaluates the association between rare variants and autism spectrum disorders (ASD) in case and control samples sequenced by two centers. Before doing association analyses, we studied how to combine information across studies. We first harmonized the whole-exome sequence (WES) data across centers in terms of the distribution of rare variation. Key features included filtering called variants by fraction of missing data, read depth, and balance of alternative to reference reads. After filtering, the vast majority of variant calls from seven samples sequenced at both centers matched. We also evaluated whether one should combine summary statistics computed separately for each center (meta-analysis) or combine the data and analyze them together (mega-analysis). For many gene-based tests, we showed that mega-analysis yields more power. After quality control of data from 1,039 ASD cases and 870 controls and a range of analyses, no gene showed exome-wide evidence of significant association. Our results comport with recent results demonstrating that hundreds of genes affect risk for ASD; they suggest that rare risk variants are scattered across these many genes, and thus larger samples will be required to identify those genes.
- Published
- 2013
47. Mapping copy number variation by population-scale genome sequencing
- Author
-
L. McDade, Eric D. Green, Aravinda Chakravarti, Susan Lindsay, Justin Paschall, Aylwyn Scally, Deborah A. Nickerson, Chip Stewart, Stephen T. Sherry, Chunlin Xiao, Alex Reynolds, Carol Scott, H. M. Khouri, Pardis C. Sabeti, Xinmeng Jasmine Mu, Stephen B. Montgomery, Eric Banks, Gabor T. Marth, A. Caprio, Xiaole Zheng, Philip Awadalla, Qunyuan Zhang, Wei Chen, Matthew N. Bainbridge, Donna Muzny, Steven A. McCarroll, Jeffrey M. Kidd, Honglong Wu, Audrey Duncanson, Vladimir Makarov, Lilia M. Iakoucheva, Mark Gerstein, Han-Jun Jin, Can Alkan, Iman Hajirasouliha, T. J. Fennell, C. R. Juenger, J. Kidd, Chris Tyler-Smith, Qasim Ayub, D. Ashworth, Kristian Cibulskis, Yutao Fu, William M. McLaren, Sol Katzman, Yujun Zhang, Rajini R Haraksingh, A. Kebbel, Stuart L. Schreiber, Manual Rivas, Onur Sakarya, Tobias Rausch, Yuan Chen, M. Bachorski, Matthew E. Hurles, N. C. Clemm, Wei Wang, Xiangqun Zheng-Bradley, Adrian M. Sütz, Thomas M. Keane, E. Bank, Stephen F. McLaughlin, Javier Herrero, Jon Keebler, Simon Myers, Aleksandr Morgulis, James Nemesh, Jing Leng, Molly Przeworski, Alon Keinan, Lorraine Toji, Ilya Shlyakhter, Joshua M. Korn, Martine Zilversmit, Luke Jostins, Jun Wang, Jared Maguire, J. M. Korn, Ryan E. Mills, Seungtai Yoon, Bo Wang, F. M. De La Vega, Heng Li, L. Guccione, Laura Clarke, Huisong Zheng, Jeffrey K. Ichikawa, K. Kao, Kirill Rotmistrovsky, L. Gu, David B. Jaffe, David Haussler, Toby Bloom, Tara Skelly, S. Yoon, Gil McVean, Carrie Sougnez, Mark A. Batzer, A. De Witte, Ralf Herwig, Jane Wilkinson, Min Hu, K. Pareja, John V. Pearson, Robert E. Handsaker, Jerilyn A. Walker, Fuli Yu, Anthony A. Philippakis, Aniko Sabo, Jonathan Marchini, Ryan D. Hernandez, Guoqing Li, Peter Donnelly, Eric S. Lander, David J. Dooling, Jun Ding, Lukas Habegger, Pilar N. Ossorio, Andreas Dahl, Wilfried Nietfeld, Miriam F. Moffatt, Alexej Abyzov, Sebastian Zöllner, Ekta Khurana, Jean E. McEwen, Robert S. Fulton, Alexey Soldatov, Fiona Hyland, Philippe Lacroute, Richa Agarwala, Paul Flicek, Weichun Huang, Alison J. Coffey, Tony Cox, John W. Wallis, Robert Sanders, David Neil Cooper, Jason P. Affourtit, Mark A. DePristo, D Wheeler, Christopher Celone, Eugene Kulesha, Craig Elder Mealmaker, B. Desany, Zhengdong D. Zhang, Jonathan M. Manning, Cynthia L. Turcotte, Lisa D Brooks, Xiuqing Zhang, C. Coafra, Rajesh Radhakrishnan, Alan J. Schafer, Jonathan Sebat, Ken Chen, Andrew G. Clark, Alexis Christoforides, Edward V. Ball, Mark S. Guyer, Sharon R. Grossman, Philip Rosenstiel, J. Knowlton, Gonçalo R. Abecasis, Min Jian, James O. Burton, S. Wang, Lucinda Murray, George M. Weinstock, Mark Lathrop, Harold Swerdlow, Michael L. Metzker, Xiaowei Zhan, Yeyang Su, Ruibang Luo, Charles Lee, Huanming Yang, P. Marquardt, Charles N. Rotimi, Lynne V. Nazareth, Michael Snyder, Faheem Niazi, Quan Long, Jane Kaye, Michael Strömberg, Adam Auton, Michael Bauer, Cheng-Sheng Lee, S. Gabriel, Jim Stalker, Heather E. Peckham, D. Conners, Raffaella Smith, Yingrui Li, Niall Anthony Gormley, Megan Hanna, Jinchuan Xing, Hugo Y. K. Lam, S. Giles, Evan E. Eichler, Justin Jee, Loukas Moutsianas, Jiang Du, Hyun Min Kang, Eric F. Tsung, Ni Huang, Kai Ye, Stephen F. Schaffner, Suleyman Cenk Sahinalp, Xinghua Shi, Sean Humphray, Ahmet Kurdoglu, Amy L. McGuire, Sandra J. Lee, Linnea Fulton, Francis S. Collins, Huiqing Liang, S. C. Melton, A. Nawrocki, Aaron R. Quinlan, Tatjana Borodina, Lynn B. Jorde, Leopold Parts, Michael D. McLellan, Adrian M. Stütz, Paul Scheet, Amit Indap, Vyacheslav Amstislavskiy, Waibhav Tembe, S. Attiya, Jin Yu, Dmitri Parkhomchuk, Si Quang Le, Fabian Grubert, E. Buglione, Ruiqiang Li, Yan Zhou, Fiona Cunningham, Gilean McVean, Wan-Ping Lee, W. Song, Richard Durbin, Andrew Kernytsky, Stephen M. Beckstrom-Sternberg, Xin Ma, J. Jeng, Lauren Ambrogio, Carol Churcher, Ryan Poplin, William O.C.M. Cookson, Rasko Leinonen, Alexey N. Davydov, Kenny Ye, Paige Anderson, Alexander E. Urban, Adam Felsenfeld, Jeffrey S. Reid, Cornelis A. Albers, Jan O. Korbel, Senduran Balasubramaniam, Elaine R. Mardis, Gozde Aksay, Peter H. Sudmant, Aaron McKenna, M. Labrecque, Amanda J. Price, Vadim Zalunin, Donald F. Conrad, Florian Mertes, Christie Kovar, Danny Challis, A. D. Ball, Petr Danecek, Kiran V. Garimella, Bryan Howie, Scott Kahn, Shuaishuai Tai, E. P. Garrison, Robert D. Bjornson, Shankar Balasubramanian, Fereydoun Hormozdiari, Geng Tian, S. Clark, Joanna L. Kelley, Asif T. Chinwalla, Ramenani Ravi K, Ralf Sudbrak, Mark Kaganovich, Jeffrey C. Barrett, David Rio Deiros, Jeremiah D. Degenhardt, A. Palotie, Alistair Ward, Gianna Costa, Huyen Dinh, M. Minderman, R. Keira Cheetham, Jingxiang Li, Michael A. Quail, P. Koko-Gonzales, Alastair Kent, Martin Shumway, David R. Bentley, Ferran Casals, Leena Peltonen, Klaudia Walter, Christopher Hartl, Erica Shefler, Zhaolei Zhang, Hans Lehrach, Jessica L. Peterson, Roger Winer, Daniel C. Koboldt, D. Riches, Terena James, Wen Fung Leong, Michael Egholm, Thomas W. Blackwell, Peter D. Stenson, Anthony J. Cox, Andrew D. Kern, David M. Carter, M. Tolzmann, Daniel G. MacArthur, Jiantao Wu, Jennifer Stone, Angie S. Hinrichs, M. Albrecht, Jo Knight, Chang-Yun Lin, Adam R. Boyko, Dan Turner, Xiaodong Fang, Youssef Idaghdour, Liming Liang, Ryan N. Gutenkunst, David Craig, Mark J. Daly, Xiaosen Guo, Neda Gharani, Gerton Lunter, Shuli Kang, A. Burke, Shripad Sinari, Yongming A. Sun, Zoya Kingsbury, Robert M. Kuhn, Miriam K. Konkel, T. Li, Kevin McKernan, Simon Gravel, Brian L. Browning, C Sidore, Zamin Iqbal, Matthew Mort, Afidalina Tumian, Michael C. Wendl, Adam Phillips, Bernd Timmermann, Carlos Bustamante, H. Y. Lam, Deniz Kural, Richard A. Gibbs, Bartha Maria Knoppers, Emmanouil T. Dermitzakis, Lon Phan, Richard K. Wilson, D. L. Altshuler, S. Keenen, Assya Abdallah, Eric A. Stone, Michael A. Eberle, Li Ding, and Broad Institute of MIT and Harvard
- Subjects
DNA Copy Number Variations ,Genotype ,Population ,Genomic Structural Variation ,Genomics ,Computational biology ,Biology ,Genome ,Article ,DNA sequencing ,structural variation segmental duplications short-read rearrangements disorders disease common schizophrenia polymorphism insertions ,03 medical and health sciences ,0302 clinical medicine ,Gene Duplication ,Insertional ,Genetics ,Humans ,Genetic Predisposition to Disease ,Copy-number variation ,1000 Genomes Project ,education ,Sequence Deletion ,030304 developmental biology ,0303 health sciences ,education.field_of_study ,Multidisciplinary ,Genome, Human ,Reproducibility of Results ,Sequence Analysis, DNA ,DNA ,Mutagenesis, Insertional ,Genetics, Population ,Mutagenesis ,Human genome ,Sequence Analysis ,030217 neurology & neurosurgery ,Human - Abstract
Summary: Genomic structural variants (SVs) are abundant in humans, differing from other variation classes in extent, origin, and functional impact. Despite progress in SV characterization, the nucleotide-resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (i.e., copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analyzing their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high-frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serve as a resource for sequencing-based association studies.
- Published
- 2011
48. The $1,000 Genome: The Revolution in DNA Sequencing and the New Era of Personalized Medicine
- Author
-
Mark A. DePristo
- Subjects
Genetics ,business.industry ,$1,000 genome ,Genomics ,Genome project ,Biology ,Data science ,Genome ,DNA sequencing ,Book Review ,Human genome ,Genetics(clinical) ,Personalized medicine ,business ,Genetics (clinical) ,Personal genomics - Abstract
Kevin Davies, the (co-)author of Cracking the Genome and Breakthrough, brings his popularization of modern genetics to the newly emerging field of personal genomics and next-generation DNA sequencing (NGS) in his most recent book, The $1,000 Genome: The Revolution in DNA Sequencing and the New Era of Personalized Medicine. Davies's newest work recounts the post-Human Genome Project (HGP) quest for an ever less expensive human genome sequence and its reciprocal pursuit of the meaning of variation among human genomes for traits, particularly common disease. Weaving these two disparate threads together throughout the book, Davies provides a chronologically anchored story of the development of next-generation DNA sequencing technology, from Life/454 and Illumina to Pacific Biosciences; of personalized genomics, from Watson and Venter to the Public Genome Project; and of whole-genome association studies linking SNPs to common human diseases. Although the broad range of topics ensures that each chapter introduces an interesting big idea, technology, or social issue, the narrative flow can be jarring as Davies jumps around among NGS technologies, personal genomics companies, and medical genetics research. The book relies perhaps too heavily on colorful stories of the companies, projects, and people pursuing the $1,000 genome, especially given its limited discussion and analysis of the science and technology itself. Nevertheless, The $1,000 Genome does a good job of presenting the complex and interconnected technologies and science underlying the genomics revolution as vignettes accessible to an informed but nonexpert audience. Having grown up reading popular science books about physics, I found that The $1,000 Genome delivers exactly what I have come to expect of such books: an enjoyable, easy read that makes one wish to be part of the exciting story itself, which for me is a reality I am all the more thankful for after reading Davies's book.
- Published
- 2010
49. Next-generation sequencing for HLA typing of class I loci
- Author
-
Niall J. Lennon, Scott Anderson, Namrata Gupta, Matthew R. Henn, Xiaojiang Gao, Eric Banks, Rachel L. Erlich, Mark A. DePristo, Mary Carrington, Paul I.W. de Bakker, and Xiaoming Jia
- Subjects
Genetics ,lcsh:QH426-470 ,lcsh:Biotechnology ,Histocompatibility Testing ,Methodology Article ,Genes, MHC Class I ,Locus (genetics) ,Human leukocyte antigen ,Sequence Analysis, DNA ,Biology ,Polymerase Chain Reaction ,DNA sequencing ,law.invention ,lcsh:Genetics ,law ,lcsh:TP248.13-248.65 ,Humans ,Allele ,DNA microarray ,International HapMap Project ,Polymerase chain reaction ,Genetic association ,Biotechnology - Abstract
Background: Comprehensive sequence characterization across the MHC is important for successful organ transplantation and genetic association studies. To this end, we have developed an automated sample preparation, molecular barcoding and multiplexing protocol for the amplification and sequence determination of class I HLA loci. We have coupled this process to a novel HLA calling algorithm that determines the most likely pair of alleles at each locus. Results: We have benchmarked our protocol with 270 HapMap individuals from four worldwide populations, achieving 96.4% accuracy at 4-digit resolution. A variation of this initial protocol, better suited to large sample sizes, in which molecular barcodes are added during PCR rather than during library construction, was tested on 95 HapMap individuals with 98.6% accuracy at 4-digit resolution. Conclusions: Next-generation sequencing on the 454 FLX Titanium platform is a reliable, efficient, and scalable technology for HLA typing.
- Published
- 2010
50. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data
- Author
-
Matthew Hanna, Andrew Kernytsky, David Altshuler, Andrey Sivachenko, Eric Banks, Aaron McKenna, Stacey Gabriel, Kristian Cibulskis, Kiran V. Garimella, Mark A. DePristo, Mark J. Daly, and Massachusetts Institute of Technology. Department of Biology
- Subjects
Resource ,Functional programming ,Variant Call Format ,Correctness ,Genome ,Base Sequence ,business.industry ,Data management ,Genomics ,Sequence Analysis, DNA ,Structured programming ,Biology ,computer.software_genre ,Bioinformatics ,Data science ,Software framework ,Data access ,Genetics ,1000 Genomes Project ,business ,computer ,Genetics (clinical) ,Software - Abstract
Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS (the 1000 Genomes pilot alone includes nearly five terabases) make writing feature-rich, efficient, and robust analysis tools difficult even for computationally sophisticated individuals. Indeed, the complexity of accessing and manipulating the data produced by these machines limits the scope and ease with which many professionals can answer scientific questions. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency, and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools such as coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects such as the 1000 Genomes Project and The Cancer Genome Atlas. Funding: National Human Genome Research Institute (U.S.) (Large Scale Sequencing and Analysis of Genomes grant (54 HG003067)); National Human Genome Research Institute (U.S.) (Joint SNP and CNV calling in 1000 Genomes sequence data grant (U01 HG005208)).
- Published
- 2010
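The GATK abstract's central idea, that a tool author writes only per-locus map and reduce functions while the framework owns traversal and data access, can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration: the `Read` record, `loci_with_pileups`, `map_locus`, `reduce_depth`, `tree_reduce`, and `run_walker` are hypothetical names and are not the GATK's Java API. Depth of coverage stands in for the "coverage calculators" the abstract mentions, and merging two independently processed shards hints at why separating analysis from infrastructure makes parallelization possible.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, Iterator, List, Tuple

# Hypothetical read model: contig name plus a 0-based, half-open aligned
# interval [start, end). Not a GATK or htsjdk type.
@dataclass(frozen=True)
class Read:
    contig: str
    start: int
    end: int

Acc = Tuple[int, int]  # (sum of depths, number of loci visited)

def loci_with_pileups(reads: List[Read], contig: str, start: int,
                      end: int) -> Iterator[Tuple[int, List[Read]]]:
    """'Framework' side: walk each locus in [start, end) and hand the tool
    the reads overlapping it. Naive O(loci * reads) scan for clarity."""
    for pos in range(start, end):
        pileup = [r for r in reads
                  if r.contig == contig and r.start <= pos < r.end]
        yield pos, pileup

# 'Tool' side: the analysis author writes only map and reduce.
def map_locus(pos: int, pileup: List[Read]) -> int:
    """Per-locus computation: depth of coverage at this position."""
    return len(pileup)

def reduce_depth(acc: Acc, depth: int) -> Acc:
    """Fold one locus result into the running totals."""
    return acc[0] + depth, acc[1] + 1

def tree_reduce(left: Acc, right: Acc) -> Acc:
    """Merge the results of two independently processed shards."""
    return left[0] + right[0], left[1] + right[1]

def run_walker(reads: List[Read], contig: str, start: int, end: int) -> Acc:
    """Drive the traversal: map every locus, then fold with reduce."""
    depths: Iterable[int] = (map_locus(pos, pileup) for pos, pileup
                             in loci_with_pileups(reads, contig, start, end))
    return reduce(reduce_depth, depths, (0, 0))

if __name__ == "__main__":
    reads = [Read("chr1", 0, 5), Read("chr1", 3, 8)]
    whole = run_walker(reads, "chr1", 0, 10)
    # Two shards processed separately (sequentially here), then merged.
    merged = tree_reduce(run_walker(reads, "chr1", 0, 5),
                         run_walker(reads, "chr1", 5, 10))
    print(whole, merged, whole[0] / whole[1])  # (10, 10) (10, 10) 1.0
```

For two reads on chr1 spanning [0, 5) and [3, 8), the whole-region run and the merged two-shard run both return `(10, 10)`, a mean depth of 1.0. Because merging accumulators is associative, shards could in principle be handed to separate workers; this sketch simply runs them one after the other.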