1. Comprehensive variation discovery in single human genomes
- Author
- Iain MacCallum, Ryan Hegarty, Brian Sogoloff, Laurie Holmes, David B. Jaffe, Eric S. Lander, Shuangye Yin, Bayo Lau, Chad Nusbaum, Ted Sharpe, Diana Tabbaa, Carsten Russ, Neil I. Weisenfeld, Louise Williams; Massachusetts Institute of Technology, Department of Biology
- Subjects
Molecular Sequence Data; Genomics; Computational biology; Biology; Polymerase Chain Reaction; Polymorphism, Single Nucleotide; Sensitivity and Specificity; Genome; Article; 03 medical and health sciences; 0302 clinical medicine; Gene Frequency; Genetic variation; Genetics; Humans; Allele frequency; Oligonucleotide Array Sequence Analysis; 030304 developmental biology; Segmental duplication; Sequence (medicine); 0303 health sciences; Base Sequence; Genome, Human; Chromosome Mapping; Genetic Variation; High-Throughput Nucleotide Sequencing; Reproducibility of Results; 3. Good health; Fosmid; Human genome; Algorithms; Software; 030217 neurology & neurosurgery
- Abstract
Complete knowledge of the genetic variation in individual human genomes is a crucial foundation for understanding the etiology of disease. Genetic variation is typically characterized by sequencing individual genomes and comparing reads to a reference. Existing methods do an excellent job of detecting variants in approximately 90% of the human genome; however, calling variants in the remaining 10% of the genome (largely low-complexity sequence and segmental duplications) is challenging. To improve variant calling, we developed a new algorithm, DISCOVAR, and examined its performance on improved, low-cost sequence data. Using a newly created reference set of variants from the finished sequence of 103 randomly chosen fosmids, we find that some standard variant call sets miss up to 25% of variants. We show that the combination of new methods and improved data increases sensitivity by several fold, with the greatest impact in challenging regions of the human genome.
- Funding
National Human Genome Research Institute (U.S.) (Grant R01HG003474); National Human Genome Research Institute (U.S.) (Grant U54HG003067); National Institute of Allergy and Infectious Diseases (U.S.) (Contract HHSN272200900018C)
- Published
- 2014