71 results for "Aaron L. Halpern"
Search Results
2. TreeLign: simultaneous stepwise alignment and phylogenetic positioning, with its application to automatic phylogenetic assignment of 16S rRNAs.
- Author
- Yuan Li, Aaron L. Halpern, and Shaojie Zhang
- Published
- 2011
3. A General Paradigm for Fast, Adaptive Clustering of Biological Sequences.
- Author
- Knut Reinert, Markus Bauer, Andreas Döring, Gunnar W. Klau, and Aaron L. Halpern
- Published
- 2007
4. Syntenic Layout of Two Assemblies of Related Genomes.
- Author
- Olaf Delgado-Friedrichs, Aaron L. Halpern, Ross Lippert, Christian Rausch, Stephan C. Schuster, and Daniel H. Huson
- Published
- 2004
5. Efficiently detecting polymorphisms during the fragment assembly process.
- Author
- Daniel P. Fasulo, Aaron L. Halpern, Ian M. Dew, and Clark M. Mobarry
- Published
- 2002
6. Segment Match Refinement and Applications.
- Author
- Aaron L. Halpern, Daniel H. Huson, and Knut Reinert
- Published
- 2002
7. Design of a compartmentalized shotgun assembler for the human genome.
- Author
- Daniel H. Huson, Knut Reinert, Saul A. Kravitz, Karin A. Remington, Arthur L. Delcher, Ian M. Dew, Michael Flanigan, Aaron L. Halpern, Zhongwu Lai, Clark M. Mobarry, Granger G. Sutton, and Eugene W. Myers
- Published
- 2001
8. Comparing Assemblies Using Fragments and Mate-Pairs.
- Author
- Daniel H. Huson, Aaron L. Halpern, Zhongwu Lai, Eugene W. Myers, Knut Reinert, and Granger G. Sutton
- Published
- 2001
9. Visualization challenges for a new cyberpharmaceutical computing paradigm.
- Author
- Russell J. Turner, Kabir Chaturvedi, Nathan Edwards, Daniel P. Fasulo, Aaron L. Halpern, Daniel H. Huson, Oliver Kohlbacher, Jason R. Miller, Knut Reinert, Karin A. Remington, Russell Schwartz, Brian Walenz, Shibu Yooseph, and Sorin Istrail
- Published
- 2001
10. Computational Techniques for Human Genome Resequencing Using Mated Gapped Reads.
- Author
- Paolo Carnevali, Jonathan Baccash, Aaron L. Halpern, Igor Nazarenko, Geoffrey B. Nilsen, Krishna P. Pant, Jessica C. Ebert, Anushka Brownley, Matt Morenzoni, Vitali Karpinchyk, Bruce Martin, Dennis G. Ballinger, and Radoje Drmanac
- Published
- 2012
11. Spinal muscular atrophy diagnosis and carrier screening from genome sequencing data
- Author
- Zoya Kingsbury, Andrew J. Connell, Ryan J. Taft, Alba Sanchis-Juan, Courtney E. French, Matthew E.R. Butchbach, David R. Bentley, Aditi Chawla, Xiao Chen, Isabelle Delon, Michael A. Eberle, Aaron L. Halpern, F Lucy Raymond, and NIHR BioResource
- Subjects
- 0301 basic medicine; spinal muscular atrophy (SMA); medicine.medical_specialty; Copy number analysis; Genomics; carrier screening; SMN1; Computational biology; 030105 genetics & heredity; Biology; Genome; DNA sequencing; Article; Muscular Atrophy, Spinal; 03 medical and health sciences; 0302 clinical medicine; medicine; Humans; Child; Gene; Genetics (clinical); 030304 developmental biology; 0303 health sciences; Base Sequence; genome sequencing (GS); Spinal muscular atrophy; bioinformatics; SMA; medicine.disease; Survival of Motor Neuron 1 Protein; nervous system diseases; 030104 developmental biology; copy-number analysis; Child, Preschool; Medical genetics; 030217 neurology & neurosurgery; Reference genome
- Abstract
Purpose: Spinal muscular atrophy (SMA), caused by loss of the SMN1 gene, is a leading cause of early childhood death. Because SMN1 and SMN2 have nearly identical sequences, analysis of this region is challenging. Population-wide SMA screening to quantify the SMN1 copy number (CN) is recommended by the American College of Medical Genetics.
Methods: We developed a method that accurately identifies the CN of SMN1 and SMN2 from genome sequencing (GS) data by analyzing read depth and eight informative reference genome differences between SMN1/2.
Results: We characterized SMN1/2 in 12,747 genomes, identified 1568 samples with SMN1 gains or losses and 6615 samples with SMN2 gains or losses, and calculated a pan-ethnic carrier frequency of 2%, consistent with previous studies. Additionally, 99.8% of our SMN1 and 99.7% of our SMN2 CN calls agreed with orthogonal methods, with a recall of 100% for SMA and 97.8% for carriers, and a precision of 100% for both SMA and carriers.
Conclusion: This SMN copy number caller can be used to identify both carrier and affected status of SMA, enabling SMA testing to be offered as a comprehensive test in neonatal care and as an accurate carrier screening tool in GS projects.
- Published
- 2020
12. Consensus generation and variant detection by Celera Assembler.
- Author
- Gennady Denisov, Brian Walenz, Aaron L. Halpern, Jason R. Miller, Nelson Axelrod, Samuel Levy, and Granger G. Sutton
- Published
- 2008
13. Nanoliter reactors improve multiple displacement amplification of genomes from single cells.
- Author
- Yann Marcy, Thomas Ishoey, Roger S Lasken, Timothy B Stockwell, Brian P Walenz, Aaron L Halpern, Karen Y Beeson, Susanne M D Goldberg, and Stephen R Quake
- Subjects
- Genetics; QH426-470
- Abstract
Since only a small fraction of environmental bacteria are amenable to laboratory culture, there is great interest in genomic sequencing directly from single cells. Sufficient DNA for sequencing can be obtained from one cell by the Multiple Displacement Amplification (MDA) method, thereby eliminating the need to develop culture methods. Here we used a microfluidic device to isolate individual Escherichia coli cells and amplify genomic DNA by MDA in 60-nl reactions. Our results confirm a report that reduced MDA reaction volume lowers nonspecific synthesis that can result from contaminant DNA templates and unfavourable interaction between primers. The quality of the genome amplification was assessed by qPCR and compared favourably to single-cell amplifications performed in standard 50-µl volumes. Amplification bias was greatly reduced in nanoliter volumes, thereby providing a more even representation of all sequences. Single-cell amplicons from both microliter and nanoliter volumes provided high-quality sequence data by high-throughput pyrosequencing, thereby demonstrating a straightforward route to sequencing genomes from single cells.
- Published
- 2007
14. The Sorcerer II Global Ocean Sampling Expedition: metagenomic characterization of viruses within aquatic microbial samples.
- Author
- Shannon J Williamson, Douglas B Rusch, Shibu Yooseph, Aaron L Halpern, Karla B Heidelberg, John I Glass, Cynthia Andrews-Pfannkoch, Douglas Fadrosh, Christopher S Miller, Granger Sutton, Marvin Frazier, and J Craig Venter
- Subjects
- Medicine; Science
- Abstract
Viruses are the most abundant biological entities on our planet. Interactions between viruses and their hosts impact several important biological processes in the world's oceans, such as horizontal gene transfer, microbial diversity and biogeochemical cycling. Interrogation of microbial metagenomic sequence data collected as part of the Sorcerer II Global Ocean Sampling Expedition (GOS) revealed a high abundance of viral sequences, representing approximately 3% of the total predicted proteins. Cluster analyses of the viral sequences revealed hundreds to thousands of viral genes encoding various metabolic and cellular functions. Quantitative analyses of viral genes of host origin performed on the viral fraction of aquatic samples confirmed the viral nature of these sequences and suggested that significant portions of aquatic viral communities behave as reservoirs of such genetic material. Distributional and phylogenetic analyses of these host-derived viral sequences also suggested that viral acquisition of environmentally relevant genes of host origin is a more abundant and widespread phenomenon than previously appreciated. The predominant viral sequences identified within microbial fractions originated from tailed bacteriophages and exhibited varying global distributions according to viral family. Recruitment of GOS viral sequence fragments against 27 complete aquatic viral genomes revealed that only one reference bacteriophage genome was highly abundant and was closely related, but not identical, to the cyanomyovirus P-SSM4. The co-distribution across all sampling sites of P-SSM4-like sequences with the dominant ecotype of its host, Prochlorococcus, supports the classification of the viral sequences as P-SSM4-like and suggests that this virus may influence the abundance, distribution and diversity of one of the most dominant components of picophytoplankton in oligotrophic oceans.
In summary, the abundance and broad geographical distribution of viral sequences within microbial fractions, the prevalence of genes among viral sequences that encode microbial physiological function and their distinct phylogenetic distribution lend strong support to the notion that viral-mediated gene acquisition is a common and ongoing mechanism for generating microbial diversity in the marine environment.
- Published
- 2008
15. The diploid genome sequence of an individual human.
- Author
- Samuel Levy, Granger Sutton, Pauline C Ng, Lars Feuk, Aaron L Halpern, Brian P Walenz, Nelson Axelrod, Jiaqi Huang, Ewen F Kirkness, Gennady Denisov, Yuan Lin, Jeffrey R MacDonald, Andy Wing Chun Pang, Mary Shago, Timothy B Stockwell, Alexia Tsiamouri, Vineet Bafna, Vikas Bansal, Saul A Kravitz, Dana A Busam, Karen Y Beeson, Tina C McIntosh, Karin A Remington, Josep F Abril, John Gill, Jon Borman, Yu-Hui Rogers, Marvin E Frazier, Stephen W Scherer, Robert L Strausberg, and J Craig Venter
- Subjects
- Biology (General); QH301-705.5
- Abstract
Presented here is a genome sequence of an individual human. It was produced from approximately 32 million random DNA fragments, sequenced by Sanger dideoxy technology and assembled into 4,528 scaffolds, comprising 2,810 million bases (Mb) of contiguous sequence with approximately 7.5-fold coverage for any given region. We developed a modified version of the Celera assembler to facilitate the identification and comparison of alternate alleles within this individual diploid genome. Comparison of this genome and the National Center for Biotechnology Information human reference assembly revealed more than 4.1 million DNA variants, encompassing 12.3 Mb. These variants (of which 1,288,319 were novel) included 3,213,401 single nucleotide polymorphisms (SNPs), 53,823 block substitutions (2-206 bp), 292,102 heterozygous insertion/deletion events (indels) (1-571 bp), 559,473 homozygous indels (1-82,711 bp), and 90 inversions, as well as numerous segmental duplications and copy number variation regions. Non-SNP DNA variation accounts for 22% of all events identified in the donor, but these events involve 74% of all variant bases. This suggests an important role for non-SNP genetic alterations in defining the diploid genome structure. Moreover, 44% of genes were heterozygous for one or more variants. Using a novel haplotype assembly strategy, we were able to span 1.5 Gb of genome sequence in segments >200 kb, providing further precision to the diploid nature of the genome. These data depict a definitive molecular portrait of a diploid human genome that provides a starting point for future genome comparisons and enables an era of individualized genomic information.
- Published
- 2007
16. Survey sequencing and comparative analysis of the elephant shark (Callorhinchus milii) genome.
- Author
- Byrappa Venkatesh, Ewen F Kirkness, Yong-Hwee Loh, Aaron L Halpern, Alison P Lee, Justin Johnson, Nidhi Dandona, Lakshmi D Viswanathan, Alice Tay, J Craig Venter, Robert L Strausberg, and Sydney Brenner
- Subjects
- Biology (General); QH301-705.5
- Abstract
Owing to their phylogenetic position, cartilaginous fishes (sharks, rays, skates, and chimaeras) provide a critical reference for our understanding of vertebrate genome evolution. The relatively small genome of the elephant shark, Callorhinchus milii, a chimaera, makes it an attractive model cartilaginous fish genome for whole-genome sequencing and comparative analysis. Here, the authors describe survey sequencing (1.4x coverage) and comparative analysis of the elephant shark genome, one of the first cartilaginous fish genomes to be sequenced to this depth. Repetitive sequences, represented mainly by a novel family of short interspersed element-like and long interspersed element-like sequences, account for about 28% of the elephant shark genome. Fragments of approximately 15,000 elephant shark genes reveal specific examples of genes that have been lost differentially during the evolution of tetrapod and teleost fish lineages. Interestingly, the degree of conserved synteny and sequence conservation between the human and elephant shark genomes is higher than that between the human and teleost fish genomes. The elephant shark genome contains four putative Hox clusters, indicating that, unlike teleost fish genomes, it has not experienced an additional whole-genome duplication. These findings underscore the importance of the elephant shark as a critical reference vertebrate genome for comparative analysis of the human and other vertebrate genomes. This study also demonstrates that a survey-sequencing approach can be applied productively for comparative analysis of distantly related vertebrate genomes.
- Published
- 2007
17. The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families.
- Author
- Shibu Yooseph, Granger Sutton, Douglas B Rusch, Aaron L Halpern, Shannon J Williamson, Karin Remington, Jonathan A Eisen, Karla B Heidelberg, Gerard Manning, Weizhong Li, Lukasz Jaroszewski, Piotr Cieplak, Christopher S Miller, Huiying Li, Susan T Mashiyama, Marcin P Joachimiak, Christopher van Belle, John-Marc Chandonia, David A Soergel, Yufeng Zhai, Kannan Natarajan, Shaun Lee, Benjamin J Raphael, Vineet Bafna, Robert Friedman, Steven E Brenner, Adam Godzik, David Eisenberg, Jack E Dixon, Susan S Taylor, Robert L Strausberg, Marvin Frazier, and J Craig Venter
- Subjects
- Biology (General); QH301-705.5
- Abstract
Metagenomics projects based on shotgun sequencing of populations of micro-organisms yield insight into protein families. We used sequence similarity clustering to explore proteins with a comprehensive dataset consisting of sequences from available databases together with 6.12 million proteins predicted from an assembly of 7.7 million Global Ocean Sampling (GOS) sequences. The GOS dataset covers nearly all known prokaryotic protein families. A total of 3,995 medium- and large-sized clusters consisting of only GOS sequences are identified, out of which 1,700 have no detectable homology to known families. The GOS-only clusters contain a higher than expected proportion of sequences of viral origin, thus reflecting a poor sampling of viral diversity until now. Protein domain distributions in the GOS dataset and current protein databases show distinct biases. Several protein domains that were previously categorized as kingdom specific are shown to have GOS examples in other kingdoms. About 6,000 sequences (ORFans) from the literature that heretofore lacked similarity to known proteins have matches in the GOS data. The GOS dataset is also used to improve remote homology detection. Overall, besides nearly doubling the number of current proteins, the predicted GOS proteins also add a great deal of diversity to known protein families and shed light on their evolution. These observations are illustrated using several protein families, including phosphatases, proteases, ultraviolet-irradiation DNA damage repair enzymes, glutamine synthetase, and RuBisCO. The diversity added by GOS data has implications for choosing targets for experimental structure characterization as part of structural genomics efforts. Our analysis indicates that new families are being discovered at a rate that is linear or almost linear with the addition of new sequences, implying that we are still far from discovering all protein families in nature.
- Published
- 2007
18. The Sorcerer II Global Ocean Sampling expedition: northwest Atlantic through eastern tropical Pacific.
- Author
- Douglas B Rusch, Aaron L Halpern, Granger Sutton, Karla B Heidelberg, Shannon Williamson, Shibu Yooseph, Dongying Wu, Jonathan A Eisen, Jeff M Hoffman, Karin Remington, Karen Beeson, Bao Tran, Hamilton Smith, Holly Baden-Tillson, Clare Stewart, Joyce Thorpe, Jason Freeman, Cynthia Andrews-Pfannkoch, Joseph E Venter, Kelvin Li, Saul Kravitz, John F Heidelberg, Terry Utterback, Yu-Hui Rogers, Luisa I Falcón, Valeria Souza, Germán Bonilla-Rosso, Luis E Eguiarte, David M Karl, Shubha Sathyendranath, Trevor Platt, Eldredge Bermingham, Victor Gallardo, Giselle Tamayo-Castillo, Michael R Ferrari, Robert L Strausberg, Kenneth Nealson, Robert Friedman, Marvin Frazier, and J Craig Venter
- Subjects
- Biology (General); QH301-705.5
- Abstract
The world's oceans contain a complex mixture of micro-organisms that are, for the most part, uncharacterized both genetically and biochemically. We report here a metagenomic study of the marine planktonic microbiota in which surface (mostly marine) water samples were analyzed as part of the Sorcerer II Global Ocean Sampling expedition. These samples, collected across a several-thousand-km transect from the North Atlantic through the Panama Canal and ending in the South Pacific, yielded an extensive dataset consisting of 7.7 million sequencing reads (6.3 billion bp). Though a few major microbial clades dominate the planktonic marine niche, the dataset contains great diversity, with 85% of the assembled sequence and 57% of the unassembled data being unique at a 98% sequence identity cutoff. Using the metadata associated with each sample and sequencing library, we developed new comparative genomic and assembly methods. One comparative genomic method, termed "fragment recruitment," addressed questions of genome structure, evolution, and taxonomic or phylogenetic diversity, as well as the biochemical diversity of genes and gene families. A second method, termed "extreme assembly," made possible the assembly and reconstruction of large segments of abundant but clearly nonclonal organisms. Within all abundant populations analyzed, we found extensive intra-ribotype diversity in several forms: (1) extensive sequence variation within orthologous regions throughout a given genome; despite coverage of individual ribotypes approaching 500-fold, most individual sequencing reads are unique; (2) numerous changes in gene content, some with direct adaptive implications; and (3) hypervariable genomic islands that are too variable to assemble. The intra-ribotype diversity is organized into genetically isolated populations that have overlapping but independent distributions, implying distinct environmental preference.
We present novel methods for measuring the genomic similarity between metagenomic samples and show how they may be grouped into several community types. Specific functional adaptations can be identified both within individual ribotypes and across the entire community, including proteorhodopsin spectral tuning and the presence or absence of the phosphate-binding gene PstS.
- Published
- 2007
19. Clitics
- Author
- Aaron L. Halpern
- Published
- 2017
20. Strelka2: fast and accurate calling of germline and somatic variants
- Author
- Peter Krusche, Doruk Beyter, Christopher T. Saunders, Konrad Scheffler, Aaron L. Halpern, Xiaoyu Chen, Morten Källberg, Sangtae Kim, Mitchell A. Bekritsky, Eunho Noh, and Yeonbin Kim
- Subjects
- 0301 basic medicine; Computer science; Somatic cell; Computational biology; Biochemistry; Germline; 03 medical and health sciences; Germline mutation; INDEL Mutation; Neoplasms; Databases, Genetic; Humans; Molecular Biology; Germ-Line Mutation; Liquid Tumor; Models, Genetic; Whole Genome Sequencing; Genetic Variation; High-Throughput Nucleotide Sequencing; Cell Biology; 030104 developmental biology; Haplotypes; Genome informatics; Sample contamination; Software; Biotechnology
- Abstract
We describe Strelka2 (https://github.com/Illumina/strelka), an open-source small-variant-calling method for research and clinical germline and somatic sequencing applications. Strelka2 introduces a novel mixture-model-based estimation of insertion/deletion error parameters from each sample, an efficient tiered haplotype-modeling strategy, and a normal sample contamination model to improve liquid tumor analysis. For both germline and somatic calling, Strelka2 substantially outperformed the current leading tools in terms of both variant-calling accuracy and computing cost.
- Published
- 2017
21. Strelka2: Fast and accurate variant calling for clinical sequencing applications
- Author
- Eunho Noh, Peter Krusche, Konrad Scheffler, Christopher T. Saunders, Doruk Beyter, Sangtae Kim, Aaron L. Halpern, Morten Källberg, Mitchell A. Bekritsky, and Xiaoyu Chen
- Subjects
- Genetics; Liquid Tumor; InformationSystems_GENERAL; ComputingMethodologies_PATTERNRECOGNITION; GeneralLiterature_INTRODUCTORYANDSURVEY; ComputingMilieux_COMPUTERSANDEDUCATION; Computational biology; Biology; Indel; Sample contamination; Germline
- Abstract
We describe Strelka2 (https://github.com/Illumina/strelka), an open-source small variant calling method for clinical germline and somatic sequencing applications. Strelka2 introduces a novel mixture-model based estimation of indel error parameters from each sample, an efficient tiered haplotype modeling strategy and a normal sample contamination model to improve liquid tumor analysis. For both germline and somatic calling, Strelka2 substantially outperforms current leading tools on both variant calling accuracy and compute cost.
- Published
- 2017
22. Haplotype phasing of whole human genomes using bead-based barcode partitioning in a single tube
- Author
- Niall Anthony Gormley, Maria C Rogert, Jerushah Thomas, Lena Christiansen, Ana Granat, Ros Jackson, Frank J. Steemers, Jay Shendure, Yannan Zhao, Mostafa Ronaghi, Aaron L. Halpern, Melissa M. Wiley, Dmitry K. Pokholok, Steven Norberg, Fan Zhang, Emily Welch, Erich Jaeger, Kevin L. Gunderson, and Natalie Morrell
- Subjects
- 0301 basic medicine; Population; Biomedical Engineering; Bioengineering; Hybrid genome assembly; Computational biology; Biology; Barcode; Applied Microbiology and Biotechnology; Genome; Deep sequencing; DNA sequencing; law.invention; 03 medical and health sciences; law; DNA Barcoding, Taxonomic; Humans; education; Genetics; education.field_of_study; Massive parallel sequencing; Genome, Human; High-Throughput Nucleotide Sequencing; Genomics; 030104 developmental biology; Haplotypes; Molecular Medicine; Human genome; Biotechnology
- Abstract
Haplotype-resolved genome sequencing promises to unlock a wealth of information in population and medical genetics. However, for the vast majority of genomes sequenced to date, haplotypes have not been determined because of cumbersome haplotyping workflows that require fractions of the genome to be sequenced in a large number of compartments. Here we demonstrate barcode partitioning of long DNA molecules in a single compartment using "on-bead" barcoded tagmentation. The key to the method, which we call "contiguity preserving transposition" sequencing on beads (CPTv2-seq), is transposon-mediated transfer of homogeneous populations of barcodes from beads to individual long DNA molecules that are fragmented at the same time (tagmentation). These are then processed into sequencing libraries in which all sequencing reads originating from each long DNA molecule share a common barcode. Single-tube, bulk processing of long DNA molecules with ∼150,000 different barcoded bead types provides a barcode-linked read structure that reveals long-range molecular contiguity. This technology provides a simple, rapid, plate-scalable and automatable route to accurate, haplotype-resolved sequencing, and phasing of structural variants of the genome.
- Published
- 2016
23. A reference dataset of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree
- Author
- Benjamin L. Moore, Gil McVean, Elliott H. Margulies, Zamin Iqbal, Epameinondas Fritzilas, Michael A. Eberle, Han-Yu Chuang, Aaron L. Halpern, Sean Humphray, David R. Bentley, Peter Krusche, Morten Källberg, Semyon Kruglyak, and Mitchell A. Bekritsky
- Subjects
- 0301 basic medicine; Resource; Genetic inheritance; Genotype; Sequence analysis; Genomics; Computational biology; Biology; Polymorphism, Single Nucleotide; Genome; 03 medical and health sciences; 0302 clinical medicine; Data sequences; INDEL Mutation; Databases, Genetic; Genetics; Humans; Exome; Indel; Genetics (clinical); Genome, Human; Haplotype; Inheritance (genetic algorithm); High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA; Human genetics; Pedigree; 030104 developmental biology; 030217 neurology & neurosurgery; Algorithms; Software; Reference dataset
- Abstract
Improvement of variant calling in next-generation sequence data requires a comprehensive, genome-wide catalogue of high-confidence variants called in a set of genomes for use as a benchmark. We generated deep, whole-genome sequence data of seventeen individuals in a three-generation pedigree and called variants in each genome using a range of currently available algorithms. We used haplotype transmission information to create a phased “platinum” variant catalogue of 4.7 million single nucleotide variants (SNVs) plus 0.7 million small (1-50 bp) insertions and deletions (indels) that are consistent with the pattern of inheritance in the parents and eleven children of this pedigree. Platinum genotypes are highly concordant with the current catalogue of the National Institute of Standards and Technology for both SNVs (>99.99%) and indels (99.92%), and add a validated truth catalogue that has 26% more SNVs and 45% more indels. Analysis of 334,652 SNVs that were consistent between informatics pipelines yet inconsistent with haplotype transmission (“non-platinum”) revealed that the majority of these variants are de novo and cell-line mutations or reside within previously unidentified duplications and deletions. The reference materials from this study are a resource for objective assessment of the accuracy of variant calls throughout genomes.
- Published
- 2016
24. Genomic insights to SAR86, an abundant and uncultivated marine bacterial lineage
- Author
- Robert Friedman, Jeremy D. Selengut, R. Alexander Richter, Daniel H. Haft, Aaron L. Halpern, Roger S. Lasken, J. Craig Venter, Mary-Jane Lombardo, Mark Novotny, Douglas B. Rusch, Christopher L. Dupont, Shibu Yooseph, Ruben E. Valas, Joyclyn Yee-Greenbaum, and Kenneth H. Nealson
- Subjects
- Rhodopsin; proteorhodopsin; tonB receptors; Oceans and Seas; Lineage (evolution); SAR86; Biology; Microbiology; Genome; Phylogenetics; RNA, Ribosomal, 16S; Rhodopsins, Microbial; Seawater; Genomic library; single cell genomics; Phylogeny; Ecology, Evolution, Behavior and Systematics; Genomic Library; Proteorhodopsin; Ecology; SAR11; Computational Biology; Ribosomal RNA; Plankton; Evolutionary biology; Metagenomics; biology.protein; Original Article; metagenomic assembly; Bacterial outer membrane; Gammaproteobacteria; Genome, Bacterial
- Abstract
Bacteria in the 16S rRNA clade SAR86 are among the most abundant uncultivated constituents of microbial assemblages in the surface ocean for which little genomic information is currently available. Bioinformatic techniques were used to assemble two nearly complete genomes from marine metagenomes and single-cell sequencing provided two more partial genomes. Recruitment of metagenomic data shows that these SAR86 genomes substantially increase our knowledge of non-photosynthetic bacteria in the surface ocean. Phylogenomic analyses establish SAR86 as a basal and divergent lineage of γ-proteobacteria, and the individual genomes display a temperature-dependent distribution. Modestly sized at 1.25-1.7 Mbp, the SAR86 genomes lack several pathways for amino-acid and vitamin synthesis as well as sulfate reduction, trends commonly observed in other abundant marine microbes. SAR86 appears to be an aerobic chemoheterotroph with the potential for proteorhodopsin-based ATP generation, though the apparent lack of a retinal biosynthesis pathway may require it to scavenge exogenously-derived pigments to utilize proteorhodopsin. The genomes contain an expanded capacity for the degradation of lipids and carbohydrates acquired using a wealth of tonB-dependent outer membrane receptors. Like the abundant planktonic marine bacterial clade SAR11, SAR86 exhibits metabolic streamlining, but also a distinct carbon compound specialization, possibly avoiding competition.
- Published
- 2011
25. Characterization of Prochlorococcus clades from iron-depleted oceanic regions
- Author
- J. Craig Venter, Adam C. Martiny, Douglas B. Rusch, Christopher L. Dupont, and Aaron L. Halpern
- Subjects
- Multidisciplinary; Ecotype; Ecology; Iron; Oceans and Seas; fungi; Iron fertilization; Biodiversity; Biological Sciences; Biology; biology.organism_classification; Phylogenetics; Phytoplankton; Upwelling; Marine ecosystem; Prochlorococcus; Genome, Bacterial; Phylogeny
- Abstract
Prochlorococcus describes a diverse and abundant genus of marine photosynthetic microbes. It is primarily found in oligotrophic waters across the globe and plays a crucial role in energy and nutrient cycling in the ocean ecosystem. The abundance, global distribution, and availability of isolates make Prochlorococcus a model system for understanding marine microbial diversity and biogeochemical cycling. Analysis of 73 metagenomic samples from the Global Ocean Sampling expedition acquired in the Atlantic, Pacific, and Indian Oceans revealed the presence of two uncharacterized Prochlorococcus clades. A phylogenetic analysis using six different genetic markers places the clades close to known lineages adapted to high-light environments. The two uncharacterized clades consistently co-occur and dominate the surface waters of high-temperature, macronutrient-replete, and low-iron regions of the Eastern Equatorial Pacific upwelling and the tropical Indian Ocean. They are genetically distinct from each other and from other high-light Prochlorococcus isolates and likely define a previously unrecognized ecotype. Our detailed genomic analysis indicates that these clades comprise organisms that are adapted to iron-depleted environments by reducing their iron quota through the loss of several iron-containing proteins that likely function as electron sinks in the photosynthetic pathway in other Prochlorococcus clades from high-light environments. The presence and inferred physiology of these clades may explain why Prochlorococcus populations from iron-depleted regions do not respond to iron fertilization experiments and further expand our understanding of how phytoplankton adapt to variations in nutrient availability in the ocean.
- Published
- 2010
26. Human Genome Sequencing Using Unchained Base Reads on Self-Assembling DNA Nanoarrays
- Author
- Robert Hartlage, Brock A. Peters, Igor Nazarenko, Jonathan Baccash, Calvin Kong, Vitali Karpinchyk, Andres Fernandez, Abraham M. Rosenbaum, Ryan J. Cedeno, Paolo Carnevali, Celeste E. McBride, Norman L. Burns, Shaunak Roy, Karen W. Shannon, George M. Church, Snezana Drmanac, Daniel F. Chernikoff, Radoje Drmanac, Geoffrey B. Nilsen, Claudia Richter, Coleen R. Hacker, Jay Shafto, William C. Banyai, Kaliprasad Pothuraju, Helena Perazich, Bruce L. Martin, Dennis G. Ballinger, Benjamin Curson, Linsu Chen, Brian Hauser, Steve Huang, Alexander Wait Zaranek, Anushka Brownley, Dylan Vu, Matt Morenzoni, Andrew B. Sparks, Matthew J. Callow, Alex Cheung, Clifford Reid, Adam P. Borcherding, George Yeung, Xiaodi Wu, Catherine Le, Tom Landers, Aaron L. Halpern, Bahram G. Kermani, Kimberly Perry, Arnold R. Oliphant, Mark Koenig, Charit L. Pethiyagoda, Michel Sun, Joseph V. Thakuria, Conrad G. Sheppy, Anne Tran, Robert E. Morey, Fredrik A. Dahl, Krishna Pant, Karl Mutch, Bryan Staker, Joe Peterson, Jessica Ebert, Yuan Jiang, Jia Liu, Razvan Chirita, and Uladzislau Sharanhovich
- Subjects
Male ,Genotype ,Sequence analysis ,Biology ,Polymorphism, Single Nucleotide ,Genome ,DNA sequencing ,chemistry.chemical_compound ,Sequencing by hybridization ,Human Genome Project ,Humans ,Nanotechnology ,Genomic library ,Genetics ,Genomic Library ,Multidisciplinary ,Base Sequence ,Genome, Human ,Computational Biology ,DNA ,Sequence Analysis, DNA ,Nucleic acid amplification technique ,Microarray Analysis ,Nanostructures ,Haplotypes ,chemistry ,Costs and Cost Analysis ,Human genome ,Databases, Nucleic Acid ,Nucleic Acid Amplification Techniques ,Software
- Abstract
Toward $1000 Genomes: The ability to generate human genome sequence data that is complete, accurate, and inexpensive is a necessary prerequisite to perform genome-wide disease association studies. Drmanac et al. (p. 78, published online 5 November) present a technique advancing toward this goal. The method uses Type IIS endonucleases to incorporate short oligonucleotides within a set of randomly sheared circularized DNA. DNA polymerase then generates concatenated copies of the circular oligonucleotides, leading to the formation of compact but very long oligonucleotides, which are then sequenced by ligation. The relatively low cost of this technology, which shows a low error rate, advances sequencing closer to the goal of the $1000 genome.
- Published
- 2010
27. It's all relative: ranking the diversity of aquatic bacterial communities
- Author
-
Karen Beeson, J. Craig Venter, Jennifer B. H. Martiny, Allison K. Shaw, Bao Tran, and Aaron L. Halpern
- Subjects
DNA, Bacterial ,Gamma diversity ,Statistics as Topic ,Biology ,Microbiology ,Diversity index ,RNA, Ribosomal, 16S ,Ecology, Evolution, Behavior and Systematics ,Gene Library ,Bacteria ,Ecology ,Genes, rRNA ,Biodiversity ,Sequence Analysis, DNA ,respiratory system ,Ranking ,Sample Size ,Species evenness ,Rarefaction (ecology) ,Alpha diversity ,Species richness ,Water Microbiology ,Sequence Alignment ,human activities ,Environmental Monitoring ,Diversity (business)
- Abstract
Summary: The study of microbial diversity patterns is hampered by the enormous diversity of microbial communities and the lack of resources to sample them exhaustively. For many questions about richness and evenness, however, one only needs to know the relative order of diversity among samples rather than total diversity. We used 16S libraries from the Global Ocean Survey to investigate the ability of 10 diversity statistics (including rarefaction, non-parametric, parametric, curve extrapolation and diversity indices) to assess the relative diversity of six aquatic bacterial communities. Overall, we found that the statistics yielded remarkably similar rankings of the samples for a given sequence similarity cut-off. This correspondence, despite the different underlying assumptions of the statistics, suggests that diversity statistics are a useful tool for ranking samples of microbial diversity. In addition, sequence similarity cut-off influenced the diversity ranking of the samples, demonstrating that diversity statistics can also be used to detect differences in phylogenetic structure among microbial communities. Finally, a subsampling analysis suggests that further sequencing from these particular clone libraries would not have substantially changed the richness rankings of the samples.
- Published
- 2008
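The abstract compares the rankings that different diversity statistics assign to the same samples. Below is a minimal Python sketch of that comparison. It assumes each sample is a list of OTU read counts at a single similarity cut-off. It implements only four common statistics (observed richness, Shannon, Gini-Simpson, Chao1), not the ten used in the study. It uses Spearman's rank correlation as one possible way to measure agreement between rankings. All function names are illustrative.

```python
import math
from typing import Callable, Dict, List

# Each sample is a list of OTU abundances (reads per OTU at one similarity cut-off).
Counts = List[int]


def observed_richness(counts: Counts) -> float:
    """Number of OTUs with at least one read."""
    return float(sum(1 for c in counts if c > 0))


def shannon(counts: Counts) -> float:
    """Shannon index H = -sum(p * ln p) over non-empty OTUs."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def gini_simpson(counts: Counts) -> float:
    """Probability that two reads drawn with replacement come from different OTUs."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def chao1(counts: Counts) -> float:
    """Chao1 richness estimate from singleton (F1) and doubleton (F2) counts."""
    s_obs = observed_richness(counts)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    if f2 > 0:
        return s_obs + f1 * f1 / (2.0 * f2)
    return s_obs + f1 * (f1 - 1) / 2.0  # bias-corrected form when F2 == 0


STATISTICS: Dict[str, Callable[[Counts], float]] = {
    "observed": observed_richness,
    "shannon": shannon,
    "gini_simpson": gini_simpson,
    "chao1": chao1,
}


def rank_samples(samples: Dict[str, Counts], stat: Callable[[Counts], float]) -> List[str]:
    """Sample names from most to least diverse under `stat` (ties keep input order)."""
    return sorted(samples, key=lambda name: stat(samples[name]), reverse=True)


def spearman(order_a: List[str], order_b: List[str]) -> float:
    """Spearman rank correlation between two tie-free orderings of the same samples."""
    n = len(order_a)
    pos_b = {name: i for i, name in enumerate(order_b)}
    d2 = sum((i - pos_b[name]) ** 2 for i, name in enumerate(order_a))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


if __name__ == "__main__":
    samples = {"A": [10, 10, 10, 10, 10], "B": [40, 5, 3, 1, 1], "C": [20, 20]}
    for label, stat in STATISTICS.items():
        print(label, rank_samples(samples, stat))
```

With these toy counts, Shannon ranks A > B > C and Gini-Simpson ranks A > C > B, which gives a Spearman correlation of 0.5. Chao1 puts B first because its two singletons imply unseen OTUs.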
28. An MCMC algorithm for haplotype assembly from whole-genome sequence data
- Author
-
Aaron L. Halpern, Nelson Axelrod, Vikas Bansal, and Vineet Bafna
- Subjects
Hash function ,Population ,Genomics ,Computational biology ,Biology ,symbols.namesake ,Methods ,Genetics ,Humans ,Computer Simulation ,International HapMap Project ,education ,Genetics (clinical) ,education.field_of_study ,Markov chain ,Genome, Human ,Haplotype ,Markov chain Monte Carlo ,Markov Chains ,Haplotypes ,symbols ,Monte Carlo Method ,Algorithms ,Reference genome
- Abstract
In comparison to genotypes, knowledge about haplotypes (the combination of alleles present on a single chromosome) is much more useful for whole-genome association studies and for making inferences about human evolutionary history. Haplotypes are typically inferred from population genotype data using computational methods. Whole-genome sequence data represent a promising resource for constructing haplotypes spanning hundreds of kilobases for an individual. In this article, we propose a Markov chain Monte Carlo (MCMC) algorithm, HASH (haplotype assembly for single human), for assembling haplotypes from sequenced DNA fragments that have been mapped to a reference genome assembly. The transitions of the Markov chain are generated using min-cut computations on graphs derived from the sequenced fragments. We have applied our method to infer haplotypes using whole-genome shotgun sequence data from a recently sequenced human individual. The high sequence coverage and presence of mate pairs result in fairly long haplotypes (N50 length ∼ 350 kb). Based on comparison of the sequenced fragments against the individual haplotypes, we demonstrate that the haplotypes for this individual inferred using HASH are significantly more accurate than the haplotypes estimated using a previously proposed greedy heuristic and a simple MCMC method. Using haplotypes from the HapMap project, we estimate the switch error rate of the haplotypes inferred using HASH to be quite low, ∼1.1%. Our Markov chain Monte Carlo algorithm represents a general framework for haplotype assembly that can be applied to sequence data generated by other sequencing technologies. The code implementing the methods and the phased individual haplotypes can be downloaded from http://www.cse.ucsd.edu/users/vibansal/HASH/.
- Published
- 2008
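HASH builds its Markov-chain transitions from min-cut computations, and those are not reproduced here. The sketch below shows only the simpler idea of a single-site-flip Metropolis sampler, of the general kind the abstract uses as a comparison point. The paper's exact baseline may differ. The sampler minimizes the minimum-error-correction (MEC) score. It assumes fragments are already reduced to alleles (0/1) at heterozygous sites and that the two haplotypes are complements of each other. The function names, temperature, and iteration count are illustrative assumptions. The switch-error helper counts phase changes between adjacent sites relative to a known truth.

```python
import math
import random
from typing import Dict, List, Tuple

# A fragment maps heterozygous-site index -> observed allele (0 or 1).
Fragment = Dict[int, int]


def fragment_cost(hap: List[int], frag: Fragment) -> int:
    """Mismatches of a fragment against the better-fitting of hap and its complement."""
    mism = sum(1 for site, allele in frag.items() if hap[site] != allele)
    return min(mism, len(frag) - mism)


def mec_score(hap: List[int], fragments: List[Fragment]) -> int:
    """Minimum-error-correction (MEC) score of a phasing."""
    return sum(fragment_cost(hap, f) for f in fragments)


def mcmc_phase(fragments: List[Fragment], n_sites: int, n_iter: int = 2000,
               temperature: float = 0.5, seed: int = 0) -> Tuple[List[int], int]:
    """Single-site-flip Metropolis sampler; returns the lowest-MEC haplotype visited."""
    rng = random.Random(seed)
    hap = [rng.randint(0, 1) for _ in range(n_sites)]
    cost = mec_score(hap, fragments)
    best, best_cost = hap[:], cost
    for _ in range(n_iter):
        site = rng.randrange(n_sites)
        hap[site] ^= 1  # propose flipping the phase of one site
        new_cost = mec_score(hap, fragments)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            cost = new_cost  # accept the flip
            if cost < best_cost:
                best, best_cost = hap[:], cost
        else:
            hap[site] ^= 1  # reject: undo the flip
    return best, best_cost


def switch_error_rate(inferred: List[int], truth: List[int]) -> float:
    """Fraction of adjacent site pairs where the inferred phase switches relative to truth."""
    agree = [a == b for a, b in zip(inferred, truth)]
    switches = sum(1 for i in range(1, len(agree)) if agree[i] != agree[i - 1])
    return switches / (len(agree) - 1)
```

In a toy input where overlapping fragments link every site, the zero-MEC phasing is unique up to swapping the two haplotypes. Larger block-wise moves, such as HASH's min-cut transitions, are intended to help the chain escape states where flipping any single site makes the score worse.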
29. Evolutionary and Biomedical Insights from the Rhesus Macaque Genome
- Author
-
Carolin Kosiol, Belinda Giardine, Janet A. Hopkins, Andrew G. Clark, Ryan D. Hernandez, Peng Wang, Peter D. Stenson, Yu-Hui Rogers, Aaron L. Halpern, Andrew D. Kern, Webb Miller, Kymberlie H. Pepin, Melissa J. Hubisz, Kimberly D. Delehaunty, Robert E. Palermo, Matthew W. Hahn, Erica Sodergren, Brian P. Walenz, Scott M. Smith, Sandra L. Lee, Xiang Qin, Yucheng Feng, Ewen F. Kirkness, Vandita Joshi, Xiaoqiu Huang, Amanda F. Svatek, Fan Yang, Young Ho Kim, Laura Clarke, John E. Karro, Courtney Sherell White, Jessica Kolb, David Glenn Smith, Clay Davis, Jian Ma, Shobha Patil, Todd Wylie, Arian F.A. Smit, Shalini N. Jhangiani, Michael G. Katze, Edward V. Ball, Jennifer Godfrey, Heather A. Lawson, Brian J. Raney, Michael Holder, Ross C. Hardison, Christian J. Buhay, Zhangwan Li, Alicia Hawes, Eric J. Vallender, David A. Wheeler, James C. Wallace, Galt P. Barber, Jinchuan Xing, Yufeng Shen, Kayla E. Smith, Marvin Diep Dao, Jeffrey Rogers, Evan E. Eichler, Cynthia Pfannkoch, Jireh Santibanez, Kateryna D. Makova, Kashif Hirani, Robert M. Kuhn, Yanru Ren, David Neil Cooper, David Haussler, Carlos Bustamante, Adam Siepel, Mimi N. Chandrabose, Xiaoming Liu, George M. Weinstock, Teresa Utterback, Jarret Glasscock, Tomas Vinar, R. Alan Harris, Anis Karimpour-Fard, San Juana Ruiz, Lucinda Fulton, Asif T. Chinwalla, Aniko Sabo, Xinwei She, Charles Addo-Quaye, David L. Nelson, Lora Lewis, Hui Ke, Eli Venter, Donna M. Muzny, Alison Marklein, Bruce T. Lahn, Grace Pai, Brian W. Schneider, Shannon Dugan-Rocha, Henry Xing-Zhi Song, Jeremiah D. Degenhardt, Kyudong Han, Huaiyang Jiang, Stephanie M. Moore, Ian Schenck, Dinh Ngoc Ngo, Michael J. Cox, Heidie A. Paul, Ann S. Zwieg, Kim C. Worley, Craig Pohl, Rui Chen, Robert L. Strausberg, Ling-Ling Pu, Donna Karolchik, Jonathan R. Pollack, Geoffrey Okwuonu, Jennifer Hume, Elaine R. Mardis, David N. Messina, W. James Kent, William E. O'Brien, Fan Hsu, Andrew R. Jackson, Huyen Dinh, Hui Wang, LaDeana W. Hillier, Richard A. Gibbs, Alexandra Denby, Wesley C. Warren, Brygg Ullmer, Laura J. Dumas, Yih-shin Liu, Tony Attaway, Richard K. Wilson, Patrick Minx, James M. Sikela, Lan Zhang, Sandra Hines, Steven J. M. Jones, Amit Indap, Ze Cheng, Karin A. Remington, Stephanie Bell, Jungnam Lee, Kelly E. Bernard, Sang-Gook Han, Mariano Rocchi, Judith Hernandez, Betsy Ferguson, Hildegard Kehrer-Sawatzki, Ziad Khan, Aleksandar Milosavljevic, Joanne O. Nelson, Jeffery P. Demuth, Richard Burhans, David A. Parker, Lynne V. Nazareth, Roger E. Bumgarner, Marco A. Marra, Robert Baertsch, Andrew Cree, Paul Havlak, J. Craig Venter, Kay Prüfer, Rasmus Nielsen, Ewan Birney, Miriam K. Konkel, Mark A. Batzer, Arthur M. Lesk, Jacqueline E. Schein, Granger G. Sutton, Yan Ding, Yue Liu, Andy Peng Xiang, Miklós Csürös, Selina Vattathil, John W. Wallis, R. Gerald Fowler, Shiaw-Pyng Yang, Ramatu Ayiesha Gabisi, and Toni T. Garner
- Subjects
Male ,Biomedical Research ,Pan troglodytes ,Macaque ,Human accelerated regions ,Genome ,Evolution, Molecular ,Species Specificity ,Gene Duplication ,biology.animal ,Animals ,Humans ,Primate ,Gene Rearrangement ,Genetics ,Whole genome sequencing ,Multidisciplinary ,biology ,Genetic Diseases, Inborn ,Genetic Variation ,Sequence Analysis, DNA ,Gene rearrangement ,biology.organism_classification ,Macaca mulatta ,Rhesus macaque ,Homo sapiens ,Evolutionary biology ,Multigene Family ,Mutation ,Female
- Abstract
The rhesus macaque (Macaca mulatta) is an abundant primate species that diverged from the ancestors of Homo sapiens about 25 million years ago. Because they are genetically and physiologically similar to humans, rhesus monkeys are the most widely used nonhuman primate in basic and applied biomedical research. We determined the genome sequence of an Indian-origin Macaca mulatta female and compared the data with chimpanzees and humans to reveal the structure of ancestral primate genomes and to identify evidence for positive selection and lineage-specific expansions and contractions of gene families. A comparison of sequences from individual animals was used to investigate their underlying genetic diversity. The complete description of the macaque genome blueprint enhances the utility of this animal model for biomedical research and improves our understanding of the basic biology of the species.
- Published
- 2007
30. Shotgun sequence assembly and recent segmental duplications within the human genome
- Author
-
Ze Cheng, Ge Liu, Eray Tüzün, Evan E. Eichler, Deanna M. Church, Zhaoshi Jiang, Xinwei She, Granger G. Sutton, Royden A. Clark, and Aaron L. Halpern
- Subjects
Genetics ,Multidisciplinary ,Genome, Human ,Sequence analysis ,Computational Biology ,Sequence assembly ,Sequence alignment ,Genomics ,Sequence Analysis, DNA ,Computational biology ,Biology ,Physical Chromosome Mapping ,Sensitivity and Specificity ,Genome ,Mice ,Genes, Duplicate ,Gene Duplication ,Animals ,Chromosomes, Human ,Humans ,Human genome ,Shotgun Sequence Assembly ,Repeated sequence ,Sequence Alignment ,Segmental duplication
- Abstract
Complex eukaryotic genomes are now being sequenced at an accelerated pace primarily using whole-genome shotgun (WGS) sequence assembly approaches. WGS assembly was initially criticized because of its perceived inability to resolve repeat structures within genomes. Here, we quantify the effect of WGS sequence assembly on large, highly similar repeats by comparison of the segmental duplication content of two different human genome assemblies. Our analysis shows that large (> 15 kilobases) and highly identical (> 97%) duplications are not adequately resolved by WGS assembly. This leads to significant reduction in genome length and the loss of genes embedded within duplications. Comparable analyses of mouse genome assemblies confirm that strict WGS sequence assembly will oversimplify our understanding of mammalian genome structure and evolution; a hybrid strategy using a targeted clone-by-clone approach to resolve duplications is proposed.
- Published
- 2004
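The thresholds quoted in the abstract (duplications longer than 15 kilobases with more than 97% identity) can be written as a simple filter. The record layout and helper names below are hypothetical. The identity helper counts gap columns in the denominator, which is only one of several conventions in use.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DuplicationPair:
    """Hypothetical record for one aligned pair of duplicated segments."""
    name: str
    aligned_length_bp: int
    identity: float  # fraction of identical aligned columns, 0.0-1.0


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identity over all columns of two equal-length gapped strings ('-' = gap)."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned strings must have equal length")
    matches = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return matches / len(aln_a)


def hard_to_resolve(pairs: List[DuplicationPair],
                    min_length_bp: int = 15_000,
                    min_identity: float = 0.97) -> List[str]:
    """Names of pairs exceeding both thresholds (strictly longer and strictly more identical)."""
    return [p.name for p in pairs
            if p.aligned_length_bp > min_length_bp and p.identity > min_identity]
```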
31. Environmental Genome Shotgun Sequencing of the Sargasso Sea
- Author
-
Owen White, Doug Rusch, Derrick E. Fouts, Kenneth H. Nealson, Jonathan A. Eisen, Anthony H. Knap, John F. Heidelberg, Hamilton O. Smith, J. Craig Venter, Karin A. Remington, Holly Baden-Tillson, Samuel Levy, Jeremy Peterson, Michael W. Lomas, Cynthia Pfannkoch, William C. Nelson, Karen E. Nelson, Dongying Wu, Jeff Hoffman, Ian T. Paulsen, Rachel Parsons, Aaron L. Halpern, and Yu-Hui Rogers
- Subjects
Rhodopsin ,Molecular Sequence Data ,Nitrosopumilus ,Biodiversity ,Genomics ,Biology ,Cyanobacteria ,Genome ,Genes, Archaeal ,Genome, Archaeal ,Phylogenetics ,Rhodopsins, Microbial ,Bacteriophages ,Seawater ,Photosynthesis ,Atlantic Ocean ,Relative species abundance ,Ecosystem ,Phylogeny ,Multidisciplinary ,Bacteria ,Shotgun sequencing ,Ecology ,Computational Biology ,Genes, rRNA ,Sequence Analysis, DNA ,biology.organism_classification ,Archaea ,Eukaryotic Cells ,Genes, Bacterial ,Water Microbiology ,Genome, Bacterial ,Plasmids
- Abstract
We have applied “whole-genome shotgun sequencing” to microbial populations collected en masse on tangential flow and impact filters from seawater samples collected from the Sargasso Sea near Bermuda. A total of 1.045 billion base pairs of nonredundant sequence was generated, annotated, and analyzed to elucidate the gene content, diversity, and relative abundance of the organisms within these environmental samples. These data are estimated to derive from at least 1800 genomic species based on sequence relatedness, including 148 previously unknown bacterial phylotypes. We have identified over 1.2 million previously unknown genes represented in these samples, including more than 782 new rhodopsin-like photoreceptors. Variation in species present and stoichiometry suggests substantial oceanic microbial diversity. Microorganisms are responsible for most of the biogeochemical cycles that shape the environment of Earth and its oceans. Yet, these organisms are the least well understood on Earth, as the ability to study and understand the metabolic potential of microorganisms has been hampered by the inability to generate pure cultures. Recent studies have begun to explore environ
- Published
- 2004
32. The Dog Genome: Survey Sequencing and Comparative Analysis
- Author
-
Karin A. Remington, Arthur L. Delcher, Wei Wang, Samuel Levy, Mihai Pop, Aaron L. Halpern, Vineet Bafna, Ewen F. Kirkness, J. Craig Venter, Douglas B. Rusch, and Claire M. Fraser
- Subjects
Male ,Mutation rate ,Molecular Sequence Data ,Computational biology ,Biology ,Polymorphism, Single Nucleotide ,Synteny ,Genome ,Contig Mapping ,Mice ,Dogs ,Phylogenetics ,Animals ,Humans ,RNA, Messenger ,Conserved Sequence ,Phylogeny ,Repetitive Sequences, Nucleic Acid ,Short Interspersed Nucleotide Elements ,Genomic organization ,Sequence (medicine) ,Genetics ,Whole genome sequencing ,Multidisciplinary ,Genome, Human ,Nucleic acid sequence ,Computational Biology ,Genetic Variation ,Genomics ,Sequence Analysis, DNA ,Physical Chromosome Mapping ,Chromosomes, Mammalian ,Long Interspersed Nucleotide Elements ,Mutation ,DNA, Intergenic ,Human genome ,Sequence Alignment
- Abstract
A survey of the dog genome sequence (6.22 million sequence reads; 1.5× coverage) demonstrates the power of sample sequencing for comparative analysis of mammalian genomes and the generation of species-specific resources. More than 650 million base pairs (>25%) of dog sequence align uniquely to the human genome, including fragments of putative orthologs for 18,473 of 24,567 annotated human genes. Mutation rates, conserved synteny, repeat content, and phylogeny can be compared among human, mouse, and dog. A variety of polymorphic elements are identified that will be valuable for mapping the genetic basis of diseases and traits in the dog.
- Published
- 2003
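For a rough sense of what 1.5× coverage means, the Lander–Waterman idealization can be applied. It assumes reads are placed uniformly at random, which is not stated in the abstract. The model relates coverage $c$ to the number of reads $N$, mean read length $L$, and genome length $G$. It also gives the expected fraction of bases covered by at least one read:

```latex
\begin{aligned}
c &= \frac{N L}{G}, \\
\Pr[\text{base covered at least once}] &= 1 - e^{-c}, \\
c = 1.5 \;\Rightarrow\; 1 - e^{-1.5} &\approx 1 - 0.223 = 0.777 .
\end{aligned}
```

Under these assumptions, a 1.5× survey would touch roughly three quarters of the genome at least once. Real surveys deviate from this because of repeats and non-uniform sampling.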
33. A preliminary comparison of the mouse and human genomes
- Author
-
Cheryl A. Evans, Mark Yandell, Jian Wang, Hamilton O. Smith, George L. Gabor Miklos, Joseph H. Nadeau, Kendra Biddick, Granger G. Sutton, Arthur L. Delcher, Xiangqun H. Zheng, Ron Wides, Steven L. Salzberg, Jeffrey Hoover, Vivien Bonazzi, William H. Majoros, Mark Raymond Adams, Robert A. Holt, Fu Lu, Peter W. Li, Richard J. Mural, Eugene W. Myers, Aaron L. Halpern, Doug Rusch, and J. Craig Venter
- Subjects
Comparative genomics ,Genetics ,Genome evolution ,Cot analysis ,Sequence assembly ,Human genome ,General Medicine ,Computational biology ,Biology ,Genome ,Synteny ,Reference genome
- Abstract
Accurate annotated assemblies of the mouse and human genomes enable a detailed comparison of the organization and evolution of the two genomes. We have completed several assemblies of both the mouse, with and without public data, and human genomes. Analysis of these assemblies suggests the mouse genome is about 10% smaller than the human genome primarily because of a difference in the content of repetitive DNA between the two genomes. More than 300,000 positions in these two genomes can be aligned with one another based on short segments of sequence similarity. These conserved segments significantly enhance the resolution of the resultant comparative maps and can be used to divide the genomes into regions of conserved-shared synteny. The genes found in such regions are highly conserved as is their relative order and orientation. Comparison of the human and mouse genome is expected to be key to deciphering the important biological information encoded in the mammalian genome. A prerequisite to comparing complex genomes such as those of mouse and human is the availability of annotated assemblies of both genomes that are comparable in quality and completeness. Since February 2001, we have assembled, annotated and delivered to our subscribers two versions of the human genome and two versions of the mouse genome. A third assembly of the human genome is being completed and will be delivered by fall of 2002. These annotated assemblies provide the starting materials for the genome-wide comparisons of the mouse and human reported here. We will begin with a description of the first Celera whole genome assembly of the mouse to provide a general basis of the quality and completeness of these data and then will report the results of a preliminary comparison between these two genomes.
- Published
- 2002
34. The Genome Sequence of the Malaria Mosquito Anopheles gambiae
- Author
-
Mei Wang, Frank H. Collins, Yong Liang, José M. C. Ribeiro, Zhijian Tu, Jason R. Miller, Mark Yandell, Pantelis Topalis, Hongguang Shao, Qi Zhao, Hamilton O. Smith, Ali N Dana, Zhaoxi Ke, J. Craig Venter, Deborah R. Nusskern, Christos Louis, Ivica Letunic, Brian P. Walenz, Granger G. Sutton, Patrick Wincker, Anastasios C. Koutsos, Paul T. Brey, Ewan Birney, Jean Weissenbach, Fotis C. Kafatos, Cheryl A. Evans, Kerry J. Woodford, Dana Thomasova, Eugene W. Myers, Stephen L. Hoffman, E. B. Kokoza, Josep F. Abril, Randall Bolanos, Megan A. Regier, Holly Baden, George K. Christophides, Véronique de Berardinis, Jingtao Sun, James R. Hogan, Kabir Chaturvedi, Ron Wides, Emmanuel Mongin, Igor F. Zhimulev, Steven L. Salzberg, Danita Baldwin, Richard J. Mural, Shiaoping C. Zhu, Anibal Cravchik, Jhy-Jhu Lin, G. Mani Subramanian, Young S. Hong, Shuang Cai, Francis Kalush, Rosane Charlab, Martin Wu, Claudia Blass, Mark Raymond Adams, Robert A. Holt, Clark M. Mobarry, Douglas B. Rusch, Michael Flanigan, Jim Biedler, Susanne L. Hladun, Ping Guan, Cynthia Sitter, Joel A. Malek, Mario Coluzzi, Cynthia Pfannkoch, Arthur L. Delcher, Alessandra della Torre, Maria F. Unger, Evgeny M. Zdobnov, Stephan Meister, Karin A. Remington, Peter W. Atkinson, Malcolm J. Gardner, Vladimir Benes, Ian M. Dew, Maria V. Sharakhova, X. Wang, Hongyu Zhang, Jian Wang, Jeffrey Hoover, Cheryl L. Kraft, Charles Roth, Andrew G. Clark, Shaying Zhao, Jyoti Shetty, Tina C. McIntosh, Aihui Wang, Zhiping Gu, Aaron L. Halpern, Anne Grundschober-Freimoser, David A. O'Brochta, Peter Arensburger, Brendan J. Loftus, Lucas Q. Ton, Véronique Anthouard, Mary Barnstead, John Lopez, Peer Bork, Didier Boscus, Michele Clamp, Jennifer R. Wortman, Claire M. Fraser, Lisa Friedli, William H. Majoros, Thomas J. Smith, Olivier Jaillon, Val Curwen, Samuel Broder, Sean D. Murphy, Roderic Guigó, Neil F. Lobo, Mathew A. Chrystal, Alison Yao, Alex Levitsky, Renee Strong, Maureen E. Hillenmeyer, Zhongwu Lai, Chinnappa D. Kodira, and Rong Qi
- Subjects
Chromosomes, Artificial, Bacterial ,Drosophila melanogaster/genetics ,Mosquito Control ,Proteome ,Enzymes/chemistry/genetics/metabolism ,Anopheles gambiae ,Genes, Insect ,Genome ,Plasmodium falciparum/growth & development ,Malaria, Falciparum ,Expressed Sequence Tags ,Genetics ,Expressed sequence tag ,Multidisciplinary ,Physical Chromosome Mapping ,Biological Evolution ,Enzymes ,Blood ,Drosophila melanogaster ,Insect Proteins ,Digestion ,Sequence analysis ,Molecular Sequence Data ,Plasmodium falciparum ,Biology ,Polymorphism, Single Nucleotide ,Species Specificity ,Anopheles ,Genetic variation ,Transcription Factors/chemistry/genetics/physiology ,Animals ,Humans ,Insect Proteins/chemistry/genetics/physiology ,Malaria, Falciparum/transmission ,Gene ,Anopheles/classification/genetics/parasitology/physiology ,Whole genome sequencing ,Haplotype ,Computational Biology ,Genetic Variation ,Feeding Behavior ,Sequence Analysis, DNA ,biology.organism_classification ,Insect Vectors ,Gene Expression Regulation ,Haplotypes ,Chromosome Inversion ,DNA Transposable Elements ,Insect Vectors/genetics/parasitology/physiology ,Transcription Factors
- Abstract
Anopheles gambiae is the principal vector of malaria, a disease that afflicts more than 500 million people and causes more than 1 million deaths each year. Tenfold shotgun sequence coverage was obtained from the PEST strain of A. gambiae and assembled into scaffolds that span 278 million base pairs. A total of 91% of the genome was organized in 303 scaffolds; the largest scaffold was 23.1 million base pairs. There was substantial genetic variation within this strain, and the apparent existence of two haplotypes of approximately equal frequency (“dual haplotypes”) in a substantial fraction of the genome likely reflects the outbred nature of the PEST strain. The sequence produced a conservative inference of more than 400,000 single-nucleotide polymorphisms that showed a markedly bimodal density distribution. Analysis of the genome sequence revealed strong evidence for about 14,000 protein-encoding transcripts. Prominent expansions in specific families of proteins likely involved in cell adhesion and immunity were noted. An expressed sequence tag analysis of genes regulated by blood feeding provided insights into the physiological adaptations of a hematophagous insect.
- Published
- 2002
35. Efficiently detecting polymorphisms during the fragment assembly process
- Author
-
Daniel Fasulo, Clark M. Mobarry, Ian M. Dew, and Aaron L. Halpern
- Subjects
Statistics and Probability ,Molecular Sequence Data ,Population ,Sequence assembly ,DNA Fragmentation ,Computational biology ,Biology ,Biochemistry ,Genome ,Set (abstract data type) ,Consensus Sequence ,Indel ,education ,Molecular Biology ,Sequence (medicine) ,Genetics ,education.field_of_study ,Polymorphism, Genetic ,Base Sequence ,Shotgun sequencing ,Gene Expression Profiling ,Genetic Variation ,Sequence Analysis, DNA ,Computer Science Applications ,Computational Mathematics ,Computational Theory and Mathematics ,Graph (abstract data type) ,Sequence Alignment ,Algorithms ,Polymorphism, Restriction Fragment Length
- Abstract
Motivation: Current genomic sequence assemblers assume that the input data is derived from a single, homogeneous source. However, recent whole-genome shotgun sequencing projects have violated this assumption, resulting in input fragments covering the same region of the genome whose sequences differ due to polymorphic variation in the population. While single-nucleotide polymorphisms (SNPs) do not pose a significant problem to state-of-the-art assembly methods, these methods do not handle insertion/deletion (indel) polymorphisms of more than a few bases. Results: This paper describes an efficient method for detecting sequence discrepancies due to polymorphism that avoids resorting to global use of more costly, less stringent affine sequence alignments. Instead, the algorithm uses graph-based methods to determine the small set of fragments involved in each polymorphism and performs more sophisticated alignments only among fragments in that set. Results from the incorporation of this method into the Celera Assembler are reported for the D. melanogaster, H. sapiens, and M. musculus genomes. Availability: The method described herein does not constitute a stand-alone software application, but is laid out in sufficient detail to be implemented as a component of any genomic sequence assembler. Contact: daniel.fasulo@celera.com Keywords: whole-genome assembly; shotgun sequencing; polymorphism.
- Published
- 2002
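The abstract describes isolating the small set of fragments involved in each polymorphism before running costlier alignments. One way to sketch that grouping step is with connected components. Assume each overlap is reported as a pair of fragment IDs plus a flag saying whether it passed the stringent alignment. The components of the failed overlaps can then be found with a union-find structure. This is an illustration only, not the Celera Assembler's actual graph method. The input format and function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# (fragment_a, fragment_b, passed_stringent_alignment); hypothetical input format.
Overlap = Tuple[str, str, bool]


class UnionFind:
    """Disjoint-set forest with path halving."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def candidate_polymorphism_sets(overlaps: Iterable[Overlap]) -> List[List[str]]:
    """Connected components of fragments joined by overlaps that failed the stringent check."""
    uf = UnionFind()
    for frag_a, frag_b, passed in overlaps:
        if not passed:
            uf.union(frag_a, frag_b)
    groups: Dict[str, List[str]] = {}
    for frag in list(uf.parent):
        groups.setdefault(uf.find(frag), []).append(frag)
    return sorted(sorted(group) for group in groups.values())
```

Each returned set is small. A more expensive alignment, for example one with affine gap penalties, would then only need to run within a set rather than across all overlaps.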
36. Comparison of papillomavirus and immunodeficiency virus evolutionary patterns in the context of a papillomavirus vaccine
- Author
-
Aaron L Halpern
- Subjects
viruses ,Human immunodeficiency virus (HIV) ,Context (language use) ,Biology ,medicine.disease_cause ,Immunodeficiency virus ,Virology ,medicine ,Animals ,Humans ,Papillomavirus Vaccines ,Papillomaviridae ,Phylogeny ,Phylogenetic tree ,Papillomavirus Infections ,Zoonosis ,Genetic Variation ,HIV ,virus diseases ,Hpv vaccination ,Viral Vaccines ,Simian immunodeficiency virus ,medicine.disease ,Biological Evolution ,female genital diseases and pregnancy complications ,Vaccination ,Tumor Virus Infections ,Infectious Diseases
- Abstract
In contemplating a vaccine for human papillomaviruses (HPVs), it is important to consider the evolutionary context in which such a vaccine would be deployed. The human immunodeficiency virus, having been the subject of even more extensive study than HPV, shares certain salient features with regard to phylogenetic structure, and may serve as a model for contemplation of possible difficulties with HPV vaccination. However, there are also striking differences in the evolutionary potentials and histories of the viruses that permit an optimistic outlook for HPV. These similarities and differences, as well as their implications for vaccination studies, are reviewed.
- Published
- 2000
37. Computational techniques for human genome resequencing using mated gapped reads
- Author
-
Jessica Ebert, Vitali Karpinchyk, Igor Nazarenko, Dennis G. Ballinger, Jonathan M. Baccash, Anushka Brownley, Matt Morenzoni, Krishna Pant, Geoffrey B. Nilsen, Radoje Drmanac, Aaron L. Halpern, Bruce K. Martin, and Paolo Carnevali
- Subjects
Sequence assembly ,Genomics ,Biology ,Contig Mapping ,Genetics ,Humans ,Computer Simulation ,Molecular Biology ,Alleles ,Whole genome sequencing ,Sequence ,Base Sequence ,Models, Genetic ,Genome, Human ,Chromosome Mapping ,Statistical model ,Bayes Theorem ,Sequence Analysis, DNA ,Base (topology) ,Bayesian statistics ,Computational Mathematics ,Computational Theory and Mathematics ,Modeling and Simulation ,Data Interpretation, Statistical ,Human genome ,Algorithm ,Algorithms
- Abstract
Unchained base reads on self-assembling DNA nanoarrays have recently emerged as a promising approach to low-cost, high-quality resequencing of human genomes. Because of unique characteristics of these mated pair reads, existing computational methods for resequencing assembly, such as those based on map-consensus calling, are not adequate for accurate variant calling. We describe novel computational methods developed for accurate calling of SNPs and short substitutions and indels (100 bp); the same methods apply to evaluation of hypothesized larger, structural variations. We use an optimization process that iteratively adjusts the genome sequence to maximize its a posteriori probability given the observed reads. For each candidate sequence, this probability is computed using Bayesian statistics with a simple read generation model and simplifying assumptions that make the problem computationally tractable. The optimization process iteratively applies one-base substitutions, insertions, and deletions until convergence is achieved to an optimum diploid sequence. A local de novo assembly procedure that generalizes approaches based on De Bruijn graphs is used to seed the optimization process in order to reduce the chance of converging to local optima. Finally, a correlation-based filter is applied to reduce the false positive rate caused by the presence of repetitive regions in the reference genome.
- Published
- 2011
38. TreeLign
- Author
-
Shaojie Zhang, Aaron L. Halpern, and Yuan Li
- Subjects
Phylogenetic tree ,business.industry ,Maximum likelihood ,Pattern recognition ,Biological classification ,Biology ,computer.software_genre ,Reference tree ,Maximum parsimony ,Metagenomics ,Tree rearrangement ,Computational phylogenetics ,Data mining ,Artificial intelligence ,business ,computer
- Abstract
Phylogenetic assignment of 16S rRNA has been frequently used for taxonomic classification. Recently, high-throughput sequencing, especially in the context of environmental or metagenomic sequencing projects, has made fast and accurate taxonomic classification an important goal. Existing classification methods are either fast but too coarse-grained and inaccurate, or fine-grained and accurate but too slow for use in practice. In this paper, we propose a new computational method, TreeLign, to rapidly and accurately conduct alignment and phylogenetic assignments for novel sequences, given a reference phylogenetic tree and an alignment. TreeLign first constructs profiles of every branch on the reference tree, then, for each query sequence, tries assigning it to every possible branch, and finally obtains a new tree and a new alignment which are jointly optimal in terms of Maximum Parsimony (MP). We tested the accuracy and robustness of TreeLign on both a large and a small 16S rRNA dataset extracted from the core set of GreenGenes. The results on the large dataset show that the assignments of TreeLign are in general consistent with the phylogenetic tree of the core set of GreenGenes. The results on the small dataset show that TreeLign achieves accuracy comparable to existing maximum-likelihood-based methods but requires much less computational time.
- Published
- 2011
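TreeLign's branch profiles and simultaneous alignment are not reproduced here. The sketch below is a simpler illustration of the parsimony criterion it optimizes. It assumes the query is already aligned to the reference sequences. It tries attaching the query to every edge of a rooted binary tree, including above the root. Each placement is scored with Fitch's small-parsimony algorithm, treating '-' as an ordinary character state. The tree encoding and function names are illustrative assumptions. Because the whole tree is rescored for every edge, this is the kind of cost that precomputed branch profiles are presumably meant to reduce.

```python
from typing import Dict, Iterator, Optional, Set, Tuple, Union

# A rooted binary tree as nested pairs of leaf names, e.g. (("A", "B"), ("C", "D")).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_site(tree: Tree, seqs: Dict[str, str], i: int) -> Tuple[Set[str], int]:
    """Fitch state set and mutation count for alignment column i."""
    if isinstance(tree, str):
        return {seqs[tree][i]}, 0
    left_set, left_cost = fitch_site(tree[0], seqs, i)
    right_set, right_cost = fitch_site(tree[1], seqs, i)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree: Tree, seqs: Dict[str, str]) -> int:
    """Total Fitch parsimony score summed over all alignment columns."""
    n_sites = len(next(iter(seqs.values())))
    return sum(fitch_site(tree, seqs, i)[1] for i in range(n_sites))


def attachments(tree: Tree, query: str) -> Iterator[Tuple[Tree, Tree]]:
    """Yield (subtree below the edge, new tree) for attaching query on every edge."""
    yield tree, (tree, query)  # edge above this subtree (above the root at the top level)
    if not isinstance(tree, str):
        left, right = tree
        for below, new_left in attachments(left, query):
            yield below, (new_left, right)
        for below, new_right in attachments(right, query):
            yield below, (left, new_right)


def place_query(tree: Tree, seqs: Dict[str, str], name: str,
                aligned_seq: str) -> Tuple[Tree, int]:
    """Most parsimonious attachment edge (first one found on ties) for a pre-aligned query."""
    all_seqs = {**seqs, name: aligned_seq}
    best: Optional[Tuple[Tree, int]] = None
    for below, new_tree in attachments(tree, name):
        score = parsimony_score(new_tree, all_seqs)
        if best is None or score < best[1]:
            best = (below, score)
    assert best is not None
    return best
```

For example, with the tree `(("A", "B"), ("C", "D"))`, a query identical to leaf C attaches on the edge above C without increasing the parsimony score.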
39. Stalking the fourth domain in metagenomic data: searching for, discovering, and interpreting novel, deep branches in marker gene phylogenetic trees
- Author
-
J. Craig Venter, Martin Wu, Marvin Frazier, Douglas B. Rusch, Shibu Yooseph, Jonathan A. Eisen, Aaron L. Halpern, Dongying Wu, and Robert Fleischer
- Subjects
Genome ,Computer Applications ,Databases, Genetic ,Genome Evolution ,Phylogeny ,Genetics ,Plant Growth and Development ,0303 health sciences ,Multidisciplinary ,Phylogenetic tree ,Ecology ,Archaeal Evolution ,Genomics ,Phylogenetics ,Multigene Family ,Medicine ,Algorithms ,Biotechnology ,Research Article ,Archaeans ,Sequence analysis ,Evolution ,General Science & Technology ,Oceans and Seas ,Science ,Sequence alignment ,Biology ,Microbiology ,DNA sequencing ,Viral Evolution ,Evolution, Molecular ,03 medical and health sciences ,Databases ,Genetic ,Bacterial Proteins ,Virology ,Evolutionary Systematics ,14. Life underwater ,030304 developmental biology ,Ribosomal ,Evolutionary Biology ,Bacterial Evolution ,Base Sequence ,030306 microbiology ,Molecular ,Computational Biology ,Genomic Evolution ,Bacteriology ,Comparative Genomics ,rpoB ,Organismal Evolution ,Rec A Recombinases ,Evolutionary biology ,Metagenomics ,RNA, Ribosomal ,Evolutionary Ecology ,Microbial Evolution ,Computer Science ,RNA ,Environmental Protection ,Developmental Biology
- Abstract
Background: Most of our knowledge about the ancient evolutionary history of organisms has been derived from data associated with specific known organisms (i.e., organisms that we can study directly such as plants, metazoans, and culturable microbes). Recently, however, a new source of data for such studies has arrived: DNA sequence data generated directly from environmental samples. Such metagenomic data has enormous potential in a variety of areas including, as we argue here, in studies of very early events in the evolution of gene families and of species. Methodology/Principal Findings: We designed and implemented new methods for analyzing metagenomic data and used them to search the Global Ocean Sampling (GOS) expedition data set for novel lineages in three gene families commonly used in phylogenetic studies of known and unknown organisms: small subunit rRNA and the recA and rpoB superfamilies. Though the methods available could not accurately identify very deeply branched ss-rRNAs (largely due to difficulties in making robust sequence alignments for novel rRNA fragments), our analysis revealed the existence of multiple novel branches in the recA and rpoB gene families. Analysis of available sequence data likely from the same genomes as these novel recA and rpoB homologs was then used to further characterize the possible organismal source of the novel sequences. Conclusions/Significance: Of the novel recA and rpoB homologs identified in the metagenomic data, some likely come from uncharacterized viruses while others may represent ancient paralogs not yet seen in any cultured organism. A third possibility is that some come from novel cellular lineages that are only distantly related to any organisms for which sequence data is currently available. If there exist any major, but so-far-undiscovered, deeply branching lineages in the tree of life, we suggest that methods such as those described herein currently offer the best way to search for them.
- Published
- 2011
40. Functional genomic signatures of sponge bacteria reveal unique and shared features of symbiosis
- Author
-
Suhelen Egan, Pui Yi Yung, Doug Rusch, Karla B. Heidelberg, Staffan Kjelleberg, Torsten Thomas, Matthew Z. DeMaere, Matthew R. Lewis, Peter D. Steinberg, and Aaron L. Halpern
- Subjects
DNA, Bacterial ,Microbial metabolism ,Biology ,Microbiology ,Symbiosis ,RNA, Ribosomal, 16S ,Animals ,Seawater ,Ecology, Evolution, Behavior and Systematics ,Ecosystem ,Phylogeny ,Comparative Genomic Hybridization ,Bacteria ,Sequence Analysis, DNA ,biology.organism_classification ,Biological Evolution ,Porifera ,Tetratricopeptide ,Sponge ,Phylogenetic diversity ,Evolutionary biology ,Metagenomics ,DNA Transposable Elements ,Metagenome ,Genome, Bacterial ,Symbiotic bacteria - Abstract
Sponges form close relationships with bacteria, and a remarkable phylogenetic diversity of yet-uncultured bacteria has been identified from sponges using molecular methods. In this study, we use a comparative metagenomic analysis of the bacterial community in the model sponge Cymbastela concentrica and in the surrounding seawater to identify previously unrecognized genomic signatures and functions for sponge bacteria. We observed a surprisingly large number of transposable insertion elements, a feature also observed in other symbiotic bacteria, as well as a set of predicted mechanisms that may defend the sponge community against the introduction of foreign DNA and hence contribute to its genetic resilience. Moreover, several shared metabolic interactions between bacteria and host include vitamin production, nutrient transport and utilization, and redox sensing and response. Finally, an abundance of protein–protein interactions mediated through ankyrin and tetratricopeptide repeat proteins could represent a mechanism for the sponge to discriminate between food and resident bacteria. These data provide new insight into the evolution of symbiotic diversity, microbial metabolism and host–microbe interactions in sponges.
- Published
- 2010
41. Consensus generation and variant detection by Celera Assembler
- Author
-
Nelson Axelrod, Jason R. Miller, Gennady Denisov, Samuel Levy, Granger G. Sutton, Brian P. Walenz, and Aaron L. Halpern
- Subjects
Statistics and Probability ,Sequence analysis ,DNA Mutational Analysis ,Molecular Sequence Data ,Locus (genetics) ,Single-nucleotide polymorphism ,Computational biology ,Biology ,Haploidy ,Biochemistry ,Genome ,Gene Frequency ,Consensus Sequence ,Consensus sequence ,Humans ,Molecular Biology ,Genetics ,Base Sequence ,Shotgun sequencing ,Genome, Human ,Chromosome Mapping ,Genetic Variation ,Sequence Analysis, DNA ,Computer Science Applications ,Computational Mathematics ,Computational Theory and Mathematics ,Human genome ,Ploidy ,Algorithms ,Software - Abstract
Motivation: We present an algorithm to identify allelic variation given a Whole Genome Shotgun (WGS) assembly of haploid sequences, and to produce a set of haploid consensus sequences rather than a single consensus sequence. Existing WGS assemblers take a column-by-column approach to consensus generation, and produce a single consensus sequence which can be inconsistent with the underlying haploid alleles, and inconsistent with any of the aligned sequence reads. Our new algorithm uses a dynamic windowing approach. It detects alleles by simultaneously processing the portions of aligned reads spanning a region of sequence variation, assigns reads to their respective alleles, phases adjacent variant alleles and generates a consensus sequence corresponding to each confirmed allele. This algorithm was used to produce the first diploid genome sequence of an individual human. It can also be applied to assemblies of multiple diploid individuals and hybrid assemblies of multiple haploid organisms. Results: When applied to the individual human genome assembly, the new algorithm detects exactly two confirmed alleles and reports two consensus sequences in 98.98% of the 2,033,311 detected regions of sequence variation. In 33,269 of the 460,373 detected regions of size >1 bp, it corrects the errors introduced by the mosaic haploid representation of a diploid locus that the original Celera Assembler consensus algorithm produced. Using an optimized procedure calibrated against 1,506,344 known SNPs, it detects 438,814 new heterozygous SNPs with a false positive rate of 12%. Availability: The open source code is available at: http://wgs-assembler.cvs.sourceforge.net/wgs-assembler/ Contact: gdenisov@jcvi.org
- Published
- 2008
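A much-simplified sketch of the read-to-allele step in the Celera Assembler abstract: collect the slice of every read that spans a variant window, group identical slices, and keep the groups with enough support as confirmed alleles. Unlike the published algorithm, the window here is fixed rather than dynamic and adjacent variants are not phased; the function name, the read representation (alignment offset plus read string), and the `min_support` threshold are all assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def detect_alleles(reads: Dict[str, Tuple[int, str]], start: int, end: int,
                   min_support: int = 2) -> Dict[str, List[str]]:
    """Group aligned reads by the sequence they show across the window [start, end).

    `reads` maps a read id to (alignment offset, read string). Only reads spanning
    the whole window vote. An allele is kept ("confirmed") only if at least
    `min_support` reads carry it; the threshold is a hypothetical parameter.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for read_id, (offset, seq) in reads.items():
        if offset <= start and offset + len(seq) >= end:
            groups[seq[start - offset:end - offset]].append(read_id)
    return {allele: sorted(ids) for allele, ids in groups.items()
            if len(ids) >= min_support}


reads = {
    "r1": (0, "ACGTACGT"),
    "r2": (2, "GTACGTAA"),
    "r3": (1, "CGTTCGTA"),
    "r4": (0, "ACGTTCGT"),
    "r5": (3, "TAC"),
    "r6": (0, "ACGTGCGT"),  # a lone read with a different base: not confirmed
    "r7": (4, "ACGTAA"),    # starts inside the window, so it cannot vote
}
for allele, ids in detect_alleles(reads, 3, 6).items():
    print(allele, ids)
# prints:
# TAC ['r1', 'r2', 'r5']
# TTC ['r3', 'r4']
```

The toy diploid example yields exactly two confirmed alleles, each with its own read set; each allele's window string then serves as that allele's consensus over the window.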
42. The Sorcerer II Global Ocean Sampling Expedition: metagenomic characterization of viruses within aquatic microbial samples
- Author
-
J. Craig Venter, Shibu Yooseph, John I. Glass, Shannon J. Williamson, Aaron L. Halpern, Christopher S. Miller, Marvin Frazier, Douglas Fadrosh, Cynthia Andrews-Pfannkoch, Douglas B. Rusch, Granger G. Sutton, and Karla B. Heidelberg
- Subjects
Genome evolution ,Sequence analysis ,Genetic Linkage ,Oceans and Seas ,viruses ,Science ,Genome, Viral ,Genome ,Ecology/Marine and Freshwater Ecology ,Phylogenetics ,Virology ,Microbiology/Environmental Microbiology ,Molecular Biology ,Phylogeny ,Genetics ,Multidisciplinary ,Phylogenetic tree ,biology ,biology.organism_classification ,Computational Biology/Metagenomics ,Metagenomics ,Horizontal gene transfer ,Medicine ,Prochlorococcus ,Water Microbiology ,Research Article - Abstract
Viruses are the most abundant biological entities on our planet. Interactions between viruses and their hosts impact several important biological processes in the world's oceans such as horizontal gene transfer, microbial diversity and biogeochemical cycling. Interrogation of microbial metagenomic sequence data collected as part of the Sorcerer II Global Ocean Sampling (GOS) expedition revealed a high abundance of viral sequences, representing approximately 3% of the total predicted proteins. Cluster analyses of the viral sequences revealed hundreds to thousands of viral genes encoding various metabolic and cellular functions. Quantitative analyses of viral genes of host origin performed on the viral fraction of aquatic samples confirmed the viral nature of these sequences and suggested that significant portions of aquatic viral communities behave as reservoirs of such genetic material. Distributional and phylogenetic analyses of these host-derived viral sequences also suggested that viral acquisition of environmentally relevant genes of host origin is a more abundant and widespread phenomenon than previously appreciated. The predominant viral sequences identified within microbial fractions originated from tailed bacteriophages and exhibited varying global distributions according to viral family. Recruitment of GOS viral sequence fragments against 27 complete aquatic viral genomes revealed that only one reference bacteriophage genome was highly abundant and was closely related, but not identical, to the cyanomyovirus P-SSM4. The co-distribution across all sampling sites of P-SSM4-like sequences with the dominant ecotype of its host, Prochlorococcus, supports the classification of the viral sequences as P-SSM4-like and suggests that this virus may influence the abundance, distribution and diversity of one of the most dominant components of picophytoplankton in oligotrophic oceans. 
In summary, the abundance and broad geographical distribution of viral sequences within microbial fractions, the prevalence of genes among viral sequences that encode microbial physiological function and their distinct phylogenetic distribution lend strong support to the notion that viral-mediated gene acquisition is a common and ongoing mechanism for generating microbial diversity in the marine environment.
- Published
- 2008
43. Evolution of genes and genomes on the Drosophila phylogeny
- Author
-
Adam M. Phillippy, Edward Grandbois, Pen MacDonald, Iain MacCallum, Laura K. Reed, Wojciech Makalowski, Tracey Honan, Tania Tassinari Rieger, Melissa J. Hubisz, Josep M. Comeron, Douglas Smith, Jennifer Godfrey, Sebastian Strempel, Amr Abdouelleil, Brenton Gravely, Harindra Arachi, Albert J. Vilella, Marc Azer, Sarah A. Teichmann, Roger A. Hoskins, Corbin D. Jones, Keenan Ross, Derek Wilson, Stuart J. Newfeld, John Stalker, Thomas D. Watts, Dennis C. Friedrich, Therese A. Markow, Michael U. Mollenhauer, Tina Goode, Geneva Young, Terry Shea, Krista Lance, Karin A. Remington, Kevin A. Edwards, Lynne Aftuck, Cecil Rise, Sheridon Channer, Matthew D. Rasmussen, Nicole Stange-Thomann, Annie Lui, Robert A. Reenan, Todd Sparrow, Dave Begun, Tamrat Negash, Laura K. Sirot, Adrianne Brand, Adam Brown, Daisuke Yamamoto, Pema Phunkhang, Justin Abreu, Russell Schwartz, Ana Llopart, Abderrahim Farina, Kebede Maru, Chung-I Wu, Allen Alexander, Scott Anderson, So Jeong Lee, Jason Blye, Gary H. Karpen, Wilfried Haerty, Daniel A. Barbash, Peter Rogov, Barry O'Neill, Rachel Mittelman, Jakob Skou Pedersen, Leanne Hughes, Robert K. Bradley, Graziano Pesole, Wyatt W. Anderson, Anthony J. Greenberg, Alejandro Sánchez-Gracia, Julio Rozas, Stephen W. Schaeffer, Yama Thoulutsang, Roger K. Butlin, David H. Ardell, Stuart DeGray, Chris P. Ponting, Deborah E. Stage, Corrado Caggese, Montserrat Aguadé, Casey M. Bergman, Diallo Ferguson, Peili Zhang, Jeffrey R. Powell, Hajime Sato, Xiaohong Liu, Marta Sabariego Puig, Michael Parisi, Passang Dorje, Yoshihiko Tomimura, Adal Abebe, Carlo G. Artieri, Brian Hurhula, Filip Rege, Peter D. Keightley, Andrew Barry, Pablo Alvarez, Tsamla Tsamla, Marvin Wasserman, Santosh Jagadeeshan, Daniel L. Halligan, Chelsea D. Foley, Kim D. Delehaunty, Manfred Grabherr, Sourav Chatterji, Angela N. Brooks, James C. Costello, Mieke Citroen, James A. Yorke, Hsiao Pei Yang, Charles Chapple, Jian Lu, Carlos A. 
Machado, Norbu Dhargay, Tsering Wangchuk, Anat Caspi, Patrick Cahill, Tashi Bayul, Lisa Levesque, Otero L. Oyono, Atanas Mihalev, Dawa Thoulutsang, Dawn N. Abt, Sujaa Raghuraman, Manyuan Long, Maria Mendez-Lago, Charles Matthews, Kimberly Dooley, Alex Wong, Melanie A. Huntley, William R. Jeck, Ira Topping, Ben Kanga, José P. Abad, Ana Cristina Lauer Garcia, Brikti Abera, Kunsang Gyaltsen, Jonathan Butler, Alicia Franke, Michael C. Schatz, Cheewhye Chin, Charles F. Aquadro, Justin Johnson, Bryant F. McAllister, Georgia Giannoukos, M. Erii Husby, Rod A. Wing, Shangtao Liu, Jean L. Chang, Jennifer Daub, Eiko Kataoka, Leopold Parts, Rakela Lubonja, Margaret Priest, Yoshiko N. Tobari, Teena Mehta, Evgeny M. Zdobnov, Yeshi Lokyitsang, Richard Elong, Matthew J. Parisi, Louis Meneus, Eric S. Lander, Alan Filipski, Gary Gearin, Nabil Hafez, Nicholas Sisneros, David B. Jaffe, Ian Holmes, Marina Sirota, Leonid Boguslavskiy, Lisa Chuda, LaDeana W. Hillier, Meizhong Luo, Phil Batterham, Michael Kleber, Richard K. Wilson, Yama Cheshatsang, Qing Yu, Rebecca Reyes, Matthew W. Hahn, Andreas Heger, Mar Marzo, Patrick Minx, Kerstin Lindblad-Toh, Vera L. S. Valente, Adam Wilson, William C. Jordan, Mohamed A. F. Noor, Chiao-Feng Lin, Asha Kamat, Heather Ebling, Mihai Pop, Frances Letendre, Mariana F. Wolfner, Don Gilbert, Ngawang Sherpa, Riza M. Daza, Oana Mihai, Gabriel C. Wu, Aaron M. Berlin, Ewen F. Kirkness, Monika D. Huard, Robert S. Fulton, Randall H. Brown, Danni Zhong, Sharon Stavropoulos, Venky N. Iyer, Xu Mu, Christina R. Gearin, David M. Rand, Jerry A. Coyne, Dan Hultmark, Jill Falk, Christopher Patti, Montserrat Papaceit, James Meldrim, Valentine Mlenga, Muneo Matsuda, Sven Findeiß, Todd A. Schlenke, Kevin McKernan, Brian P. Walenz, Timothy B. Sackton, Leonardo Koerich, Peter An, Robert Nicol, Chuong B. Do, Dmitry Khazanovich, Carmen Segarra, Maura Costello, St Christophe Acer, Claudia Rohde, Serafim Batzoglou, Hadi Quesneville, Evan Mauceli, Andy Vo, Luciano M. 
Matzkin, Susan E. Celniker, Patrick M. O’Grady, William M. Gelbart, Lloyd Low, Jamal Abdulkadir, Jessica Spaulding, Brian R. Calvi, Charlotte Henson, Robert David, Jennifer L. Hall, Andrew G. Clark, Anastasia Gardiner, Susan M. Russo, Birhane Hagos, Kerri Topham, Amy Denise Reily, Eli Venter, Jerome Naylor, Sandra W. Clifton, Valer Gotea, Samuel R. Gross, Manolis Kellis, Claude Bonnet, Christopher Strader, Tashi Lokyitsang, Nyima Norbu, Jennifer Baldwin, Stephen M. Mount, Robert L. Strausberg, Shailendra Yadav, Kristipati Ravi Ram, Steven L. Salzberg, Erik Gustafson, David A. Garfield, Eva Freyhult, Arthur L. Delcher, Enrico Blanco, Granger G. Sutton, Jason M. Tsolas, Charles Robin, Angie S. Hinrichs, Christopher D. Smith, Jane Wilkinson, Brendan McKernan, Fritz Pierre, William McCusker, Brian Oliver, Barry E. Garvin, Sudhir Kumar, Peter Kisner, Kunsang Dorjee, A. Bernardo Carvalho, Anna Montmayeur, Andrew Zimmer, Diana Shih, Wei Tao, Shiaw Pyng Yang, Sante Gnerre, Sampath Settipalli, Thu Nguyen, Paolo Barsanti, Brian P. Lazzaro, Sonja J. Prohaska, J. Craig Venter, Senait Tesfaye, Susan McDonough, Kim D. Pruitt, Alexander Stark, Sergio Castrezana, Lucinda Fulton, Richard T. Lapoint, Greg Gibson, John Spieth, Boris Adryan, Georgius De Haan, Sheila Fisher, Daniel A. Pollard, Seva Kashin, Rob J. Kulathinal, Michael B. Eisen, Nathaniel Novod, Christina Demaso, Alan Dupes, Amanda M. Larracuente, Toby Bloom, Alfredo Villasante, Charles H. Langley, Rama S. Singh, Niall J. Lennon, Kristi L. Montooth, Daniel Barker, Wolfgang Stephan, David Sturgill, Ruiqiang Li, Andrew Hollinger, Boris Boukhgalter, Talene Thomson, Patrick Cooke, Zac Zwirko, Nadia D. Singh, Michael Weiand, Lior Pachter, Roderic Guigó, Yu Zhang, Jay D. Evans, Stephanie Bosak, Rosie Levine, Lu Shi, Kiyohito Yoshida, Carolyn S. McBride, Pouya Kheradpour, William Brockman, Alberto Civetta, Hiroshi Akashi, Marcia Lara, Susan Faro, Sam Griffiths-Jones, Michael R. Brent, Thomas H. 
Eickbush, Gane Ka-Shu Wong, Elizabeth P. Ryan, Erica Anderson, Roberta Kwok, Asif T. Chinwalla, Sahal Osman, Nga Nguyen, Damiano Porcelli, Missole Doricent, Saverio Vicario, Marc Rubenfield, Bárbara Negre, Gillian M. Halter, Erin E. Dooley, Elena R. Lozovsky, William Lee, Alville Collymore, Catherine Stone, Tanya Mihova, Jun Wang, Karsten Kristiansen, Imane Bourzgui, Michael F. Lin, Katie D'Aco, Filipe G. Vieira, Choe Norbu, Yu-Hui Rogers, Aaron L. Halpern, Eugene W. Myers, Sharleen Grewal, Robert T. Good, Alfredo Ruiz, Dave Kudrna, Joseph Graham, Alex Lipovsky, Leonidas Mulrain, Tsering Wangdi, Roman Arguello, Mira V. Han, Arjun Bhutkar, Rasmus Nielsen, David J. Saranga, Aleksey V. Zimin, Vasilia Magnisalis, Helen Vassiliev, Thomas C. Kaufman, Eva Markiewicz, Temple F. Smith, Jinlei Liu, Loryn Gadbois, Michael G. Ritchie, Lisa Zembek, Daniel Bessette, Pasang Bachantsang, Adam Navidi, Department of Molecular Biology and Genetics, Cornell University [New York], Lawrence Berkeley National Laboratory [Berkeley] (LBNL), University of California [Berkeley], University of California, Agencourt Bioscience Corporation, Partenaires INRAE, Faculty of Life Science, University of Manchester [Manchester], Laboratory of Cellular and Developmental Biology (LCDB), NIDDK, NIH, Department of Ecology and Evolutionary Biology, University of Arizona, Department of Biology, Indiana University [Bloomington], Indiana University System-Indiana University System, Massachusetts Institute of Technology (MIT), Harvard University [Cambridge], Centro de Biología Molecular Severo Ochoa [Madrid] (CBMSO), Universidad Autonoma de Madrid (UAM)-Consejo Superior de Investigaciones Científicas [Madrid] (CSIC), Brown University, Laboratory of Molecular Biology, Medical Research Council, Departament de Genetica, Universitat de Barcelona (UB), Pennsylvania State University (Penn State), Penn State System-Penn State System, Department of Genetics, University of Georgia [USA], Uppsala University, Department 
of Ecology and Evolution [Lausanne], Université de Lausanne (UNIL), McMaster University, School of Biology, IE University, Università degli Studi di Bari Aldo Moro, University of Melbourne, Stanford University, University of California [Davis] (UC Davis), Boston University [Boston] (BU), Centro de Regulación Genómica (CRG), Universitat Pompeu Fabra [Barcelona] (UPF), Washington University in Saint Louis (WUSTL), University of Sheffield, Syracuse University, Universidade Federal Rural do Rio de Janeiro (UFRRJ), Department of Bioengineering, Beihang University (BUAA), Tucson Stock Center, Genome Center, University of California-University of California, Genome Sequencing Center, University of Washington School of Medicine, University of Winnipeg, Iowa State University (ISU), Indiana University System, The Wellcome Trust Sanger Institute [Cambridge], Center for Bioinformatics and Computational Biology, University of Delaware [Newark], Illinois State University, University of Rochester [USA], United States Department of Agriculture (USDA), Arizona State University [Tempe] (ASU), Leipzig University, Universidade Federal do Rio Grande do Sul (UFRGS), Duke University, North Carolina State University [Raleigh] (NC State), University of North Carolina System (UNC)-University of North Carolina System (UNC), University of Connecticut (UCONN), Computer Science Département, Université Saint-Esprit de Kaslik (USEK), Mc Master University, Indiana University, Institute of Evolutionary Biology, University of Edinburgh, J. Craig Venter Institute [La Jolla, USA] (JCVI), University of Oxford [Oxford], Center for Biomolecular Science and Engineering, Unité de Recherche Génomique Info (URGI), Institut National de la Recherche Agronomique (INRA), and Zdobnov, Evgeny
- Subjects
melanogaster genome ,0106 biological sciences ,RNA, Untranslated ,[SDV]Life Sciences [q-bio] ,Genome, Insect ,RNA, Untranslated/genetics ,Genes, Insect ,01 natural sciences ,Genome ,Genome, Insect/ genetics ,Gene Order ,Genome, Mitochondrial/genetics ,Drosophila Proteins ,Phylogeny ,ddc:616 ,Genetics ,0303 health sciences ,Multidisciplinary ,biology ,Reproduction ,Genomics ,Multigene Family/genetics ,Reproduction/genetics ,DNA Transposable Elements/genetics ,Genes, Insect/ genetics ,Multigene Family ,dosage compensation ,Drosophila ,amino-acid substitution ,Drosophila Protein ,Drosophila Proteins/genetics ,Synteny/genetics ,fruit-fly ,010603 evolutionary biology ,Synteny ,Drosophila sechellia ,Evolution, Molecular ,03 medical and health sciences ,Phylogenetics ,Molecular evolution ,Codon/genetics ,[SDV.BV]Life Sciences [q-bio]/Vegetal Biology ,Animals ,adaptive protein evolution ,Codon ,030304 developmental biology ,Gene Order/genetics ,molecular evolution ,fungi ,Immunity ,synonymous codon usage ,Sequence Analysis, DNA ,Immunity/genetics ,biology.organism_classification ,Drosophila mojavensis ,Evolutionary biology ,Genome, Mitochondrial ,DNA Transposable Elements ,maximum-likelihood ,noncoding dna ,Drosophila/ classification/ genetics/immunology/metabolism ,Sequence Alignment ,natural-selection ,Drosophila yakuba - Abstract
Author affiliations: see page 216 of the article. Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity of evolutionary inference, producing more robust results than single-genome analyses can provide. The genomes of 12 Drosophila species, ten of which are presented here for the first time (sechellia, simulans, yakuba, erecta, ananassae, persimilis, willistoni, mojavensis, virilis and grimshawi), illustrate how rates and patterns of sequence divergence across taxa can illuminate evolutionary processes on a genomic scale. These genome sequences augment the formidable genetic tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution. Despite remarkable similarities among these Drosophila species, we identified many putatively non-neutral changes in protein-coding genes, non-coding RNA genes, and cis-regulatory regions. These may prove to underlie differences in the ecology and behaviour of these diverse species.
- Published
- 2007
- Full Text
- View/download PDF
44. A New Human Genome Sequence Paves the Way for Individualized Genomics
- Author
-
Jiaqi Huang, Marvin Frazier, Vineet Bafna, Brian P. Walenz, Jon Borman, Samuel Levy, Josep F. Abril, Yu-Hui Rogers, Aaron L. Halpern, Vikas Bansal, J. Craig Venter, Ewen F. Kirkness, Timothy B. Stockwell, Jeffrey R. MacDonald, Granger G. Sutton, Pauline C. Ng, John Gill, Karen Beeson, Karin A. Remington, Alexia Tsiamouri, Robert L. Strausberg, Nelson Axelrod, Lars Feuk, Yuan Lin, Mary Shago, Andy Wing Chun Pang, Dana A. Busam, Gennady Denisov, Saul A. Kravitz, Tina C McIntosh, Stephen W. Scherer, and Universitat de Barcelona
- Subjects
Male ,Gene Dosage ,Genome ,0302 clinical medicine ,INDEL Mutation ,Homo (Human) ,Human Genome Project ,Chromosomes, Human ,Biology (General) ,In Situ Hybridization, Fluorescence ,Genetics ,Mammals ,0303 health sciences ,General Neuroscience ,Chromosome Mapping ,Genome project ,Genomics ,Middle Aged ,Pedigree ,Phenotype ,Synopsis ,General Agricultural and Biological Sciences ,Research Article ,Human ,Primates ,Genome evolution ,Genotype ,Bioinformatics ,QH301-705.5 ,Molecular Sequence Data ,Biology ,Polymorphism, Single Nucleotide ,General Biochemistry, Genetics and Molecular Biology ,03 medical and health sciences ,Gene density ,Humans ,Genome size ,030304 developmental biology ,Chromosomes, Human, Y ,General Immunology and Microbiology ,Human genome ,Base Sequence ,Genome, Human ,Reproducibility of Results ,Genetics and Genomics ,DNA ,Sequence Analysis, DNA ,Microarray Analysis ,Diploidy ,Haplotypes ,030217 neurology & neurosurgery ,Reference genome - Abstract
Presented here is a genome sequence of an individual human. It was produced from ∼32 million random DNA fragments, sequenced by Sanger dideoxy technology and assembled into 4,528 scaffolds, comprising 2,810 million bases (Mb) of contiguous sequence with approximately 7.5-fold coverage for any given region. We developed a modified version of the Celera assembler to facilitate the identification and comparison of alternate alleles within this individual diploid genome. Comparison of this genome and the National Center for Biotechnology Information human reference assembly revealed more than 4.1 million DNA variants, encompassing 12.3 Mb. These variants (of which 1,288,319 were novel) included 3,213,401 single nucleotide polymorphisms (SNPs), 53,823 block substitutions (2–206 bp), 292,102 heterozygous insertion/deletion events (indels) (1–571 bp), 559,473 homozygous indels (1–82,711 bp), 90 inversions, as well as numerous segmental duplications and copy number variation regions. Non-SNP DNA variation accounts for 22% of all events identified in the donor; however, these events involve 74% of all variant bases. This suggests an important role for non-SNP genetic alterations in defining the diploid genome structure. Moreover, 44% of genes were heterozygous for one or more variants. Using a novel haplotype assembly strategy, we were able to span 1.5 Gb of genome sequence in segments >200 kb, providing further precision to the diploid nature of the genome. These data depict a definitive molecular portrait of a diploid human genome that provides a starting point for future genome comparisons and enables an era of individualized genomic information.
Author Summary: We have generated an independently assembled diploid human genomic DNA sequence from both chromosomes of a single individual (J. Craig Venter). Our approach, based on whole-genome shotgun sequencing and using enhanced genome assembly strategies and software, generated an assembled genome over half of which is represented in large diploid segments (>200 kilobases), enabling study of the diploid genome. Comparison with previous reference human genome sequences, which were composites comprising multiple humans, revealed that the majority of genomic alterations are the well-studied class of variants based on single nucleotides (SNPs). However, the results also reveal that lesser-studied genomic variants, insertions and deletions, while comprising a minority (22%) of genomic variation events, actually account for almost 74% of variant nucleotides. Inclusion of insertion and deletion genetic variation into our estimates of interchromosomal difference reveals that only 99.5% similarity exists between the two chromosomal copies of an individual and that genetic variation between two individuals is as much as five times higher than previously estimated. The existence of a well-characterized diploid human genome sequence provides a starting point for future individual genome comparisons and enables the emerging era of individualized genomic information.
Comparison of the DNA sequence of an individual human with the reference sequence reveals a surprising amount of difference.
- Published
- 2007
45. The Sorcerer II Global Ocean Sampling expedition: northwest Atlantic through eastern tropical Pacific
- Author
-
Jonathan A. Eisen, Luisa I. Falcón, Jeff Hoffman, Dongying Wu, Joseph E. Venter, Valeria Souza, Shannon J. Williamson, Karen Beeson, Kenneth H. Nealson, Yu-Hui Rogers, Aaron L. Halpern, Kelvin Li, Germán Bonilla-Rosso, Robert Friedman, Jason Freeman, Karin A. Remington, Clare Stewart, Michael Ferrari, Douglas B. Rusch, Holly Baden-Tillson, Shubha Sathyendranath, Marvin Frazier, Granger G. Sutton, T. Utterback, Eldredge Bermingham, Robert L. Strausberg, Karla B. Heidelberg, Luis E. Eguiarte, J. Craig Venter, Cynthia Andrews-Pfannkoch, Victor A. Gallardo, David M. Karl, Trevor Platt, Saul A. Kravitz, John F. Heidelberg, Giselle Tamayo-Castillo, Bao Duc Tran, Hamilton O. Smith, Joyce Thorpe, Shibu Yooseph, and Nancy A. Moran
- Subjects
Pelagibacter ubique ,Genome evolution ,Food Chain ,QH301-705.5 ,Oceans and Seas ,Genomics ,Medical and Health Sciences ,General Biochemistry, Genetics and Molecular Biology ,Species Specificity ,Phylogenetics ,Genetics ,Biology (General) ,Clade ,Comparative genomics ,General Immunology and Microbiology ,biology ,Agricultural and Veterinary Sciences ,Ecology ,General Neuroscience ,Human Genome ,Computational Biology ,Biological Sciences ,biology.organism_classification ,Plankton ,Phylogenetic diversity ,Metagenomics ,General Agricultural and Biological Sciences ,Water Microbiology ,Biotechnology ,Developmental Biology - Abstract
The world's oceans contain a complex mixture of micro-organisms that are, for the most part, uncharacterized both genetically and biochemically. We report here a metagenomic study of the marine planktonic microbiota in which surface (mostly marine) water samples were analyzed as part of the Sorcerer II Global Ocean Sampling expedition. These samples, collected across a several-thousand km transect from the North Atlantic through the Panama Canal and ending in the South Pacific, yielded an extensive dataset consisting of 7.7 million sequencing reads (6.3 billion bp). Though a few major microbial clades dominate the planktonic marine niche, the dataset contains great diversity with 85% of the assembled sequence and 57% of the unassembled data being unique at a 98% sequence identity cutoff. Using the metadata associated with each sample and sequencing library, we developed new comparative genomic and assembly methods. One comparative genomic method, termed "fragment recruitment," addressed questions of genome structure, evolution, and taxonomic or phylogenetic diversity, as well as the biochemical diversity of genes and gene families. A second method, termed "extreme assembly," made possible the assembly and reconstruction of large segments of abundant but clearly nonclonal organisms. Within all abundant populations analyzed, we found extensive intra-ribotype diversity in several forms: (1) extensive sequence variation within orthologous regions throughout a given genome; despite coverage of individual ribotypes approaching 500-fold, most individual sequencing reads are unique; (2) numerous changes in gene content some with direct adaptive implications; and (3) hypervariable genomic islands that are too variable to assemble. The intra-ribotype diversity is organized into genetically isolated populations that have overlapping but independent distributions, implying distinct environmental preference. 
We present novel methods for measuring the genomic similarity between metagenomic samples and show how they may be grouped into several community types. Specific functional adaptations can be identified both within individual ribotypes and across the entire community, including proteorhodopsin spectral tuning and the presence or absence of the phosphate-binding gene PstS.
- Published
- 2007
46. The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families
- Author
-
Aaron L. Halpern, Shannon J. Williamson, Kannan Natarajan, Susan S. Taylor, Granger G. Sutton, Marcin P. Joachimiak, David Eisenberg, Robert Friedman, Christopher S. Miller, Gerard Manning, Shibu Yooseph, David A W Soergel, Christopher van Belle, Douglas B. Rusch, Karin A. Remington, Steven E. Brenner, Jack E. Dixon, Yufeng Zhai, Robert L. Strausberg, Karla B. Heidelberg, Vineet Bafna, Jonathan A. Eisen, Shaun W. Lee, Weizhong Li, Piotr Cieplak, John-Marc Chandonia, Huiying Li, Benjamin J. Raphael, J. Craig Venter, Susan T. Mashiyama, Lukasz Jaroszewski, Marvin Frazier, and Adam Godzik
- Subjects
Protein family ,QH301-705.5 ,Gene prediction ,Protein domain ,Sequence alignment ,Genomics ,Computational biology ,Biology ,Medical and Health Sciences ,General Biochemistry, Genetics and Molecular Biology ,Structural genomics ,03 medical and health sciences ,14. Life underwater ,Biology (General) ,030304 developmental biology ,Genetics ,0303 health sciences ,General Immunology and Microbiology ,Agricultural and Veterinary Sciences ,030306 microbiology ,Shotgun sequencing ,General Neuroscience ,Biological Sciences ,Metagenomics ,General Agricultural and Biological Sciences ,Developmental Biology - Abstract
Metagenomics projects based on shotgun sequencing of populations of micro-organisms yield insight into protein families. We used sequence similarity clustering to explore proteins with a comprehensive dataset consisting of sequences from available databases together with 6.12 million proteins predicted from an assembly of 7.7 million Global Ocean Sampling (GOS) sequences. The GOS dataset covers nearly all known prokaryotic protein families. A total of 3,995 medium- and large-sized clusters consisting of only GOS sequences are identified, of which 1,700 have no detectable homology to known families. The GOS-only clusters contain a higher-than-expected proportion of sequences of viral origin, reflecting the poor sampling of viral diversity until now. Protein domain distributions in the GOS dataset and current protein databases show distinct biases. Several protein domains that were previously categorized as kingdom-specific are shown to have GOS examples in other kingdoms. About 6,000 sequences (ORFans) from the literature that heretofore lacked similarity to known proteins have matches in the GOS data. The GOS dataset is also used to improve remote homology detection. Overall, besides nearly doubling the number of current proteins, the predicted GOS proteins also add a great deal of diversity to known protein families and shed light on their evolution. These observations are illustrated using several protein families, including phosphatases, proteases, ultraviolet-irradiation DNA damage repair enzymes, glutamine synthetase, and RuBisCO. The diversity added by GOS data has implications for choosing targets for experimental structure characterization as part of structural genomics efforts. Our analysis indicates that new families are being discovered at a rate that is linear or almost linear with the addition of new sequences, implying that we are still far from discovering all protein families in nature.
- Published
- 2007
47. Ancient noncoding elements conserved in the human genome
- Author
-
J. Craig Venter, Robert L. Strausberg, Nidhi Dandona, Byrappa Venkatesh, Alison P. Lee, Alice Tay, Aaron L. Halpern, Yong-Hwee E. Loh, Lakshmi D. Viswanathan, Justin Johnson, Ewen F. Kirkness, and Sydney Brenner
- Subjects
Molecular Sequence Data ,Biology ,Regulatory Sequences, Nucleic Acid ,Genome ,Conserved sequence ,Evolution, Molecular ,Intergenic region ,Molecular evolution ,Animals ,Humans ,Conserved Sequence ,Zebrafish ,Whole genome sequencing ,Genetics ,Multidisciplinary ,Base Sequence ,Genome, Human ,Vertebrate ,Takifugu ,Enhancer Elements, Genetic ,Regulatory sequence ,Sharks ,Human genome ,DNA, Intergenic
- Abstract
Cartilaginous fishes represent the living group of jawed vertebrates that diverged from the common ancestor of human and teleost fish lineages about 530 million years ago. We generated ~1.4× genome sequence coverage for a cartilaginous fish, the elephant shark (Callorhinchus milii), and compared this genome with the human genome to identify conserved noncoding elements (CNEs). The elephant shark sequence revealed twice as many CNEs as were identified by whole-genome comparisons between teleost fishes and human. The ancient vertebrate-specific CNEs in the elephant shark and human genomes are likely to play key regulatory roles in vertebrate gene expression.
- Published
- 2006
48. Correction for Goldberg et al., A Sanger/pyrosequencing hybrid approach for the generation of high-quality draft assemblies of marine microbial genomes
- Author
-
Granger G. Sutton, Yu-Hui Rogers, Aaron L. Halpern, Susanne M. D. Goldberg, Justin Johnson, Federico M. Lauro, Robert L. Strausberg, Saul A. Kravitz, Eli Venter, Luke J. Tallon, Dana A. Busam, Steve Ferriera, Torsten Thomas, Hoda Khouri, Marvin Frazier, Kelvin Li, Robert Friedman, Tamara Feldblyum, and J. Craig Venter
- Subjects
Multidisciplinary ,Microbial Genomes ,Pyrosequencing ,Correction ,Quality (business) ,Computational biology ,Biology ,Hybrid approach
- Published
- 2006
49. A Sanger/pyrosequencing hybrid approach for the generation of high-quality draft assemblies of marine microbial genomes
- Author
-
J. Craig Venter, Justin Johnson, Luke J. Tallon, Dana A. Busam, Marvin Frazier, Kelvin Li, Torsten Thomas, Steve Ferriera, Aaron L. Halpern, Yu-Hui Rogers, Robert L. Strausberg, Tamara Feldblyum, Hoda Khouri, Robert M. Friedman, Granger G. Sutton, Susanne M. D. Goldberg, Saul A. Kravitz, Eli Venter, and Federico M. Lauro
- Subjects
Cancer genome sequencing ,Sanger sequencing ,Genetics ,Multidisciplinary ,Massive parallel sequencing ,Shotgun sequencing ,Sequence assembly ,Computational Biology ,Genomics ,Computational biology ,Sequence Analysis, DNA ,Biology ,Biological Sciences ,Contig Mapping ,Metagenomics ,Genes, Bacterial ,ABI Solid Sequencing ,Genome, Bacterial ,Biotechnology
- Abstract
Since its introduction a decade ago, whole-genome shotgun sequencing (WGS) has been the main approach for producing cost-effective and high-quality genome sequence data. Until now, the Sanger sequencing technology that has served as the platform for WGS has not been truly challenged by emerging technologies. The recent introduction of the pyrosequencing-based 454 sequencing platform (454 Life Sciences, Branford, CT) offers a very promising alternative sequencing technology for incorporation into WGS. In this study, we evaluated the utility and cost-effectiveness of a hybrid sequencing approach that uses 3730xl Sanger data and 454 data to generate higher-quality, lower-cost assemblies of microbial genomes than current Sanger-only sequencing strategies.
- Published
- 2006
50. Exploring the ocean's microbes: sequencing the seven seas
- Author
-
Bao Tran, Jeff Hoffman, Valeria Souza, Brooke A. Dill, David M. Karl, Joyce Thorpe, Robert Friedman, Shubha Sathyendranath, German Bonilla, Vineet Bafna, Shaojie Zhang, Douglas B. Rusch, Shibu Yooseph, Joseph E. Venter, Kenneth H. Nealson, J. Craig Venter, Cyrus Foote, Jason Freemen, Holly Baden-Tillson, Eldredge Bermingham, Cindy Pfannkoch, Giselle Tamayo, Victor A. Gallardo, Shannon J. Williamson, Karen Beeson, Trevor Platt, Hamilton O. Smith, Yu-Hui Rogers, John F. Heidelberg, Karin A. Remington, Robert L. Strausberg, Clare Stewart, Jonathan A. Eisen, Luis E. Eguiarte, Aaron L. Halpern, Marvin Frazier, Luisa I. Falcón, Charles H. Howard, T. Utterback, Granger G. Sutton, Dongying Wu, and Karla B. Heidelberg
- Subjects
German ,Fishery ,Nova Scotia ,Genomic research ,Mexico City ,Biology ,Falcon ,Archaeology
- Abstract
Marvin E. Frazier, Douglas B. Rusch, Aaron L. Halpern, Karla B. Heidelberg, Granger Sutton, Shannon Williamson, Shibu Yooseph, Dongying Wu, Jonathan A. Eisen, Jeff Hoffman, Charles H. Howard, Cyrus Foote, Brooke A. Dill, Karin Remington, Karen Beeson, Bao Tran, Hamilton Smith, Holly Baden-Tillson, Clare Stewart, Joyce Thorpe, Jason Freemen, Cindy Pfannkoch, Joseph E. Venter, John Heidelberg, Terry Utterback, Yu-Hui Rogers, Shaojie Zhang, Vineet Bafna, Luisa Falcon, Valeria Souza, German Bonilla, Luis E. Eguiarte, David M. Karl, Ken Nealson, Shubha Sathyendranath, Trevor Platt, Eldredge Bermingham, Victor Gallardo, Giselle Tamayo, Robert Friedman, Robert Strausberg, J. Craig Venter. Affiliations: (1) J. Craig Venter Institute, Rockville, Maryland, United States of America; (2) The Institute for Genomic Research, Rockville, Maryland, United States of America; (3) Department of Computer Science, University of California, San Diego; (4) Instituto de Ecologia, Dept. Ecologia Evolutiva, National Autonomous University of Mexico, Mexico City, 04510 Distrito Federal, Mexico; (5) University of Hawaii, Honolulu, United States of America; (6) Dept. of Earth Sciences, University of Southern California, Los Angeles, California, United States of America; (7) Dalhousie University, Halifax, Nova Scotia, Canada; (8) Smithsonian Tropical Research Institute, Balboa, Ancon, Republic of Panama; (9) University of Concepcion, Concepcion, Chile; (10) University of Costa Rica, San Pedro, San Jose, Republic of Costa Rica.
- Published
- 2006