Stalking the fourth domain in metagenomic data: searching for, discovering, and interpreting novel, deep branches in marker gene phylogenetic trees
- Source :
- PLoS ONE, Vol 6, Iss 3, p e18011 (2011)
- Publication Year :
- 2011
- Publisher :
- Public Library of Science (PLoS), 2011.
Abstract
- Background: Most of our knowledge about the ancient evolutionary history of organisms has been derived from data associated with specific known organisms (i.e., organisms that we can study directly such as plants, metazoans, and culturable microbes). Recently, however, a new source of data for such studies has arrived: DNA sequence data generated directly from environmental samples. Such metagenomic data has enormous potential in a variety of areas including, as we argue here, in studies of very early events in the evolution of gene families and of species. Methodology/Principal Findings: We designed and implemented new methods for analyzing metagenomic data and used them to search the Global Ocean Sampling (GOS) expedition data set for novel lineages in three gene families commonly used in phylogenetic studies of known and unknown organisms: small subunit rRNA and the recA and rpoB superfamilies. Though the methods available could not accurately identify very deeply branched ss-rRNAs (largely due to difficulties in making robust sequence alignments for novel rRNA fragments), our analysis revealed the existence of multiple novel branches in the recA and rpoB gene families. Analysis of available sequence data likely from the same genomes as these novel recA and rpoB homologs was then used to further characterize the possible organismal source of the novel sequences. Conclusions/Significance: Of the novel recA and rpoB homologs identified in the metagenomic data, some likely come from uncharacterized viruses while others may represent ancient paralogs not yet seen in any cultured organism. A third possibility is that some come from novel cellular lineages that are only distantly related to any organisms for which sequence data is currently available. If there exist any major, but so-far-undiscovered, deeply branching lineages in the tree of life, we suggest that methods such as those described herein currently offer the best way to search for them.
- Subjects :
- Genome
Computer Applications
Databases, Genetic
Genome Evolution
Phylogeny
Genetics
Plant Growth and Development
0303 health sciences
Multidisciplinary
Phylogenetic tree
Ecology
Archaeal Evolution
Genomics
Phylogenetics
Multigene Family
Medicine
Algorithms
Biotechnology
Research Article
Archaeans
Sequence analysis
Evolution
General Science & Technology
Oceans and Seas
Science
Sequence alignment
Biology
Microbiology
DNA sequencing
Viral Evolution
Evolution, Molecular
03 medical and health sciences
Databases
Genetic
Bacterial Proteins
Virology
Evolutionary Systematics
14. Life underwater
030304 developmental biology
Ribosomal
Evolutionary Biology
Bacterial Evolution
Base Sequence
030306 microbiology
Molecular
Computational Biology
Genomic Evolution
Bacteriology
Comparative Genomics
rpoB
Organismal Evolution
Rec A Recombinases
Evolutionary biology
Metagenomics
RNA, Ribosomal
Evolutionary Ecology
Microbial Evolution
Computer Science
RNA
Environmental Protection
Developmental Biology
Details
- Language :
- English
- ISSN :
- 19326203
- Volume :
- 6
- Issue :
- 3
- Database :
- OpenAIRE
- Journal :
- PLoS ONE
- Accession number :
- edsair.doi.dedup.....789084a8f62242c1bfd4796c1ab695d9