49 results on '"Victor Kunin"'
Search Results
2. Genome analysis of the anaerobic thermohalophilic bacterium Halothermothrix orenii.
- Author
-
Konstantinos Mavromatis, Natalia Ivanova, Iain Anderson, Athanasios Lykidis, Sean D Hooper, Hui Sun, Victor Kunin, Alla Lapidus, Philip Hugenholtz, Bharat Patel, and Nikos C Kyrpides
- Subjects
Medicine; Science
- Abstract
Halothermothrix orenii is a strictly anaerobic thermohalophilic bacterium isolated from sediment of a Tunisian salt lake. It belongs to the order Halanaerobiales in the phylum Firmicutes. The complete sequence revealed that the genome consists of one circular chromosome of 2,578,146 bp encoding 2,451 predicted genes. This is the first genome sequence of an organism belonging to the Halanaerobiales. Features of both Gram-positive and Gram-negative bacteria were identified, the most prominent being a sporulation mechanism typical of Firmicutes and a characteristic Gram-negative lipopolysaccharide. Protein sequence analyses and metabolic reconstruction reveal a unique combination of strategies for thermophilic and halophilic adaptation. H. orenii can serve as a model organism for the study of the evolution of the Gram-negative phenotype, of adaptation to thermohalophilic conditions, and of the development of biotechnological applications that require high temperatures and high salt concentrations.
- Published
- 2009
- Full Text
- View/download PDF
3. An experimental metagenome data management and analysis system.
- Author
-
Victor M. Markowitz, Natalia Ivanova, Krishna Palaniappan, Ernest Szeto, Frank Korzeniewski, Athanasios Lykidis, Iain Anderson, Konstantinos Mavrommatis, Victor Kunin, Héctor García Martín, Inna Dubchak, Philip Hugenholtz, and Nikos Kyrpides
- Published
- 2006
- Full Text
- View/download PDF
4. CoGenT++: an extensive and extensible data environment for computational genomics.
- Author
-
Leon Goldovsky, Paul J. Janssen, Dag G. Ahrén, Benjamin Audit, Ildefonso Cases, Nikos Darzentas, Anton J. Enright, Núria López-Bigas, José M. Peregrín-Alvarez, Mike Smith, Sophia Tsoka, Victor Kunin, and Christos A. Ouzounis
- Published
- 2005
- Full Text
- View/download PDF
5. The properties of protein family space depend on experimental design.
- Author
-
Victor Kunin, Sarah A. Teichmann, Martijn A. Huynen, and Christos A. Ouzounis
- Published
- 2005
- Full Text
- View/download PDF
6. GeneTRACE - Reconstruction of Gene Content of Ancestral Species.
- Author
-
Victor Kunin and Christos A. Ouzounis
- Published
- 2003
- Full Text
- View/download PDF
7. TreeQ-VISTA: an interactive tree visualization tool with functional annotation query capabilities.
- Author
-
Shengyin Gu, Iain Anderson, Victor Kunin, Michael J. Cipriano, Simon Minovitsky, Gunther H. Weber, Nina Amenta, Bernd Hamann, and Inna Dubchak
- Published
- 2007
- Full Text
- View/download PDF
8. MagicMatch - cross-referencing sequence identifiers across databases.
- Author
-
Mike Smith, Victor Kunin, Leon Goldovsky, Anton J. Enright, and Christos A. Ouzounis
- Published
- 2005
- Full Text
- View/download PDF
9. COmplete GENome Tracking (COGENT): A Flexible Data Environment for Computational Genomics.
- Author
-
Paul J. Janssen, Anton J. Enright, Benjamin Audit, Ildefonso Cases, Leon Goldovsky, Nicola Harte, Victor Kunin, and Christos A. Ouzounis
- Published
- 2003
- Full Text
- View/download PDF
10. Clustering the annotation space of proteins.
- Author
-
Victor Kunin and Christos A. Ouzounis
- Published
- 2005
- Full Text
- View/download PDF
11. Effects of OTU Clustering and PCR Artifacts on Microbial Diversity Estimates
- Author
-
Ulrika Lidstrom, Victor Kunin, Matthew Ashby, and Nastassia V. Patin
- Subjects
DNA, Bacterial; Ecology; Library; Microbial diversity; High-Throughput Nucleotide Sequencing; Soil Science; Biodiversity; Computational biology; Biology; Bioinformatics; Polymerase Chain Reaction; Data sequences; RNA, Ribosomal, 16S; Nature Conservation; Actinomycetales; Cluster analysis; Ecology, Evolution, Behavior and Systematics; Polymerase chain reaction; DNA Primers
- Abstract
Next-generation sequencing has increased the coverage of microbial diversity surveys by orders of magnitude, but differentiating artifacts from rare environmental sequences remains a challenge. Clustering 16S rRNA sequences into operational taxonomic units (OTUs) organizes sequence data into groups of 97% identity, helping to reduce data volumes and avoid analyzing sequencing artifacts by grouping them with real sequences. Here, we analyze sequence abundance distributions across environmental samples and show that 16S rRNA sequences of >99% identity can represent functionally distinct microorganisms, rendering OTU clustering problematic when the goal is an accurate analysis of organism distribution. Strict postsequencing quality control (QC) filters eliminated the most prevalent artifacts without clustering. Further experiments proved that DNA polymerase errors in polymerase chain reaction (PCR) generate a significant number of substitution errors, most of which pass QC filters. Based on our findings, we recommend minimizing the number of PCR cycles in DNA library preparation and applying strict postsequencing QC filters to reduce the most prevalent artifacts while maintaining a high level of accuracy in diversity estimates. We further recommend correlating rare and abundant sequences across environmental samples, rather than clustering into OTUs, to identify remaining sequence artifacts without losing the resolution afforded by high-throughput sequencing.
- Published
- 2012
- Full Text
- View/download PDF
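The abstract of result 11 turns on what happens when sequences are grouped at a fixed identity threshold. As a rough illustration only, the sketch below shows how a sequence that is 99% identical to another collapses into the same cluster at a 97% cutoff. It assumes pre-aligned, equal-length sequences and uses a simple greedy centroid scheme; the function names `percent_identity` and `greedy_otus` are hypothetical, and this is not the pipeline used by the authors.

```python
from typing import List


def percent_identity(a: str, b: str) -> float:
    """Percent of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def greedy_otus(seqs: List[str], threshold: float = 97.0) -> List[List[str]]:
    """Assign each sequence to the first centroid it matches at >= threshold,
    otherwise start a new cluster with that sequence as centroid.
    (Illustrative greedy scheme, assumed for this sketch.)"""
    centroids: List[str] = []
    clusters: List[List[str]] = []
    for s in seqs:
        for i, c in enumerate(centroids):
            if percent_identity(s, c) >= threshold:
                clusters[i].append(s)
                break
        else:
            centroids.append(s)
            clusters.append([s])
    return clusters


# Made-up toy sequences, 100 positions each
base = "A" * 100
one_mismatch = "A" * 99 + "C"      # 99% identical to base
distant = "A" * 90 + "C" * 10      # 90% identical to base

clusters_97 = greedy_otus([base, one_mismatch, distant], threshold=97.0)
clusters_100 = greedy_otus([base, one_mismatch, distant], threshold=100.0)
```

With the 97% threshold the 99%-identical sequence is merged with `base`, whereas an exact-match threshold keeps all three apart, which is the loss of resolution the abstract cautions about.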
12. Metatranscriptomic array analysis of ‘Candidatus Accumulibacter phosphatis’-enriched enhanced biological phosphorus removal sludge
- Author
-
Matthew Haynes, Victor Kunin, Philip Hugenholtz, Katherine D. McMahon, Natalia Ivanova, Shaomei He, Forest Rohwer, and Hector Garcia Martin
- Subjects
Citric acid cycle; Polyphosphate kinase; Enhanced biological phosphorus removal; Biochemistry; Gene expression; Candidatus Accumulibacter; Microbiology; Anaerobic exercise; Candidatus Accumulibacter phosphatis; Gene; Ecology, Evolution, Behavior and Systematics
- Abstract
Here we report the first metatranscriptomic analysis of gene expression and regulation of 'Candidatus Accumulibacter'-enriched lab-scale sludge during enhanced biological phosphorus removal (EBPR). Medium density oligonucleotide microarrays were generated with probes targeting most predicted genes hypothesized to be important for the EBPR phenotype. RNA samples were collected at the early stage of anaerobic and aerobic phases (15 min after acetate addition and switching to aeration respectively). We detected the expression of a number of genes involved in the carbon and phosphate metabolisms, as proposed by EBPR models (e.g. polyhydroxyalkanoate synthesis, a split TCA cycle through methylmalonyl-CoA pathway, and polyphosphate formation), as well as novel genes discovered through metagenomic analysis. The comparison between the early stage anaerobic and aerobic gene expression profiles showed that expression levels of most genes were not significantly different between the two stages. The majority of upregulated genes in the aerobic sample are predicted to encode functions such as transcription, translation and protein translocation, reflecting the rapid growth phase of Accumulibacter shortly after being switched to aerobic conditions. Components of the TCA cycle and machinery involved in ATP synthesis were also upregulated during the early aerobic phase. These findings support the predictions of EBPR metabolic models that the oxidation of intracellularly stored carbon polymers through the TCA cycle provides ATP for cell growth when oxygen becomes available. Nitrous oxide reductase was among the very few Accumulibacter genes upregulated in the anaerobic sample, suggesting that its expression is likely induced by the deprivation of oxygen.
- Published
- 2010
- Full Text
- View/download PDF
13. Wrinkles in the rare biosphere: pyrosequencing errors can lead to artificial inflation of diversity estimates
- Author
-
Howard Ochman, Philip Hugenholtz, Anna Engelbrektson, and Victor Kunin
- Subjects
DNA, Bacterial; Sanger sequencing; Genetics; Rare biosphere; Biodiversity; Genetic Variation; Biosphere; Sequence Analysis, DNA; Biology; Microbiology; Orders of magnitude (bit rate); Genes, Bacterial; Evolutionary biology; RNA, Ribosomal, 16S; Escherichia coli; Cluster Analysis; Pyrosequencing; Spurious relationship; Cluster analysis; Sequence Alignment; Ecology, Evolution, Behavior and Systematics
- Abstract
Massively parallel pyrosequencing of the small subunit (16S) ribosomal RNA gene has revealed that the extent of rare microbial populations in several environments, the 'rare biosphere', is orders of magnitude higher than previously thought. One important caveat with this method is that sequencing error could artificially inflate diversity estimates. Although the per-base error of 16S rDNA amplicon pyrosequencing has been shown to be as good as or lower than Sanger sequencing, no direct assessments of pyrosequencing errors on diversity estimates have been reported. Using only Escherichia coli MG1655 as a reference template, we find that 16S rDNA diversity is grossly overestimated unless relatively stringent read quality filtering and low clustering thresholds are applied. In particular, the common practice of removing reads with unresolved bases and anomalous read lengths is insufficient to ensure accurate estimates of microbial diversity. Furthermore, common and reproducible homopolymer length errors can result in relatively abundant spurious phylotypes further confounding data interpretation. We suggest that stringent quality-based trimming of 16S pyrotags and clustering thresholds no greater than 97% identity should be used to avoid overestimates of the rare biosphere.
- Published
- 2010
- Full Text
- View/download PDF
14. Genomic Analysis of 'Elusimicrobium minutum,' the First Cultivated Representative of the Phylum 'Elusimicrobia' (Formerly Termite Group 1)
- Author
-
Daniel P. R. Herlemann, Wakako Ikeda-Ohtsubo, Andreas Brune, Victor Kunin, Philip Hugenholtz, Alla Lapidus, Oliver Geissinger, and Hui Sun
- Subjects
DNA, Bacterial; Pilus assembly; Molecular Sequence Data; Isoptera; Biology; Applied Microbiology and Biotechnology; Genome; Nonribosomal peptide; Gene Order; Elusimicrobium minutum; Invertebrate Microbiology; Animals; Gene; Phylogeny; Genetics; Bacteria; Ecology; Phylum; Sequence Analysis, DNA; Ribosomal RNA; Gastrointestinal Tract; Biochemistry; Antibiotic transport; Genome, Bacterial; Food Science; Biotechnology
- Abstract
Organisms of the candidate phylum termite group 1 (TG1) are regularly encountered in termite hindguts but are present also in many other habitats. Here, we report the complete genome sequence (1.64 Mbp) of “Elusimicrobium minutum” strain Pei191T, the first cultured representative of the TG1 phylum. We reconstructed the metabolism of this strictly anaerobic bacterium isolated from a beetle larva gut, and we discuss the findings in light of physiological data. E. minutum has all genes required for uptake and fermentation of sugars via the Embden-Meyerhof pathway, including several hydrogenases, and an unusual peptide degradation pathway comprising transamination reactions and leading to the formation of alanine, which is excreted in substantial amounts. The presence of genes encoding lipopolysaccharide biosynthesis and the presence of a pathway for peptidoglycan formation are consistent with ultrastructural evidence of a gram-negative cell envelope. Even though electron micrographs showed no cell appendages, the genome encodes many genes putatively involved in pilus assembly. We assigned some to a type II secretion system, but the function of 60 pilE-like genes remains unknown. Numerous genes with hypothetical functions, e.g., polyketide synthesis, nonribosomal peptide synthesis, antibiotic transport, and oxygen stress protection, indicate the presence of hitherto undiscovered physiological traits. Comparative analysis of 22 concatenated single-copy marker genes corroborated the status of “Elusimicrobia” (formerly TG1) as a separate phylum in the bacterial domain, which was so far based only on 16S rRNA sequence analysis.
- Published
- 2009
- Full Text
- View/download PDF
15. CRISPR — a widespread system that provides acquired resistance against phages in bacteria and archaea
- Author
-
Victor Kunin, Philip Hugenholtz, and Rotem Sorek
- Subjects
Trans-activating crRNA; Genetics; CRISPR interference; Bacteria; General Immunology and Microbiology; Archaea; Microbiology; Genome; CRISPR Spacers; Interspersed Repetitive Sequences; Infectious Diseases; Bacterial Proteins; Genome, Archaeal; Multigene Family; Viral Interference; CRISPR Loci; Direct repeat; CRISPR; Bacteriophages; DNA, Intergenic; Gene Silencing; Genome, Bacterial
- Abstract
Arrays of clustered, regularly interspaced short palindromic repeats (CRISPRs) are widespread in the genomes of many bacteria and almost all archaea. These arrays are composed of direct repeats that are separated by similarly sized non-repetitive spacers. CRISPR arrays, together with a group of associated proteins, confer resistance to phages, possibly by an RNA-interference-like mechanism. This Progress discusses the structure and function of this newly recognized antiviral mechanism.
- Published
- 2008
- Full Text
- View/download PDF
16. Metagenomic and functional analysis of hindgut microbiota of a wood-feeding higher termite
- Author
-
Kerrie Barry, Catalina Patricia Morales Murillo, Rotem Sorek, Dan E. Robertson, Xinning Zhang, Edward Kirton, Peter Luginbuhl, Jared R. Leadbetter, Brian D. Green, Nikos C. Kyrpides, Hector Garcia Martin, Daniel Dalevi, Cathy Chang, Luis G. Acosta, Mircea Podar, Justin T. Stege, Philip Hugenholtz, Giselle Tamayo, Gordana Djordjevic, Isidore Rigoutsos, Victor Kunin, Natalia Ivanova, Nahla Aboushadi, Susannah G. Tringe, Eric G. Matson, Michelle H. Cayouette, Eric J. Mathur, Edward M. Rubin, Majid Ghassemian, Myriam Hernandez, Natalia Mikhailova, Asaf Salamov, Falk Warnecke, Alice C. McHardy, Darren Platt, Ernest Szeto, Toby Richardson, Elizabeth A. Ottesen, and Julita Madejska
- Subjects
Costa Rica; Glycoside Hydrolases; Bioelectric Energy Sources; Molecular Sequence Data; Microbial metabolism; Isoptera; Cellulase; Lignin; Models, Biological; Polymerase Chain Reaction; Catalytic Domain; Botany; Nasutitermes; Animals; Cellulose; Symbiosis; Multidisciplinary; Bacteria; Hydrolysis; Genomics; Wood; Xylan; Carbon; Intestines; Fibrobacteres; Genes, Bacterial; Acetogenesis; Xylans; Genome, Bacterial; Symbiotic bacteria
- Abstract
From the standpoints of both basic research and biotechnology, there is considerable interest in reaching a clearer understanding of the diversity of biological mechanisms employed during lignocellulose degradation. Globally, termites are an extremely successful group of wood-degrading organisms and are therefore important both for their roles in carbon turnover in the environment and as potential sources of biochemical catalysts for efforts aimed at converting wood into biofuels. Only recently have data supported any direct role for the symbiotic bacteria in the gut of the termite in cellulose and xylan hydrolysis. Here we use a metagenomic analysis of the bacterial community resident in the hindgut paunch of a wood-feeding 'higher' Nasutitermes species (which do not contain cellulose-fermenting protozoa) to show the presence of a large, diverse set of bacterial genes for cellulose and xylan hydrolysis. Many of these genes were expressed in vivo or had cellulase activity in vitro, and further analyses implicate spirochete and fibrobacter species in gut lignocellulose degradation. New insights into other important symbiotic functions including H2 metabolism, CO2-reductive acetogenesis and N2 fixation are also provided by this first system-wide gene analysis of a microbial community specialized towards plant lignocellulose degradation. Our results underscore how complex even a 1-µl environment can be.
- Published
- 2007
- Full Text
- View/download PDF
17. METAGENOMIC ARRAY ANALYSIS OF AN ENHANCED BIOLOGICAL PHOSPHORUS REMOVAL SLUDGE ENRICHED WITH ACCUMULIBACTER
- Author
-
Hector Garcia Martin, Shaomei He, Forest Rohwer, Victor Kunin, Matthew Haynes, Phil Hugenholtz, Katherine D. McMahon, and Natalia Ivanova
- Subjects
Enhanced biological phosphorus removal; Metagenomics; Chemistry; Environmental chemistry; General Engineering
- Published
- 2007
- Full Text
- View/download PDF
18. PHOSPHORUS ACCUMULATING ORGANISMS REVEAL THEIR SECRETS: A GENOME LEVEL UNDERSTANDING OF ENHANCED BIOLOGICAL PHOSPHORUS REMOVAL
- Author
-
Victor Kunin, Philip Hugenholtz, Shaomei He, Jason J. Flowers, Katherine D. McMahon, and Hector Garcia Martin
- Subjects
Enhanced biological phosphorus removal; Environmental chemistry; Phosphorus; General Engineering; Genome
- Published
- 2007
- Full Text
- View/download PDF
19. Genetic Blueprints for Enhanced Biological Phosphorus Removal (EBPR) Based on Environmental Shotgun Sequencing
- Author
-
Natalia Ivanova, Hector Garcia Martin, Katherine D. McMahon, Victor Kunin, Philip Hugenholtz, and Linda L. Blackall
- Subjects
Enhanced biological phosphorus removal; Shotgun sequencing; Chemistry; General Engineering; Computational biology
- Published
- 2006
- Full Text
- View/download PDF
20. A minimal estimate for the gene content of the last universal common ancestor—exobiology from a terrestrial perspective
- Author
-
Victor Kunin, Christos A. Ouzounis, Leon Goldovsky, and Nikos Darzentas
- Subjects
Ancestral reconstruction; Genetics; Genome; Gene Transfer, Horizontal; Phylogenetic tree; Earth, Planet; Last universal ancestor; General Medicine; Biology; Microbiology; Evolution, Molecular; Phylogenetics; Evolutionary biology; Three-domain system; Exobiology; Gene family; Molecular Biology; Gene; Algorithms; Phylogeny
- Abstract
Using an algorithm for ancestral state inference of gene content, given a large number of extant genome sequences and a phylogenetic tree, we aim to reconstruct the gene content of the last universal common ancestor (LUCA), a hypothetical life form that presumably was the progenitor of the three domains of life. The method allows for gene loss, previously found to be a major factor in shaping gene content, and thus the estimate of LUCA's gene content appears to be substantially higher than that proposed previously, with a typical number of over 1000 gene families, of which more than 90% are also functionally characterized. More precisely, when only prokaryotes are considered, the number varies between 1006 and 1189 gene families while when eukaryotes are also included, this number increases to between 1344 and 1529 families depending on the underlying phylogenetic tree. Therefore, the common belief that the hypothetical genome of LUCA should resemble those of the smallest extant genomes of obligate parasites is not supported by recent advances in computational genomics. Instead, a fairly complex genome similar to those of free-living prokaryotes, with a variety of functional capabilities including metabolic transformation, information processing, membrane/transport proteins and complex regulation, shared between the three domains of life, emerges as the most likely progenitor of life on Earth, with profound repercussions for planetary exploration and exobiology.
- Published
- 2006
- Full Text
- View/download PDF
21. Expansion of the BioCyc collection of pathway/genome databases to 160 genomes
- Author
-
Dag Ahrén, Christos A. Ouzounis, Nikos Darzentas, Pallavi Kaipa, Victor Kunin, Nuria Lopez-Bigas, Leon Goldovsky, Caroline Moore-Kochlacs, Peter D. Karp, and Sophia Tsoka
- Subjects
Genome; Database; Modelling biological systems; MetaCyc; Computational Biology; Biological database; Metabolic network; Genomics; Biology; Proteomics; Article; Metabolism; Genome, Archaeal; Databases, Genetic; Genetics; Animals; Humans; Genome, Bacterial; BioCyc database collection
- Abstract
The BioCyc database collection is a set of 160 pathway/genome databases (PGDBs) for most eukaryotic and prokaryotic species whose genomes have been completely sequenced to date. Each PGDB in the BioCyc collection describes the genome and predicted metabolic network of a single organism, inferred from the MetaCyc database, which is a reference source on metabolic pathways from multiple organisms. In addition, each bacterial PGDB includes predicted operons for the corresponding species. The BioCyc collection provides a unique resource for computational systems biology, namely global and comparative analyses of genomes and metabolic networks, and a supplement to the BioCyc resource of curated PGDBs. The Omics viewer available through the BioCyc website allows scientists to visualize combinations of gene expression, proteomics and metabolomics data on the metabolic maps of these organisms. This paper discusses the computational methodology by which the BioCyc collection has been expanded, and presents an aggregate analysis of the collection that includes the range of number of pathways present in these organisms, and the most frequently observed pathways. We seek scientists to adopt and curate individual PGDBs within the BioCyc collection. Only by harnessing the expertise of many scientists can we hope to produce biological databases that accurately reflect the depth and breadth of knowledge that the biomedical research community is producing.
- Published
- 2005
- Full Text
- View/download PDF
22. Measuring genome conservation across taxa: divided strains and united kingdoms
- Author
-
Victor Kunin, Leon Goldovsky, Dag Ahrén, Christos A. Ouzounis, and Paul Janssen
- Subjects
Comparative genomics; Genetics; Bacteria; Phylogenetic tree; Computational Biology; Genomics; Biology; Genome; Article; Evolution, Molecular; Taxon; Phylogenetics; Evolutionary biology; Proteobacteria; Taxonomy (biology); Taxonomic rank; Gene; Genome, Bacterial; Phylogeny
- Abstract
Species evolutionary relationships have traditionally been defined by sequence similarities of phylogenetic marker molecules, recently followed by whole-genome phylogenies based on gene order, average ortholog similarity or gene content. Here, we introduce genome conservation, a novel metric of evolutionary distances between species that simultaneously takes into account both gene content and sequence similarity at the whole-genome level. Genome conservation represents a robust distance measure, as demonstrated by accurate phylogenetic reconstructions. The genome conservation matrix for all presently sequenced organisms exhibits a remarkable ability to define evolutionary relationships across all taxonomic ranges. An assessment of taxonomic ranks with genome conservation shows that certain ranks are inadequately described and raises the possibility for a more precise and quantitative taxonomy in the future. All phylogenetic reconstructions are available at the genome phylogeny server: http://maine.ebi.ac.uk:8000/cgi-bin/gps/GPS.pl.
- Published
- 2005
- Full Text
- View/download PDF
23. The Balance of Driving Forces During Genome Evolution in Prokaryotes
- Author
-
Victor Kunin and Christos A. Ouzounis
- Subjects
Genome evolution; Gene Transfer, Horizontal; Protein family; Archaeal Proteins; Biology; Genome; Genes, Archaeal; Evolution, Molecular; Bacterial Proteins; Species Specificity; Genome, Archaeal; Gene duplication; Genetics; Letters; Databases, Protein; Genome size; Gene; Phylogeny; Genetics (clinical); Helicobacter pylori; Models, Genetic; Phylogenetic tree; Gene Amplification; Computational Biology; Prokaryotic Cells; Genes, Bacterial; Horizontal gene transfer; Gene Deletion; Genome, Bacterial
- Abstract
Genomes are shaped by evolutionary processes such as gene genesis, horizontal gene transfer (HGT), and gene loss. To quantify the relative contributions of these processes, we analyze the distribution of 12,762 protein families on a phylogenetic tree, derived from entire genomes of 41 Bacteria and 10 Archaea. We show that gene loss is the most important factor in shaping genome content, being up to three times more frequent than HGT, followed by gene genesis, which may contribute up to twice as many genes as HGT. We suggest that gene gain and gene loss in prokaryotes are balanced; thus, on average, prokaryotic genome size is kept constant. Despite the importance of HGT, our results indicate that the majority of protein families have only been transmitted by vertical inheritance. To test our method, we present a study of strain-specific genes of Helicobacter pylori, and demonstrate correct predictions of gene loss and HGT for at least 81% of validated cases. This approach indicates that it is possible to trace genome content history and quantify the factors that shape contemporary prokaryotic genomes.
- Published
- 2003
- Full Text
- View/download PDF
24. Consistency analysis of similarity between multiple alignments: prediction of protein function and fold structure from analysis of local sequence motifs (edited by B. Holland)
- Author
-
Shmuel Pietrokovski, Victor Kunin, Gila Lithwick, Einat Sitbon, and Bob Chan
- Subjects
Genetics; Rossmann fold; Protein family; Fold (geology); Computational biology; Biology; Amino acid; Similarity (network science); Structural Biology; TIM barrel; Structural motif; Molecular Biology; Sequence (medicine)
- Abstract
A new method to analyze the similarity between multiply aligned protein motifs (blocks) was developed. It identifies sets of consistently aligned blocks. These are found to be protein regions of similar function and structure that appear in different contexts. For example, the Rossmann fold ligand-binding region is found similar to TIM barrel and methylase regions, various protein families are predicted to have a TIM-barrel fold and the structural relation between the ClpP protease and crotonase folds is identified from their sequence. Besides identifying local structure features, sequence similarity across short sequence regions (fewer than 20 amino acids) also predicts structure similarity of whole domains (folds) a few hundred amino acid residues long. Most of these relations could not be identified by other advanced sequence-to-sequence or sequence-to-multiple-alignment comparisons. We describe the method (termed CYRCA), present examples of our findings, and discuss their implications.
- Published
- 2001
- Full Text
- View/download PDF
25. [Untitled]
- Author
-
Victor Kunin
- Subjects
Genetics; Ribozyme; RNA; RNA-dependent RNA polymerase; General Medicine; Non-coding RNA; Biochemistry; Space and Planetary Science; RNA editing; RNA polymerase; RNA polymerase I; Ecology, Evolution, Behavior and Systematics; Ligase ribozyme
- Abstract
What was the first living molecule – RNA or protein? This question embodies the major disagreement in studies on the origin of life. The fact that in contemporary cells RNA polymerase is a protein and peptidyl transferase consists of RNA suggests the existence of a mutual catalytic dependence between these two kinds of biopolymers. I suggest that this dependence is a `frozen accident', a remnant from the first living system. This system is proposed to be a combination of an RNA molecule capable of catalyzing amino acid polymerization and the resulting protein functioning as an RNA-dependent RNA polymerase. The specificity of the protein synthesis is thought to be achieved by the composition of the surrounding medium and the specificity of the RNA synthesis – by Watson–Crick base pairing. Despite its apparent simplicity, the system possesses a great potential to evolve into a primitive ribosome and further to life, as it is seen today. This model provides a possible explanation for the origin of the interaction between nucleic acids and protein. Based on the suggested system, I propose a new definition of life as a system of nucleic acid and protein polymerases with a constant supply of monomers, energy and protection.
- Published
- 2000
- Full Text
- View/download PDF
26. Genome coverage, literally speaking
- Author
-
Leon Goldovsky, Victor Kunin, Nikos Darzentas, Paul Janssen, and Christos A. Ouzounis
- Subjects
Genetics; Genome; Science and Society; Sequence analysis; MEDLINE; Computational Biology; Context (language use); Genomics; Computational biology; Genome project; Biology; Biochemistry; Annotation; Animals; Humans; Periodicals as Topic; Sequence Analysis; Molecular Biology; Functional genomics; Gene
- Abstract
In late 2004, 200 complete genomes had been sequenced and made available to the research community. At the time of writing this viewpoint, that number had further risen to 221 and will have undoubtedly increased again before publication. These genomes, which represent a wide range of species from archaea to human, are a highly valuable knowledge resource for the scientific community. However, the sequencing of a full genome is just the first step in research; it must be followed by the functional characterization of genes and proteins. In this context, it is interesting to see how well represented these sequenced species are in terms of publications. We have thus obtained the number of abstracts published per species and normalized that count by the number of genes in that species to obtain a comparable measure for the number of publications per gene for all completed and published genomes. This simple measure highlights the current knowledge gap between various organisms and could further serve as a guideline for selecting genomes for sequencing projects, high‐throughput functional genomics and database annotation efforts. The 200 complete genome sequences published by December 2004 included 118 genera, 166 species and 34 additional strains for 21 species. This rate translates to a doubling time of available genome sequences of less than two years (Janssen et al., 2003a). And it remains steady: in 2003, an average of one complete genome was released per week; 47 genomes were made available in the first 44 weeks of 2004. This trend will accelerate further, as more than 1,000 genome projects are currently underway (Bernal et al., 2001). For the 221 genomes currently available, the total number of predicted proteins is 822,114, according to the COGENT database (Janssen et al., 2003b). One of the great challenges for computational and experimental genomics is …
- Published
- 2005
- Full Text
- View/download PDF
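The normalization described in the abstract of result 26 (abstracts per species divided by gene count) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name `publications_per_gene` is hypothetical, and the species names and counts are made-up placeholders, not data from the paper.

```python
from typing import Dict, List, Tuple


def publications_per_gene(abstract_count: int, gene_count: int) -> float:
    """Normalize a species' publication count by its number of genes."""
    if gene_count <= 0:
        raise ValueError("gene_count must be positive")
    return abstract_count / gene_count


# Hypothetical placeholder data: species -> (abstracts, genes)
species_counts: Dict[str, Tuple[int, int]] = {
    "species_A": (12000, 4000),
    "species_B": (300, 1500),
}

# Rank species from least to most studied per gene
ranked: List[Tuple[str, float]] = sorted(
    ((name, publications_per_gene(abstracts, genes))
     for name, (abstracts, genes) in species_counts.items()),
    key=lambda item: item[1],
)
```

Sorting in ascending order puts the least-covered genomes first, which matches the abstract's suggestion that the measure could help highlight knowledge gaps between organisms.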
27. Defining the core Arabidopsis thaliana root microbiome
- Author
-
Robert C. Edgar, Derek S. Lundberg, Julien Tremblay, Sarah L. Lebeis, Ruth E. Ley, Susannah G. Tringe, Jase Gehring, Tijana Glavina del Rio, Scott Yourstone, Victor Kunin, Stephanie Malfatti, Philip Hugenholtz, Sur Herrera Paredes, Thilo Eickhorst, Anna Engelbrektson, and Jeffery L. Dangl
- Subjects
Genotype; Arabidopsis; Plant Roots; Ribotyping; Article; Actinobacteria; RNA, Ribosomal, 16S; Botany; Proteobacteria; Endophytes; Arabidopsis thaliana; Symbiosis; In Situ Hybridization, Fluorescence; Soil Microbiology; Rhizosphere; Multidisciplinary; Ecology; Root microbiome; Soil classification; Sequence Analysis, DNA; Metagenome; Soil microbiology
- Abstract
Sequencing of the Arabidopsis thaliana root microbiome shows that its composition is strongly influenced by location, inside or outside the root, and by soil type. The association between a land plant and the soil microbes of the root microbiome is important for the plant's well-being. A deeper understanding of these microbial communities will offer opportunities to control plant growth and susceptibility to pathogens, particularly in sustainable agricultural regimes. Two groups, working separately but developing best-practice protocols in parallel, have characterized the root microbiota of the model plant Arabidopsis thaliana. Working on two continents and with five different soil types, they reach similar general conclusions. The bacterial communities in each root compartment (the rhizosphere immediately surrounding the root and the endophytic compartment within the root) are most strongly influenced by soil type, and to a lesser degree by host genotype. In natural soils, Arabidopsis plants are preferentially colonized by Actinobacteria, Proteobacteria, Bacteroidetes and Chloroflexi species. And, an important point for future work, Arabidopsis root selectivity for soil bacteria under controlled environmental conditions mimics that of plants grown in a natural environment. Land plants associate with a root microbiota distinct from the complex microbial community present in surrounding soil. The microbiota colonizing the rhizosphere (immediately surrounding the root) and the endophytic compartment (within the root) contribute to plant growth, productivity, carbon sequestration and phytoremediation [1,2,3]. Colonization of the root occurs despite a sophisticated plant immune system [4,5], suggesting finely tuned discrimination of mutualists and commensals from pathogens. Genetic principles governing the derivation of host-specific endophyte communities from soil communities are poorly understood.
Here we report the pyrosequencing of the bacterial 16S ribosomal RNA gene of more than 600 Arabidopsis thaliana plants to test the hypotheses that the root rhizosphere and endophytic compartment microbiota of plants grown under controlled conditions in natural soils are sufficiently dependent on the host to remain consistent across different soil types and developmental stages, and sufficiently dependent on host genotype to vary between inbred Arabidopsis accessions. We describe different bacterial communities in two geochemically distinct bulk soils and in rhizosphere and endophytic compartments prepared from roots grown in these soils. The communities in each compartment are strongly influenced by soil type. Endophytic compartments from both soils feature overlapping, low-complexity communities that are markedly enriched in Actinobacteria and specific families from other phyla, notably Proteobacteria. Some bacteria vary quantitatively between plants of different developmental stage and genotype. Our rigorous definition of an endophytic compartment microbiome should facilitate controlled dissection of plant–microbe interactions derived from complex soil communities.
- Published
- 2011
28. Metatranscriptomic array analysis of 'Candidatus Accumulibacter phosphatis'-enriched enhanced biological phosphorus removal sludge
- Author
-
Shaomei He, Victor Kunin, Matthew Haynes, Hector Garcia Martin, Natalia Ivanova, Forest Rohwer, Philip Hugenholtz, and Katherine D. McMahon
- Subjects
RNA, Bacterial ,Biodegradation, Environmental ,Bacterial Proteins ,Gene Expression Regulation ,Sewage ,Gene Expression Profiling ,Betaproteobacteria ,Phosphorus ,Anaerobiosis ,Metagenomics ,Aerobiosis ,Oligonucleotide Array Sequence Analysis - Abstract
Here we report the first metatranscriptomic analysis of gene expression and regulation of 'Candidatus Accumulibacter'-enriched lab-scale sludge during enhanced biological phosphorus removal (EBPR). Medium density oligonucleotide microarrays were generated with probes targeting most predicted genes hypothesized to be important for the EBPR phenotype. RNA samples were collected at the early stage of anaerobic and aerobic phases (15 min after acetate addition and switching to aeration respectively). We detected the expression of a number of genes involved in the carbon and phosphate metabolisms, as proposed by EBPR models (e.g. polyhydroxyalkanoate synthesis, a split TCA cycle through methylmalonyl-CoA pathway, and polyphosphate formation), as well as novel genes discovered through metagenomic analysis. The comparison between the early stage anaerobic and aerobic gene expression profiles showed that expression levels of most genes were not significantly different between the two stages. The majority of upregulated genes in the aerobic sample are predicted to encode functions such as transcription, translation and protein translocation, reflecting the rapid growth phase of Accumulibacter shortly after being switched to aerobic conditions. Components of the TCA cycle and machinery involved in ATP synthesis were also upregulated during the early aerobic phase. These findings support the predictions of EBPR metabolic models that the oxidation of intracellularly stored carbon polymers through the TCA cycle provides ATP for cell growth when oxygen becomes available. Nitrous oxide reductase was among the very few Accumulibacter genes upregulated in the anaerobic sample, suggesting that its expression is likely induced by the deprivation of oxygen.
- Published
- 2010
29. A phylogeny-driven genomic encyclopaedia of Bacteria and Archaea
- Author
-
Dongying Wu, Cheryl A. Kerfeld, Stefan Spring, Alex Copeland, Alla Lapidus, David Bruce, Elke Lang, Jonathan A. Eisen, Patrick S. G. Chain, Athanasios Lykidis, Susan Lucas, Natalia Ivanova, Iain Anderson, Jan Fang Cheng, Edward M. Rubin, Feng Chen, Victor Kunin, Brian J. Tindall, Matt Nolan, Rüdiger Pukall, Sean D. Hooper, Amrita Pati, Eileen Dalin, Adam Zemla, Martin Wu, Hans-Peter Klenk, Cliff Han, Patrik D'haeseleer, Mitchell Singer, Lynne Goodwin, Nikos C. Kyrpides, Philip Hugenholtz, Konstantinos Mavromatis, and Sabine Gronow
- Subjects
Models, Molecular ,Protein Structure ,General Science & Technology ,Molecular Sequence Data ,Genomics ,Computational biology ,Bacterial genome size ,Genome ,Databases ,Genetic ,Bacterial Proteins ,Models ,Phylogenetics ,Genome, Archaeal ,Databases, Genetic ,Genetics ,Primary nutritional groups ,Amino Acid Sequence ,rRNA ,Phylogeny ,Multidisciplinary ,Phylogenetic tree ,biology ,Bacteria ,Human Genome ,Bacterial ,Conserved signature indels ,Molecular ,Genes, rRNA ,Biodiversity ,biology.organism_classification ,Archaea ,Research Highlight ,Actins ,Protein Structure, Tertiary ,Genes ,Archaeal ,Infection ,Sequence Alignment ,Tertiary ,Genome, Bacterial ,Biotechnology - Abstract
Sequencing of genomes from many different bacterial and archaeal groups is broadening the picture of the prokaryotic pan-genome. A new initiative provides comparative genomicists with a more complete picture of genome diversity. Here we discuss the improved sampling strategy.
- Published
- 2009
30. A bioinformatician's guide to metagenomics
- Author
-
Victor Kunin, Philip Hugenholtz, Alla Lapidus, Konstantinos Mavromatis, and Alex Copeland
- Subjects
Test data generation ,Data management ,Population ,Information Storage and Retrieval ,Reviews ,Biology ,Bioinformatics ,Microbiology ,Annotation ,Databases, Genetic ,Environmental Microbiology ,education ,Molecular Biology ,education.field_of_study ,Genome ,business.industry ,Genome project ,Genomics ,Sequence Analysis, DNA ,Data science ,Metadata ,Infectious Diseases ,Workflow ,Metagenomics ,Database Management Systems ,business - Abstract
Summary: As random shotgun metagenomic projects proliferate and become the dominant source of publicly available sequence data, procedures for the best practices in their execution and analysis become increasingly important. Based on our experience at the Joint Genome Institute, we describe the chain of decisions accompanying a metagenomic project from the viewpoint of the bioinformatic analysis step by step. We guide the reader through a standard workflow for a metagenomic project beginning with presequencing considerations such as community composition and sequence data type that will greatly influence downstream analyses. We proceed with recommendations for sampling and data generation including sample and metadata collection, community profiling, construction of shotgun libraries, and sequencing strategies. We then discuss the application of generic sequence processing steps (read preprocessing, assembly, and gene prediction and annotation) to metagenomic data sets in contrast to genome projects. Different types of data analyses particular to metagenomes are then presented, including binning, dominant population analysis, and gene-centric analysis. Finally, data management issues are presented and discussed. We hope that this review will assist bioinformaticians and biologists in making better-informed decisions on their journey during a metagenomic project.
- Published
- 2008
31. Genome analysis of the anaerobic thermohalophilic bacterium Halothermothrix orenii
- Author
-
Victor Kunin, Bharat K. C. Patel, Athanasios Lykidis, Natalia Ivanova, Sean D. Hooper, Alla Lapidus, Hui Sun, Nikos C. Kyrpides, Philip Hugenholtz, Konstantinos Mavromatis, and Iain Anderson
- Subjects
Lipopolysaccharides ,Gram-negative bacteria ,Hot Temperature ,Firmicutes ,Science ,Evolutionary Biology/Bioinformatics ,Negibacteria ,Genome ,Microbiology ,Halanaerobiales ,Complete sequence ,Bacteria, Anaerobic ,Halogens ,Gram-Negative Bacteria ,Bacteria (microorganisms) ,Genetics ,Whole genome sequencing ,Multidisciplinary ,biology ,Thermophile ,060500 MICROBIOLOGY ,Halothermothrix orenii ,Genetics and Genomics/Bioinformatics ,biology.organism_classification ,Halobacteriales ,Posibacteria ,Medicine ,DNA, Circular ,Genetics and Genomics/Comparative Genomics ,Water Microbiology ,Bacteria ,Genome, Bacterial ,Research Article - Abstract
Halothermothrix orenii is a strictly anaerobic thermohalophilic bacterium isolated from sediment of a Tunisian salt lake. It belongs to the order Halanaerobiales in the phylum Firmicutes. The complete sequence revealed that the genome consists of one circular chromosome of 2,578,146 bp encoding 2,451 predicted genes. This is the first genome sequence of an organism belonging to the Halanaerobiales. Features of both Gram-positive and Gram-negative bacteria were identified, the most prominent being the presence of both a sporulating mechanism typical of Firmicutes and a characteristic Gram-negative lipopolysaccharide. Protein sequence analyses and metabolic reconstruction reveal a unique combination of strategies for thermophilic and halophilic adaptation. H. orenii can serve as a model organism for the study of the evolution of the Gram-negative phenotype as well as the adaptation under thermohalophilic conditions and the development of biotechnological applications under conditions that require high temperatures and high salt concentrations.
- Published
- 2008
32. A korarchaeal genome reveals insights into the evolution of the Archaea
- Author
-
Alla Lapidus, Nikos C. Kyrpides, Mircea Podar, Philip Hugenholtz, Paul G. Richardson, Eugene V. Koonin, Victor Kunin, Kira S. Makarova, Eugene Goltsman, David E. Graham, Kerrie Barry, Brian P. Hedlund, Lennart Randau, Gerhard Wanner, Iain Anderson, Martin Keller, Céline Brochier-Armanet, Yuri I. Wolf, James Elkins, and Karl O. Stetter
- Subjects
Multidisciplinary ,biology ,Shotgun sequencing ,Genetic Variation ,Genomics ,Korarchaeota ,Sequence Analysis, DNA ,Biological Sciences ,biology.organism_classification ,Genome ,Evolutionary biology ,Phylogenetics ,Crenarchaeota ,Genome, Archaeal ,Candidate division ,Phylogeny ,Archaea - Abstract
The candidate division Korarchaeota comprises a group of uncultivated microorganisms that, by their small subunit rRNA phylogeny, may have diverged early from the major archaeal phyla Crenarchaeota and Euryarchaeota. Here, we report the initial characterization of a member of the Korarchaeota with the proposed name, “Candidatus Korarchaeum cryptofilum,” which exhibits an ultrathin filamentous morphology. To investigate possible ancestral relationships between deep-branching Korarchaeota and other phyla, we used whole-genome shotgun sequencing to construct a complete composite korarchaeal genome from enriched cells. The genome was assembled into a single contig 1.59 Mb in length with a G + C content of 49%. Of the 1,617 predicted protein-coding genes, 1,382 (85%) could be assigned to a revised set of archaeal Clusters of Orthologous Groups (COGs). The predicted gene functions suggest that the organism relies on a simple mode of peptide fermentation for carbon and energy and lacks the ability to synthesize de novo purines, CoA, and several other cofactors. Phylogenetic analyses based on conserved single genes and concatenated protein sequences positioned the korarchaeote as a deep archaeal lineage with an apparent affinity to the Crenarchaeota. However, the predicted gene content revealed that several conserved cellular systems, such as cell division, DNA replication, and tRNA maturation, resemble the counterparts in the Euryarchaeota. In light of the known composition of archaeal genomes, the Korarchaeota might have retained a set of cellular features that represents the ancestral archaeal form.
- Published
- 2008
33. Denoising inferred functional association networks obtained by gene fusion analysis
- Author
-
Anton J. Enright, Athanasios Tsaftaris, Shiri Freilich, Victor Kunin, Leon Goldovsky, Aliki Kapazoglou, Christos A. Ouzounis, and Atanas Kamburov
- Subjects
lcsh:QH426-470 ,lcsh:Biotechnology ,Gene regulatory network ,Arabidopsis ,Computational biology ,Biology ,Genome ,Fusion gene ,Set (abstract data type) ,03 medical and health sciences ,0302 clinical medicine ,Bacterial Proteins ,Phylogenetics ,lcsh:TP248.13-248.65 ,Genetic variation ,Genetics ,Gene Regulatory Networks ,Chlamydia ,Gene ,Phylogeny ,030304 developmental biology ,Plant Proteins ,0303 health sciences ,Genetic Variation ,Reproducibility of Results ,lcsh:Genetics ,DNA microarray ,Gene Fusion ,030217 neurology & neurosurgery ,Research Article ,Biotechnology ,Protein Binding - Abstract
Background: Gene fusion detection, also known as the 'Rosetta Stone' method, involves the identification of fused composite genes in a set of reference genomes, which indicates potential interactions between their un-fused counterpart genes in query genomes. The precision of this method typically improves with an ever-increasing number of reference genomes. Results: In order to explore the usefulness and scope of this approach for protein interaction prediction and generate a high-quality, non-redundant set of interacting pairs of proteins across a wide taxonomic range, we have exhaustively performed gene fusion analysis for 184 genomes using an efficient variant of a previously developed protocol. By analyzing interaction graphs and applying a threshold that limits the maximum number of possible interactions within the largest graph components, we show that we can reduce the number of implausible interactions due to the detection of promiscuous domains. With this generally applicable approach, we generate a robust set of over 2 million distinct and testable interactions encompassing 696,894 proteins in 184 species or strains, most of which have never been the subject of high-throughput experimental proteomics. We investigate the cumulative effect of increasing numbers of genomes on the fidelity and quantity of predictions, and show that, for large numbers of genomes, predictions do not become saturated but continue to grow linearly, for the majority of the species. We also examine the percentage of component (and composite) proteins with relation to the number of genes and further validate the functional categories that are highly represented in this robust set of detected genome-wide interactions. Conclusion: We illustrate the phylogenetic and functional diversity of gene fusion events across genomes, and their usefulness for accurate prediction of protein interaction and function.
- Published
- 2007
34. The Korarchaeota: Archaeal orphans representing an ancestral lineage of life
- Author
-
Brian P. Hedlund, Victor Kunin, Alla Lapidus, Karl O. Stetter, Nikos C. Kyrpides, James Elkins, Paul G. Richardson, Phil Hugenholtz, David E. Graham, Iain Anderson, Kerrie Barry, Martin Keller, Gerhard Wanner, and Eugene Goltsman
- Subjects
biology ,Evolutionary biology ,Crenarchaeota ,Lineage (evolution) ,Horizontal gene transfer ,Ribosomal RNA ,Euryarchaeota ,biology.organism_classification ,Molecular biology ,Korarchaeota ,Horizontal gene transfer in evolution ,Archaea - Abstract
Based on conserved cellular properties, all life on Earth can be grouped into different phyla which belong to the primary domains Bacteria, Archaea, and Eukarya. However, tracing back their evolutionary relationships has been impeded by horizontal gene transfer and gene loss. Within the Archaea, the kingdoms Crenarchaeota and Euryarchaeota exhibit a profound divergence. In order to elucidate the evolution of these two major kingdoms, representatives of more deeply diverged lineages would be required. Based on their environmental small subunit ribosomal RNA (SSU rRNA) sequences, the Korarchaeota had been originally suggested to have an ancestral relationship to all known Archaea, although this assessment has been refuted. Here we describe the cultivation and initial characterization of the first member of the Korarchaeota, highly unusual, ultrathin filamentous cells about 0.16 µm in diameter. A complete genome sequence obtained from enrichment cultures revealed an unprecedented combination of signature genes which were thought to be characteristic of either the Crenarchaeota, Euryarchaeota, or Eukarya. Cell division appears to be mediated through an FtsZ-dependent mechanism which is highly conserved throughout the Bacteria and Euryarchaeota. An rpb8 subunit of the DNA-dependent RNA polymerase was identified which is absent from other Archaea and has been described as a eukaryotic signature gene. In addition, the representative organism possesses a ribosome structure typical for members of the Crenarchaeota. Based on its gene complement, this lineage likely diverged near the separation of the two major kingdoms of Archaea. Further investigations of these unique organisms may shed additional light onto the evolution of extant life.
- Published
- 2007
35. Metagenomic analysis of two enhanced biological phosphorus removal (EBPR) sludge communities
- Author
-
Falk Warnecke, Nikos C. Kyrpides, Philip Hugenholtz, Victor Kunin, Nik Putnam, Ernest Szeto, Linda L. Blackall, Eileen Dalin, Alice C. McHardy, Jasmyn Pangilinan, Shaomei He, Katherine D. McMahon, Natalia Ivanova, Asaf Salamov, Hector Garcia Martin, Harris Shapiro, Christine Yeates, Isidore Rigoutsos, and Kerrie Barry
- Subjects
biology ,Sewage ,Ecology ,Biomedical Engineering ,Candidatus Accumulibacter ,Adaptation, Biological ,Betaproteobacteria ,Bioengineering ,Phosphorus ,biology.organism_classification ,Applied Microbiology and Biotechnology ,Candidatus Accumulibacter phosphatis ,Waste Disposal, Fluid ,Polyphosphate-accumulating organisms ,Enhanced biological phosphorus removal ,Metagenomics ,Molecular Medicine ,Evolutionary dynamics ,Organism ,Genome, Bacterial ,Biotechnology ,Waste disposal - Abstract
Enhanced biological phosphorus removal (EBPR) is one of the best-studied microbially mediated industrial processes because of its ecological and economic relevance. Despite this, it is not well understood at the metabolic level. Here we present a metagenomic analysis of two lab-scale EBPR sludges dominated by the uncultured bacterium, "Candidatus Accumulibacter phosphatis." The analysis sheds light on several controversies in EBPR metabolic models and provides hypotheses explaining the dominance of A. phosphatis in this habitat, its lifestyle outside EBPR and probable cultivation requirements. Comparison of the same species from different EBPR sludges highlights recent evolutionary dynamics in the A. phosphatis genome that could be linked to mechanisms for environmental adaptation. In spite of an apparent lack of phylogenetic overlap in the flanking communities of the two sludges studied, common functional themes were found, at least one of them complementary to the inferred metabolism of the dominant organism. The present study provides a much needed blueprint for a systems-level understanding of EBPR and illustrates that metagenomics enables detailed, often novel, insights into even well-studied biological systems.
- Published
- 2006
36. CoGenT++: an extensive and extensible data environment for computational genomics
- Author
-
Nikos Darzentas, José M. Peregrín-Alvarez, Benjamin Audit, Nuria Lopez-Bigas, Ildefonso Cases, Paul Janssen, Victor Kunin, Anton J. Enright, Mike L. Smith, Sophia Tsoka, Dag Ahrén, Christos A. Ouzounis, and Leon Goldovsky
- Subjects
Statistics and Probability ,Ancestral reconstruction ,Information Storage and Retrieval ,Computational biology ,Biology ,computational genomics ,computer.software_genre ,Biochemistry ,Genome ,03 medical and health sciences ,Consistency (database systems) ,User-Computer Interface ,[SDV.BBM.GTP]Life Sciences [q-bio]/Biochemistry, Molecular Biology/Genomics [q-bio.GN] ,Databases, Genetic ,Computer Graphics ,Molecular Biology ,data integration ,030304 developmental biology ,0303 health sciences ,030302 biochemistry & molecular biology ,Computational genomics ,Chromosome Mapping ,Computational Biology ,Genome project ,Genomics ,[SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM] ,Computer Science Applications ,Systems Integration ,Computational Mathematics ,Gene nomenclature ,ComputingMethodologies_PATTERNRECOGNITION ,Computational Theory and Mathematics ,Database Management Systems ,Data mining ,computer ,Functional genomics ,Sequence Analysis ,Software ,Data integration - Abstract
Motivation: CoGenT++ is a data environment for computational research in comparative and functional genomics, designed to address issues of consistency, reproducibility, scalability and accessibility. Description: CoGenT++ facilitates the re-distribution of all fully sequenced and published genomes, storing information about species, gene names and protein sequences. We describe our scalable implementation of ProXSim, a continually updated all-against-all similarity database, which stores pairwise relationships between all genome sequences. Based on these similarities, derived databases are generated for gene fusions (AllFuse), putative orthologs (OFAM), protein families (TRIBES), phylogenetic profiles (ProfUse) and phylogenetic trees. Extensions based on the CoGenT++ environment include disease gene prediction, pattern discovery, automated domain detection, genome annotation and ancestral reconstruction. Conclusion: CoGenT++ provides a comprehensive environment for computational genomics, accessible primarily for large-scale analyses as well as manual browsing. Availability: The database and component downloads are accessible at http://cgg.ebi.ac.uk/cogentpp.html. Contact: ouzounis@ebi.ac.uk
- Published
- 2005
37. The net of life: reconstructing the microbial phylogenetic network
- Author
-
Christos A. Ouzounis, Nikos Darzentas, Leon Goldovsky, and Victor Kunin
- Subjects
Genome evolution ,Phylogenetic tree ,Bacteria ,Gene Transfer, Horizontal ,Models, Genetic ,Ecology ,Tree of life (biology) ,Inheritance (genetic algorithm) ,Computational Biology ,Phylogenetic network ,Biology ,Horizontal gene transfer in evolution ,Archaea ,Evolution, Molecular ,Tree (data structure) ,Phylogenetics ,Evolutionary biology ,Genetics ,Letters ,Genetics (clinical) ,Algorithms ,Genome, Bacterial ,Phylogeny - Abstract
It has previously been suggested that the phylogeny of microbial species might be better described as a network containing vertical and horizontal gene transfer (HGT) events. Yet, all phylogenetic reconstructions so far have presented microbial trees rather than networks. Here, we present a first attempt to reconstruct such an evolutionary network, which we term the “net of life.” We use available tree reconstruction methods to infer vertical inheritance, and use an ancestral state inference algorithm to map HGT events on the tree. We also describe a weighting scheme used to estimate the number of genes exchanged between pairs of organisms. We demonstrate that vertical inheritance constitutes the bulk of gene transfer on the tree of life. We term the bulk of horizontal gene flow between tree nodes as “vines,” and demonstrate that multiple but mostly tiny vines interconnect the tree. Our results strongly suggest that the HGT network is a scale-free graph, a finding with important implications for genome evolution. We propose that genes might propagate extremely rapidly across microbial species through the HGT network, using certain organisms as hubs.
- Published
- 2005
38. Functional evolution of the yeast protein interaction network
- Author
-
Victor Kunin, José B. Pereira-Leal, and Christos A. Ouzounis
- Subjects
Genetics ,Proteomics ,Saccharomyces cerevisiae Proteins ,Phylogenetic tree ,biology ,Saccharomyces cerevisiae ,Computational Biology ,biology.organism_classification ,Yeast ,Protein–protein interaction ,Evolution, Molecular ,Functional evolution ,Interaction network ,Evolutionary biology ,Molecular Biology ,Ecology, Evolution, Behavior and Systematics ,Biological network ,Function (biology) - Abstract
Protein interactions are central to most biological processes. We investigated the dynamics of emergence of the protein interaction network of Saccharomyces cerevisiae by mapping origins of proteins on an evolutionary tree. We demonstrate that evolutionary periods are characterized by distinct connectivity levels of the emerging proteins. We found that the most-connected group of proteins dates to the eukaryotic radiation, and the more ancient group of pre-eukaryotic proteins is less connected. We show that functional classes have different average connectivity levels and that the time of emergence of these functional classes parallels the observed connectivity variation in evolution. We take these findings as evidence that the evolution of function might be the reason for the differences in connectivity throughout evolutionary time. We propose that the understanding of the mechanisms that generate the scale-free protein interaction network, and possibly other biological networks, requires consideration of protein function.
- Published
- 2004
39. Protein families and TRIBES in genome sequence space
- Author
-
Victor Kunin, Anton J. Enright, and Christos A. Ouzounis
- Subjects
Genetics ,Whole genome sequencing ,Genome ,Protein family ,Hypothetical protein ,Proteins ,Sequence alignment ,Computational biology ,Articles ,Biology ,Protein sequencing ,Sequence Analysis, Protein ,Cluster Analysis ,Protein function prediction ,Amino Acid Sequence ,Databases, Protein ,Genome size ,Sequence Alignment ,Algorithms ,Phylogeny - Abstract
Accurate detection of protein families allows assignment of protein function and the analysis of functional diversity in complete genomes. Recently, we presented a novel algorithm called TribeMCL for the detection of protein families that is both accurate and efficient. This method allows family analysis to be carried out on a very large scale. Using TribeMCL, we have generated a resource called TRIBES that contains protein family information, comprising annotations, protein sequence alignments and phylogenetic distributions describing 311 257 proteins from 83 completely sequenced genomes. The analysis of at least 60 934 detected protein families reveals that, with the essential families excluded, paralogy levels are similar between prokaryotes, irrespective of genome size. The number of essential families is estimated to be between 366 and 426. We also show that the currently known space of protein families is scale free and discuss the implications of this distribution. In addition, we show that smaller families are often formed by shorter proteins and discuss the reasons for this intriguing pattern. Finally, we analyse the functional diversity of protein families in entire genome sequences. The TRIBES protein family resource is accessible at http://www.ebi.ac.uk/research/cgg/tribes/.
- Published
- 2003
40. Classification schemes for protein structure and function
- Author
-
Richard M.R. Coulson, Victor Kunin, Christos A. Ouzounis, José B. Pereira-Leal, and Anton J. Enright
- Subjects
Scheme (programming language) ,Protein structure and function ,Protein Folding ,Protein Conformation ,Classification scheme ,Biology ,Machine learning ,computer.software_genre ,Bioinformatics ,Structure-Activity Relationship ,Protein structure ,Genetics ,Animals ,Humans ,Molecular Biology ,Genetics (clinical) ,computer.programming_language ,business.industry ,Computational Biology ,Proteins ,ComputingMethodologies_PATTERNRECOGNITION ,Artificial intelligence ,business ,computer ,Universe (mathematics) ,Forecasting - Abstract
We examine the structural and functional classifications of the protein universe, providing an overview of the existing classification schemes, their features and inter-relationships. We argue that a unified scheme should be based on a natural classification approach and that more comparative analyses of the present schemes are required both to understand their limitations and to help delimit the number of known protein folds and their corresponding functional roles in cells.
- Published
- 2003
41. Beyond 100 genomes
- Author
-
Paul Janssen, Benjamin Audit, Ildefonso Cases, Nikos Darzentas, Leon Goldovsky, Victor Kunin, Nuria Lopez-Bigas, José Manuel Peregrin-Alvarez, José B. Pereira-Leal, Sophia Tsoka, and Christos A. Ouzounis
- Subjects
Genome ,Correspondence ,Animals ,Computational Biology ,Humans ,Proteins ,Phylogeny - Abstract
By the end of 2002, we witnessed the landmark submission of the 100th complete genome sequence in the databases. An overview of these genomes reveals certain interesting trends and provides valuable insights into possible future developments.
- Published
- 2003
42. Myriads of protein families, and still counting
- Author
-
Victor Kunin, Ildefonso Cases, Anton J. Enright, Victor de Lorenzo, and Christos A. Ouzounis
- Subjects
Genome ,Correspondence ,Animals ,Humans ,Proteins ,Sequence Analysis, DNA - Abstract
From the historical record of genome sequencing, we show that the rate of discovery of new families has remained constant over time, indicating that our knowledge of sequence space is far from complete.
- Published
- 2003
43. Organization and functional analysis of the mouse transporter associated with antigen processing 2 promoter
- Author
-
Victor Kunin, Evgeny Arons, Chana Schechter, and Rachel Ehrlich
- Subjects
Proteasome Endopeptidase Complex ,Transcription, Genetic ,Immunology ,Antigen presentation ,Molecular Sequence Data ,Repressor ,Genes, MHC Class I ,Response Elements ,Cell Line ,Mice ,ATP Binding Cassette Transporter, Subfamily B, Member 3 ,Multienzyme Complexes ,Cricetinae ,MHC class I ,Immunology and Allergy ,Animals ,Humans ,Promoter Regions, Genetic ,Transcription factor ,Gene ,Antigen Presentation ,Binding Sites ,Gorilla gorilla ,biology ,Base Sequence ,Proteins ,Promoter ,Transporter associated with antigen processing ,Sequence Analysis, DNA ,Phosphoproteins ,Molecular biology ,Rats ,DNA-Binding Proteins ,Repressor Proteins ,Cysteine Endopeptidases ,Peptide transport ,biology.protein ,ATP-Binding Cassette Transporters ,Interferon Regulatory Factor-2 ,Interferon Regulatory Factor-1 ,Transcription Factors - Abstract
In accordance with the key role of MHC class I molecules in the adaptive immune response against viruses, they are expressed by most cells, and their expression can be enhanced by cytokines. The assembly and cell surface expression of class I complexes depend on a continuous peptide supply. The peptides are generated mainly by the proteasome and are transported to the endoplasmic reticulum by a peptide transport pump consisting of two subunits, TAP1 and TAP2. The proteasome low molecular weight polypeptide (2 and 7), as well as TAP (1 and 2) genes, are coordinately regulated and are induced by IFNs. Despite this coordinate regulation, examination of tumors shows that these genes can be discordantly down-regulated. In pursuing a molecular explanation for these observations, we have characterized the mouse TAP2 promoter region and 5′-flanking sequence. We show that the 5′ untranslated regions of TAP2 genes have a characteristic genomic organization that is conserved in both the mouse and the human. The mouse TAP2 promoter belongs to a class of promoters that lack TATA boxes but contain a MED1 (multiple start site element downstream) sequence. Accordingly, transcription is initiated from multiple sites within a 100-nucleotide window. An IFN regulatory factor 1 (IRF1)/IRF2 binding site is located in this region and is involved in both basal and IRF1-induced TAP2 promoter activity. The implication of the extensive differences found among the promoters of class I heavy chain, low molecular weight polypeptide, and TAP genes, all encoding proteins involved in Ag presentation, is discussed.
- Published
- 2001
44. Evolutionary conservation of sequence and secondary structures in CRISPR repeats
- Author
-
Victor Kunin, Philip Hugenholtz, and Rotem Sorek
- Subjects
Sequence analysis ,RNA, Archaeal ,Computational biology ,Biology ,Genome ,CRISPR Spacers ,Conserved sequence ,Evolution, Molecular ,Tandem repeat ,Cluster Analysis ,Direct repeat ,CRISPR ,Conserved Sequence ,Repetitive Sequences, Nucleic Acid ,Genetics ,Bacteria ,Base Sequence ,Sequence Analysis, RNA ,Research ,Life Sciences ,Archaea ,RNA, Bacterial ,Multigene Family ,CRISPR Loci ,Nucleic Acid Conformation ,Genome, Bacterial ,Software - Abstract
The categorisation and structural analysis of Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) sequences from 195 microbial genomes show that repeats from diverse organisms can be grouped based on sequence similarity, and that some groups have pronounced secondary structures with compensatory base changes. Background: Clustered regularly interspaced short palindromic repeats (CRISPRs) are a novel class of direct repeats, separated by unique spacer sequences of similar length, that are present in approximately 40% of bacterial and most archaeal genomes analyzed to date. More than 40 gene families, called CRISPR-associated sequences (CASs), appear in conjunction with these repeats and are thought to be involved in the propagation and functioning of CRISPRs. It has been recently shown that CRISPR provides acquired resistance against viruses in prokaryotes. Results: Here we analyze CRISPR repeats identified in 195 microbial genomes and show that they can be organized into multiple clusters based on sequence similarity. Some of the clusters present stable, highly conserved RNA secondary structures, while others lack detectable structures. Stable secondary structures exhibit multiple compensatory base changes in the stem region, indicating evolutionary and functional conservation. Conclusion: We show that the repeat-based classification corresponds to, and expands upon, a previously reported CAS gene-based classification, including specific relationships between CRISPR and CAS subtypes.
- Published
- 2007
- Full Text
- View/download PDF
45. Corrigendum
- Author
-
Nikos C. Kyrpides, Frank Korzeniewski, Konstantinos Mavrommatis, Athanasios Lykidis, Victor Kunin, Victor Markowitz, Inna Dubchak, Natalia Ivanova, Krishna Palaniappan, Hector Garcia Martin, Phil Hugenholtz, Iain Anderson, and Ernest Szeto
- Subjects
Statistics and Probability ,Computational Mathematics ,Computational Theory and Mathematics ,business.industry ,Philosophy ,Data management ,business ,Molecular Biology ,Biochemistry ,Spelling ,Classics ,Computer Science Applications - Abstract
An experimental metagenome data management and analysis system. Victor M. Markowitz, Natalia Ivanova, Krishna Palaniappan, Ernest Szeto, Frank Korzeniewski, Athanasios Lykidis, Iain Anderson, Konstantinos Mavrommatis, Victor Kunin, Hector Garcia Martin, Inna Dubchak, Phil Hugenholtz and Nikos C. Kyrpides. Bioinformatics (2006) 22(14), e359–e367. The authors would like to apologize for an error in the spelling of one of the authors’ names. The correct spelling is Konstantinos Mavromatis.
- Published
- 2006
- Full Text
- View/download PDF
46. [Untitled]
- Author
-
Mark Gerstein, Paul M. Harrison, Yang Liu, and Victor Kunin
- Subjects
Genetics ,0303 health sciences ,Protein family ,030306 microbiology ,Pseudogene ,Genomics ,Biology ,Genome ,03 medical and health sciences ,Codon usage bias ,Gene duplication ,Horizontal gene transfer ,Gene ,030304 developmental biology - Abstract
Background: Pseudogenes often manifest themselves as disabled copies of known genes. In prokaryotes, it was generally believed (with a few well-known exceptions) that they were rare. Results: We have carried out a comprehensive analysis of the occurrence of pseudogenes in a diverse selection of 64 prokaryote genomes. Overall, we find a total of around 7,000 candidate pseudogenes. Moreover, in all the genomes surveyed, pseudogenes occur in at least 1 to 5% of all gene-like sequences, with some genomes having considerably higher occurrence. Although many large populations of pseudogenes arise from large, diverse protein families (for example, the ABC transporters), notable numbers of pseudogenes are associated with specific families that do not occur that widely. These include the cytochrome P450 and PPE families (PF00067 and PF00823) and others that have a direct role in DNA transposition. Conclusions: We find suggestive evidence that a large fraction of prokaryote pseudogenes arose from failed horizontal transfer events. In particular, we find that pseudogenes are more than twice as likely as genes to have anomalous codon usage associated with horizontal transfer. Moreover, we found a significant difference in the number of horizontally transferred pseudogenes in pathogenic and non-pathogenic strains of Escherichia coli.
- Published
- 2004
- Full Text
- View/download PDF
47. [Untitled]
- Author
-
Sophia Tsoka, Leon Goldovsky, Victor Kunin, Nikos Darzentas, Nuria Lopez-Bigas, José B. Pereira-Leal, Ildefonso Cases, José M. Peregrín-Alvarez, Paul Janssen, Christos A. Ouzounis, and Benjamin Audit
- Subjects
Comparative genomics ,Whole genome sequencing ,Evolutionary biology ,Genomics ,Genome project ,Biology ,Genome - Abstract
By the end of 2002, we witnessed the landmark submission of the 100th complete genome sequence in the databases. An overview of these genomes reveals certain interesting trends and provides valuable insights into possible future developments.
- Published
- 2003
- Full Text
- View/download PDF
48. [Untitled]
- Author
-
Victor Kunin, Víctor de Lorenzo, Anton J. Enright, Christos A. Ouzounis, and Ildefonso Cases
- Subjects
Comparative genomics ,Genetics ,Protein family ,Evolutionary biology ,Computational genomics ,Genomics ,Sequence space (evolution) ,Biology ,Genome ,DNA sequencing ,Personal genomics - Abstract
From the historical record of genome sequencing, we show that the rate of discovery of new families has remained constant over time, indicating that our knowledge of sequence space is far from complete.
- Published
- 2003
- Full Text
- View/download PDF
49. Analysis of mouse TAP2 gene transcription
- Author
-
Evgeny Arons, L. Marash, Rachel Ehrlich, and Victor Kunin
- Subjects
Sp3 transcription factor ,General transcription factor ,TAF2 ,Response element ,Promoter ,E-box ,TCF4 ,Biology ,Enhancer ,Biochemistry ,Cell biology
- Published
- 2000
- Full Text
- View/download PDF