1. CoGenT++: an extensive and extensible data environment for computational genomics
- Author
Nikos Darzentas, José M. Peregrín-Alvarez, Benjamin Audit, Nuria Lopez-Bigas, Ildefonso Cases, Paul Janssen, Victor Kunin, Anton J. Enright, Mike L. Smith, Sophia Tsoka, Dag Ahrén, Christos A. Ouzounis, Leon Goldovsky
Affiliations: European Bioinformatics Institute [Hinxton] (EMBL-EBI), EMBL Heidelberg, Laboratory for Microbiology, Centre d'Etude de l'Energie Nucléaire (SCK-CEN), Institute of Agrobiotechnology, National Center for Research and Technology, Laboratoire de Physique de l'ENS Lyon (Phys-ENS), École normale supérieure - Lyon (ENS Lyon)-Centre National de la Recherche Scientifique (CNRS)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon, Laboratoire Joliot Curie, École normale supérieure - Lyon (ENS Lyon)-Centre National de la Recherche Scientifique (CNRS), Transcription Networks Group, National Center for Biotechnology, Sanger Institute, Wellcome Trust, École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Centre National de la Recherche Scientifique (CNRS), and École normale supérieure de Lyon (ENS de Lyon)-Centre National de la Recherche Scientifique (CNRS)
- Subjects
Statistics and Probability; Ancestral reconstruction; Information Storage and Retrieval; Computational biology; Biology; computational genomics; computer.software_genre; Biochemistry; Genome; 03 medical and health sciences; Consistency (database systems); User-Computer Interface; [SDV.BBM.GTP]Life Sciences [q-bio]/Biochemistry, Molecular Biology/Genomics [q-bio.GN]; Databases, Genetic; Computer Graphics; Molecular Biology; data integration; 030304 developmental biology; 0303 health sciences; 030302 biochemistry & molecular biology; Computational genomics; Chromosome Mapping; Computational Biology; Genome project; Genomics; [SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM]; Computer Science Applications; Systems Integration; Computational Mathematics; Gene nomenclature; ComputingMethodologies_PATTERNRECOGNITION; Computational Theory and Mathematics; Database Management Systems; Data mining; computer; Functional genomics; Sequence Analysis; Software; Data integration
- Abstract
Motivation: CoGenT++ is a data environment for computational research in comparative and functional genomics, designed to address issues of consistency, reproducibility, scalability and accessibility. Description: CoGenT++ facilitates the re-distribution of all fully sequenced and published genomes, storing information about species, gene names and protein sequences. We describe our scalable implementation of ProXSim, a continually updated all-against-all similarity database, which stores pairwise relationships between all genome sequences. Based on these similarities, derived databases are generated for gene fusions (AllFuse), putative orthologs (OFAM), protein families (TRIBES), phylogenetic profiles (ProfUse) and phylogenetic trees. Extensions based on the CoGenT++ environment include disease gene prediction, pattern discovery, automated domain detection, genome annotation and ancestral reconstruction. Conclusion: CoGenT++ provides a comprehensive environment for computational genomics, accessible primarily for large-scale analyses as well as manual browsing. Availability: The database and component downloads are accessible at http://cgg.ebi.ac.uk/cogentpp.html. Contact: ouzounis@ebi.ac.uk
- Published
- 2005