OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups
- Source :
- Nucleic Acids Research
- Publication Year :
- 2006
- Publisher :
- Oxford University Press (OUP), 2006.
Abstract
- The OrthoMCL database (http://orthomcl.cbil.upenn.edu) houses ortholog group predictions for 55 species, including 16 bacterial and 4 archaeal genomes representing phylogenetically diverse lineages, and most currently available complete eukaryotic genomes: 24 unikonts (12 animals, 9 fungi, microsporidium, Dictyostelium, Entamoeba), 4 plants/algae and 7 apicomplexan parasites. OrthoMCL software was used to cluster proteins based on sequence similarity, using an all-against-all BLAST search of each species' proteome, followed by normalization of inter-species differences, and Markov clustering. A total of 511,797 proteins (81.6% of the total dataset) were clustered into 70,388 ortholog groups. The ortholog database may be queried based on protein or group accession numbers, keyword descriptions or BLAST similarity. Ortholog groups exhibiting specific phyletic patterns may also be identified, using either a graphical interface or a text-based Phyletic Pattern Expression grammar. Information for ortholog groups includes the phyletic profile, the list of member proteins and a multiple sequence alignment, a statistical summary and graphical view of similarities, and a graphical representation of domain architecture. OrthoMCL software, the entire FASTA dataset employed and clustering results are available for download. OrthoMCL-DB provides a centralized warehouse for orthology prediction among multiple species, and will be updated and expanded as additional genome sequence data become available.
- Subjects :
- 0106 biological sciences
Proteome
Architecture domain
Genomics
Computational biology
Biology
01 natural sciences
Genome
Article
User-Computer Interface
03 medical and health sciences
Phylogenetics
Genetics
Animals
Cluster Analysis
Databases, Protein
Phyletic gradualism
Phylogeny
030304 developmental biology
Whole genome sequencing
Internet
0303 health sciences
Multiple sequence alignment
Sequence Homology, Amino Acid
010606 plant biology & botany
Details
- ISSN :
- 1362-4962 and 0305-1048
- Volume :
- 34
- Database :
- OpenAIRE
- Journal :
- Nucleic Acids Research
- Accession number :
- edsair.doi.dedup.....13b3bef47b1f03ea6a0dccb34c6d4e72
- Full Text :
- https://doi.org/10.1093/nar/gkj123
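
The abstract names Markov clustering as the final step of the pipeline, applied to normalized BLAST similarities. The block below is a minimal, illustrative sketch of that general technique on a toy similarity matrix. It is not the OrthoMCL implementation: the function names, the inflation value, the added self-loops, and the toy scores are all assumptions made for illustration.

```python
# Toy Markov clustering (MCL) sketch. Hypothetical helper names; this is
# NOT the OrthoMCL code, only an illustration of the expansion/inflation loop.

def normalise_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain square-matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_cluster(similarity, inflation=2.0, max_iterations=50, tol=1e-9):
    """Cluster nodes of a symmetric, non-negative similarity matrix."""
    n = len(similarity)
    # Add self-loops (an assumed, commonly used convention), then normalise.
    m = normalise_columns(
        [[float(similarity[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    )
    for _ in range(max_iterations):
        expanded = matmul(m, m)  # expansion: random walks of length two
        inflated = normalise_columns(  # inflation: strengthen strong edges
            [[value ** inflation for value in row] for row in expanded]
        )
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:  # stop once the matrix no longer changes
            break
    # Every row that still carries weight lists the members of one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Made-up scores: proteins 0-2 resemble each other, as do proteins 3-4.
toy = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
print(markov_cluster(toy))
```

On this toy input the two disconnected groups come back as separate clusters. In practice the inflation parameter controls how finely groups are split, and real inputs would be the normalized similarity scores rather than 0/1 values.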