Rapid similarity search of proteins using alignments of domain arrangements
- Source :
- Bioinformatics. 30:274-281
- Publication Year :
- 2013
- Publisher :
- Oxford University Press (OUP), 2013.
Abstract
- Motivation: Homology search methods are dominated by the central paradigm that sequence similarity is a proxy for common ancestry and, by extension, functional similarity. To determine sequence similarity in proteins, the most widely used methods rely on models of sequence evolution and compare amino-acid strings in search of conserved linear stretches. Probabilistic models or sequence profiles capture the position-specific variation in an alignment of homologous sequences and can identify conserved motifs or domains. While profile-based search methods are generally more accurate than simple sequence comparison methods, they tend to be computationally more demanding. In recent years, several methods have emerged that perform protein similarity searches based on domain composition. However, few methods have considered the linear arrangement of domains when conducting similarity searches, despite strong evidence that domain order can harbour considerable functional and evolutionary signal.
- Results: Here, we introduce an alignment scheme that uses a classical dynamic programming approach to the global alignment of domains. We illustrate that representing proteins as strings of domains (domain arrangements) and comparing these strings globally allows for both fast and sensitive homology search. Further, we demonstrate that the presented methods complement existing methods by finding similar proteins missed by popular amino-acid–based comparison methods.
- Availability: An implementation of the presented algorithms, a web-based interface and a command-line program for batch searching against the UniProt database can be found at http://rads.uni-muenster.de. Furthermore, we provide a Java API for programmatic access to domain-string–based search methods.
- Contact: terrapon.nicolas@gmail.com or ebb@uni-muenster.de
- Supplementary information: Supplementary data are available at Bioinformatics online.
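The abstract describes aligning proteins globally as strings of domains with classical dynamic programming. As a rough illustration only, the sketch below applies a Needleman-Wunsch-style global alignment to lists of domain identifiers. The scoring values (`MATCH`, `MISMATCH`, `GAP`) and the function name `align_domains` are hypothetical choices made for this example; they are not the scoring scheme or API of the RADS implementation, and the Pfam accessions are used only as opaque labels.

```python
from typing import List, Tuple

# Hypothetical scoring parameters for illustration; not the values used by RADS.
MATCH = 10
MISMATCH = -5
GAP = -3


def align_domains(a: List[str], b: List[str]) -> Tuple[int, List[Tuple[str, str]]]:
    """Globally align two domain arrangements (Needleman-Wunsch style).

    Each protein is a list of domain identifiers in N- to C-terminal order.
    Returns the optimal score and one optimal alignment as (a, b) pairs,
    with '-' marking a gap.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # pair domain a[i-1] with b[j-1]
                score[i - 1][j] + GAP,      # domain a[i-1] against a gap
                score[i][j - 1] + GAP,      # domain b[j-1] against a gap
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs: List[Tuple[str, str]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            if score[i][j] == score[i - 1][j - 1] + sub:
                pairs.append((a[i - 1], b[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + GAP:
            pairs.append((a[i - 1], "-"))
            i -= 1
        else:
            pairs.append(("-", b[j - 1]))
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


if __name__ == "__main__":
    # Example arrangements: SH3 (PF00018), SH2 (PF00017), kinase (PF00069).
    query = ["PF00018", "PF00017", "PF00069"]
    target = ["PF00017", "PF00069"]
    s, aln = align_domains(query, target)
    print(s)    # 17
    print(aln)  # [('PF00018', '-'), ('PF00017', 'PF00017'), ('PF00069', 'PF00069')]
    # Same domain composition, opposite order: a lower score than the 20
    # obtained for identical arrangements under these hypothetical weights.
    print(align_domains(["PF00017", "PF00069"], ["PF00069", "PF00017"])[0])  # 4
```

The last call shows why order matters in this kind of comparison: a purely composition-based measure (for example, comparing the sets of domains) would treat the two arrangements as identical, whereas the global alignment penalises the swapped order.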
- Subjects :
- Statistics and Probability
Theoretical computer science
Conserved Domain Database
Sequence analysis
Nearest neighbor search
Saccharomyces cerevisiae
Biochemistry
Software
Sequence Analysis, Protein
Protein methods
Protein Interaction Domains and Motifs
Molecular Biology
Mathematics
Models, Statistical
GTPase-Activating Proteins
Probabilistic logic
Computational Biology
Proteins
Computer Science Applications
Dynamic programming
Computational Mathematics
ROC Curve
Computational Theory and Mathematics
Phosphotransferases (Phosphomutases)
Data mining
UniProt
Sequence Alignment
Algorithms
Details
- ISSN :
- 1367-4811 and 1367-4803
- Volume :
- 30
- Database :
- OpenAIRE
- Journal :
- Bioinformatics
- Accession number :
- edsair.doi.dedup.....84aee09c00f1bf141a60a5d1df55685c
- Full Text :
- https://doi.org/10.1093/bioinformatics/btt379