Back to Search
Start Over
Encoded Expansion: An Efficient Algorithm to Discover Identical String Motifs.
- Source :
-
PLoS ONE . May2014, Vol. 9 Issue 5, p1-9. 9p. - Publication Year :
- 2014
-
Abstract
- A major task in computational biology is the discovery of short recurring string patterns known as motifs. Most of the schemes to discover motifs are either stochastic or combinatorial in nature. Stochastic approaches do not guarantee finding the correct motifs, while the combinatorial schemes tend to have an exponential time complexity with respect to motif length. To alleviate the cost, the combinatorial approach exploits dynamic data structures such as trees or graphs. Recently (Karci (2009) Efficient automatic exact motif discovery algorithms for biological sequences, Expert Systems with Applications 36:7952–7963) devised a deterministic algorithm that finds all the identical copies of string motifs of all sizes in theoretical time complexity of and a space complexity of where is the length of the input sequence and is the length of the longest possible string motif. In this paper, we present a significant improvement on Karci's original algorithm. The algorithm that we propose reports all identical string motifs of sizes that occur at least times. Our algorithm starts with string motifs of size 2, and at each iteration it expands the candidate string motifs by one symbol throwing out those that occur less than times in the entire input sequence. We use a simple array and data encoding to achieve theoretical worst-case time complexity of and a space complexity of Encoding of the substrings can speed up the process of comparison between string motifs. Experimental results on random and real biological sequences confirm that our algorithm has indeed a linear time complexity and it is more scalable in terms of sequence length than the existing algorithms. [ABSTRACT FROM AUTHOR]
Details
- Language :
- English
- ISSN :
- 19326203
- Volume :
- 9
- Issue :
- 5
- Database :
- Academic Search Index
- Journal :
- PLoS ONE
- Publication Type :
- Academic Journal
- Accession number :
- 96282074
- Full Text :
- https://doi.org/10.1371/journal.pone.0095148