4 results on '"bit parallelism"'
Search Results
2. Multipattern string matching with q -grams
- Author
-
Leena Salmela, Jorma Tarhio, and Jari Kytöjoki
- Subjects
Trie ,Commentz-Walter algorithm ,Bit parallelism ,String searching algorithm ,Intrusion detection system ,Approximate string matching ,Algorithm ,Theoretical Computer Science ,Mathematics - Abstract
We present three algorithms for exact string matching of multiple patterns. Our algorithms are filtering methods, which apply q -grams and bit parallelism. We ran extensive experiments with them and compared them with various versions of earlier algorithms, e.g., different trie implementations of the Aho--Corasick algorithm. All of our algorithms appeared to be substantially faster than earlier solutions for sets of 1,000--10,000 patterns and the good performance of two of them continues to 100,000 patterns. The gain is because of the improved filtering efficiency caused by q -grams.
- Published
- 2007
- Full Text
- View/download PDF
3. Increased bit-parallelism for approximate and multiple string matching
- Author
-
Kimmo Fredriksson, Heikki Hyyrö, and Gonzalo Navarro
- Subjects
Combinatorics ,Boosting (machine learning) ,Speedup ,Computation ,Commentz-Walter algorithm ,Bit parallelism ,Edit distance ,String searching algorithm ,Approximate string matching ,Theoretical Computer Science ,Mathematics - Abstract
Bit-parallelism permits executing several operations simultaneously over a set of bits or numbers stored in a single computer word. This technique permits searching for the approximate occurrences of a pattern of length m in a text of length n in time O (⌈ m / w ⌉ n ), where w is the number of bits in the computer word. Although this is asymptotically the optimal bit-parallel speedup over the basic O ( mn ) time algorithm, it wastes bit-parallelism's power in the common case where m is much smaller than w , since w − m bits in the computer words are unused. In this paper, we explore different ways to increase the bit-parallelism when the search pattern is short. First, we show how multiple patterns can be packed into a single computer word so as to search for all them simultaneously. Instead of spending O ( rn ) time to search for r patterns of length m ≤ w /2, we need O (⌈ rm / w ⌉ n ) time. Second, we show how the mechanism permits boosting the search for a single pattern of length m ≤ w /2, which can be searched for in O (⌈ n /⌊ w / m ⌋⌉) bit-parallel steps instead of O ( n ). Third, we show how to extend these algorithms so that the time bounds essentially depend on k instead of m , where k is the maximum number of differences permitted. Finally, we show how the ideas can be applied to other problems such as multiple exact string matching and one-against-all computation of edit distance and longest common subsequences. Our experimental results show that the new algorithms work well in practice, obtaining significant speedups over the best existing alternatives, especially on short patterns and moderate number of differences allowed. This work fills an important gap in the field, where little work has focused on very short patterns.
- Published
- 2005
- Full Text
- View/download PDF
4. Fast and flexible string matching by combining bit-parallelism and suffix automata
- Author
-
Gonzalo Navarro and Mathieu Raffinot
- Subjects
Nondeterministic algorithm ,Theoretical computer science ,Computer science ,Suffix automaton ,Bit parallelism ,String searching algorithm ,Pattern matching ,Alphabet ,Suffix ,Algorithm ,Theoretical Computer Science ,Automaton - Abstract
The most important features of a string matching algorithm are its efficiency and its flexibility. Efficiency has traditionally received more attention, while flexibility in the search pattern is becoming a more and more important issue. Most classical string matching algorithms are aimed at quickly finding an exact pattern in a text, being Knuth-Morris-Pratt (KMP) and the Boyer-Moore (BM) family the most famous ones. A recent development uses deterministic "suffix automata" to design new optimal string matching algorithms, e.g. BDM and TurboBDM. Flexibility has been addressed quite separately by the use of "bit-parallelism", which simulates automata in their nondeterministic form by using bits and exploiting the intrinsic parallelism inside the computer word, e.g. the Shift-Or algorithm. Those algorithms are extended to handle classes of characters and errors in the pattern and/or in the text, their drawback being their inability to skip text characters. In this paper we merge bit-parallelism and suffix automata, so that a nondeterministic suffix automaton is simulated using bit-parallelism. The resulting algorithm, called BNDM, obtains the best from both worlds. It is much simpler to implement than BDM and nearly as simple as Shift-Or. It inherits from Shift-Or the ability to handle flexible patterns and from BDM the ability to skip characters. BNDM is 30%-40% faster than BDM and up to 7 times faster than Shift-Or. When compared to the fastest existing algorithms on exact patterns (which belong to the BM family), BNDM is from 20% slower to 3 times faster, depending on the alphabet size. With respect to flexible pattern searching, BNDM is by far the fastest technique to deal with classes of characters and is competitive to search allowing errors. In particular, BNDM seems very adequate for computational biology applications, since it is the fastest algorithm to search on DNA sequences and flexible searching is an important problem in that area. As a theoretical development related to flexible pattern matching, we introduce a new automaton to recognize suffixes of patterns with classes of characters. To the best of our knowledge, this automaton has not been studied before.
- Published
- 2000
- Full Text
- View/download PDF
Catalog
Discovery Service for Jio Institute Digital Library
For full access to our library's resources, please sign in.