25 results on '"bit parallelism"'
Search Results
2. Bit-Parallelism Score Computation with Multi Integer Weight
- Author
-
Setyorini Setyorini, Dwi H. Widyantoro, Adi Pancoro, and Kuspriyanto Kuspriyanto
- Subjects
Computer science ,Computation ,General Engineering ,Bit parallelism ,Arithmetic ,Integer (computer science) - Published
- 2019
- Full Text
- View/download PDF
3. Efficiently enumerating all maximal cliques with bit-parallelism
- Author
-
Pablo San Segundo, Jorge Artieda, and Darren Strash
- Subjects
FOS: Computer and information sciences ,F.2.2 ,G.2.2 ,Current (mathematics) ,General Computer Science ,Selection (relational algebra) ,0102 computer and information sciences ,02 engineering and technology ,Management Science and Operations Research ,Clique (graph theory) ,01 natural sciences ,Clique problem ,010201 computation theory & mathematics ,Simple (abstract algebra) ,Modeling and Simulation ,Computer Science - Data Structures and Algorithms ,0202 electrical engineering, electronic engineering, information engineering ,Enumeration ,Benchmark (computing) ,Data Structures and Algorithms (cs.DS) ,020201 artificial intelligence & image processing ,Bit parallelism ,Sociology ,Algorithm - Abstract
The maximal clique enumeration (MCE) problem has numerous applications in biology, chemistry, sociology, and graph modeling. Though this problem is well studied, most current research focuses on finding solutions in large sparse graphs or very dense graphs, while sacrificing efficiency on the most difficult medium-density benchmark instances that are representative of data sets often encountered in practice. We show that techniques that have been successfully applied to the maximum clique problem give significant speed gains over the state-of-the-art MCE algorithms on these instances. Specifically, we show that a simple greedy pivot selection based on a fixed maximum-degree first ordering of vertices, when combined with bit-parallelism, performs consistently better than the theoretical worst-case optimal pivoting of the state-of-the-art algorithms of Tomita et al. [Theoretical Computer Science, 2006] and Naud\'e [Theoretical Computer Science, 2016]. Experiments show that our algorithm is faster than the worst-case optimal algorithm of Tomita et al. on 60 out of 74 standard structured and random benchmark instances: we solve 48 instances 1.2 to 2.2 times faster, and solve the remaining 12 instances 3.6 to 47.6 times faster. We also see consistent speed improvements over the algorithm of Naud\'e: solving 61 instances 1.2 to 2.4 times faster. To the best of our knowledge, we are the first to achieve such speed-ups compared to these state-of-the-art algorithms on these standard benchmarks., Comment: 15 pages, 1 figure
- Published
- 2018
- Full Text
- View/download PDF
4. Detecting $k$-(Sub-)Cadences and Equidistant Subsequence Occurrences
- Author
-
Funakoshi, Mitsuru, Nakashima, Yuto, Inenaga, Shunsuke, Bannai, Hideo, Takeda, Masayuki, and Shinohara, Ayumi
- Subjects
FOS: Computer and information sciences ,050101 languages & linguistics ,cadences ,05 social sciences ,02 engineering and technology ,string algorithms ,pattern matching ,Computer Science - Data Structures and Algorithms ,bit parallelism ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,0501 psychology and cognitive sciences ,Data Structures and Algorithms (cs.DS) ,subsequences - Abstract
The equidistant subsequence pattern matching problem is considered. Given a pattern string P and a text string T, we say that P is an equidistant subsequence of T if P is a subsequence of the text such that consecutive symbols of P in the occurrence are equally spaced. We can consider the problem of equidistant subsequences as generalizations of (sub-)cadences. We give bit-parallel algorithms that yield o(n²) time algorithms for finding k-(sub-)cadences and equidistant subsequences. Furthermore, O(nlog² n) and O(nlog n) time algorithms, respectively for equidistant and Abelian equidistant matching for the case |P| = 3, are shown. The algorithms make use of a technique that was recently introduced which can efficiently compute convolutions with linear constraints., LIPIcs, Vol. 161, 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020), pages 12:1-12:11
- Published
- 2020
- Full Text
- View/download PDF
5. Approximate search for big data with applications in information security: A survey
- Author
-
Slobodan Petrovic
- Subjects
Big Data ,information security ,Chemistry ,business.industry ,Computer science ,Digital forensics ,Big data ,approximate search ,bit-parallelism ,Intrusion detection system ,Information security ,computer.software_genre ,lcsh:HD72-88 ,lcsh:Economic growth, development, planning ,Search algorithm ,Malware ,Bit parallelism ,Data mining ,constraints ,business ,Business management ,computer - Abstract
This paper is a survey of approximate search techniques in very large data sets (so-called Big Data). After a short introduction, some techniques for speeding up approximate search in such data sets based on exploitation of inherent bit-parallelism in computers are described. It then reviews the applications in search related to information security problems (digital forensics, malware detection, intrusion detection) are reviewed. Finally, the need for constraints in approximate search regarding the number of so-called elementary edit operations and the run lengths of particular elementary edit operations is explained and the status of on-going research on efficient implementation of approximate search algorithms with various constraints is given.
- Published
- 2015
- Full Text
- View/download PDF
6. String matching with lookahead
- Author
-
Hannu Peltola and Jorma Tarhio
- Subjects
Theoretical computer science ,LOOP (programming language) ,Generalization ,Computer science ,Character (computing) ,Applied Mathematics ,Binary data ,Discrete Mathematics and Combinatorics ,Window (computing) ,Bit parallelism ,String searching algorithm ,Variation (game tree) ,Algorithm - Abstract
Forward-SBNDM is a recently introduced variation of the BNDM algorithm for exact string matching. Forward-SBNDM inspects a 2-gram in the text such that the first character is the last one of an alignment window of the pattern and the second one is then outside the window. We present a generalization of this idea by inspecting several lookahead characters beyond an alignment window and integrate it with SBNDMq, a q-gram variation of BNDM. As a result, we get several new variations of SBNDMq. In addition, we introduce a greedy skip loop for SBNDM2. We tune up our algorithms and the reference algorithms with 2-byte read. According to our experiments, the best of the new variations are faster than the winners of recent algorithm comparisons for English, DNA, and binary data.
- Published
- 2014
- Full Text
- View/download PDF
7. BIT-PARALLEL ALGORITHMS FOR THE MERGED LONGEST COMMON SUBSEQUENCE PROBLEM
- Author
-
Sebastian Deorowicz and Agnieszka Danek
- Subjects
Longest common subsequence problem ,Theoretical computer science ,Similarity (geometry) ,Longest alternating subsequence ,Computer Science (miscellaneous) ,Hunt–McIlroy algorithm ,Parallel algorithm ,Edit distance ,Bit parallelism ,Longest increasing subsequence ,Algorithm ,Mathematics - Abstract
It is often a necessity to compare some sequences to find out how similar they are. There are many similarity measures that can be used, e.g., longest common subsequence, edit distance, sequence alignment. Recently a merged longest common subsequence (MergedLCS) problem was formulated with applications in bioinformatics. We propose the bit-parallel algorithms for the MergedLCS problem and evaluate them in practice showing that they are usually tens times faster than the already published methods.
- Published
- 2013
- Full Text
- View/download PDF
8. A Simple Streaming Bit-Parallel Algorithm for Swap Pattern Matching
- Author
-
Ondřej Suchý, Tomáš Valla, and Václav Blažej
- Subjects
Deterministic finite automaton ,Swap (finance) ,Computer science ,Efficient algorithm ,Parallel algorithm ,Bit parallelism ,Pattern matching ,Algorithm ,Bitwise operation ,Blossom algorithm - Abstract
The pattern matching problem with swaps is to find all occurrences of a pattern in a text while allowing the pattern to swap adjacent symbols. The goal is to design fast matching algorithm that takes advantage of the bit parallelism of bitwise machine instructions and has only streaming access to the input. We introduce a new approach to solve this problem based on the graph theoretic model and compare its performance to previously known algorithms. We also show that an approach using deterministic finite automata cannot achieve similarly efficient algorithms. Furthermore, we describe a fatal flaw in some of the previously published algorithms based on the same model. Finally, we provide experimental evaluation of our algorithm on real-world data.
- Published
- 2017
- Full Text
- View/download PDF
9. Improving practical exact string matching
- Author
-
Branislav Ďurian, Jorma Tarhio, Hannu Peltola, and Jan Holub
- Subjects
ta113 ,State variable ,Theoretical computer science ,Parallel algorithm ,Information processing ,Process (computing) ,String searching algorithm ,Computer Science Applications ,Theoretical Computer Science ,Experimental comparison ,ta5141 ,Signal Processing ,String matching ,Point (geometry) ,Bit parallelism ,ta513 ,ta512 ,Bit-parallelism ,Algorithm ,ta515 ,Algorithms ,Information Systems ,Mathematics - Abstract
We present improved variations of the BNDM algorithm for exact string matching. At each alignment our bit-parallel algorithms process a q-gram before testing the state variable. In addition we apply reading a 2-gram in one instruction. Our point of view is practical efficiency of algorithms. Our experiments show that the new variations are faster than earlier algorithms in many cases.
- Published
- 2010
- Full Text
- View/download PDF
10. An efficient pattern matching scheme in LZW compressed sequences
- Author
-
Nai-Lun Huang and Tsern-Huei Lee
- Subjects
Computer Networks and Communications ,Computer science ,Data_CODINGANDINFORMATIONTHEORY ,String searching algorithm ,computer.file_format ,Uncompressed video ,Space requirements ,Bitmap ,Bit parallelism ,Pattern matching ,Algorithm ,computer ,Information Systems ,Compressed pattern matching - Abstract
Summary Compressed pattern matching (CPM) is an emerging research field addressing the problem: given a compressed sequence and a pattern, process the sequence with minimal (or no) decompression to find the pattern occurrence(s) in the uncompressed sequence. It can be applied to detect malwares and confidential information leakage in compressed files directly. In this paper, we report our work of CPM in Lempel–Ziv–Welch (LZW) compressed sequences. We propose an efficient bitmap-based realization of the Amir–Benson–Farach algorithm. We also generalize the algorithm to find all pattern occurrences and report their absolute positions in the uncompressed sequence. Experiments are conducted to test the space requirements of our proposed generalization and two related CPM schemes which can also be realized with bitmaps. Results show that our proposed generalization requires the least amount of storage for moderate and long patterns. We also conduct experiments to compare the throughput performance of our proposed generalization with these two related CPM schemes and the decompress-then-search scheme. Results show that our proposed generalization outperforms the decompress-then-search scheme significantly. When scanning a file with pattern occurrences, our proposed generalization performs slightly better than the two related CPM schemes. The difference is significant when scanning a file with no pattern occurrence. Copyright # 2008 John Wiley & Sons, Ltd.
- Published
- 2008
- Full Text
- View/download PDF
11. Word Size Removal (WSR) method for Bit Parallel String Matching
- Author
-
Kapil Kumar Soni and Vivek Kumar Sharma
- Subjects
Bit (horse) ,Pattern size ,Computer science ,Bit parallelism ,String searching algorithm ,Algorithm ,Word (computer architecture) - Abstract
Bit Parallel String Matching algorithms are the most efficient and latest class of String Matching Algorithms. These algorithms use the intrinsic property of computer known as Bit Parallelism for performing bit operations in parallel with-in single computer word. BNDM, TNDM, BNDMq are the popular single pattern bit parallel string matching algorithms whereas Shift-OR and Shift-OR with Q-grams algorithms are popular in multiple pattern string matching. All these algorithms are restricted by the limitation of pattern size. These all algorithms are not working on patterns whose sizes are longer than computer word. In this paper, we proposed the generic method named WSR (Word Size Removal) for bit parallel string matching. With the inclusion of WSR these bit parallel string matching algorithms can work on longer than computer word size patterns. This paper presented WSR method and all modified algorithms.
- Published
- 2015
- Full Text
- View/download PDF
12. The wide window string matching algorithm
- Author
-
Jie Sui, Longtao He, and Binxing Fang
- Subjects
Current (mathematics) ,General Computer Science ,Computer science ,Bit parallelism ,Commentz-Walter algorithm ,Window (computing) ,String searching algorithm ,Rabin–Karp algorithm ,Theoretical Computer Science ,Automaton ,Prefix ,Combinatorics ,Wide window algorithm ,Suffix automaton ,String matching ,Reverse prefix automaton ,Algorithm ,Computer Science(all) - Abstract
Generally, current string matching algorithms make use of a window whose size is equal to pattern length. In this paper, we present a novel string matching algorithm named WW (for Wide Window) algorithm, which divides the text into n/m overlapping windows of size 2m-1. In the windows, the algorithm attempts m possible occurrence positions in parallel. It firstly searches pattern suffixes from middle to right with a forward suffix automaton, shifts the window directly when it fails, otherwise, scans the corresponding prefixes backward with a reverse prefix automaton. Theoretical analysis shows that WW has optimal time complexity of O(n) in the worst, O(n/m) best and O(n(logσm)/m) for average case. Experimental comparison of WW with existing algorithms validates our theoretical claims for searching long patterns. It further reveals that WW is also efficient for searching short patterns.
- Published
- 2005
- Full Text
- View/download PDF
13. Regular expression searching on compressed text
- Author
-
Gonzalo Navarro
- Subjects
Discrete mathematics ,Ziv–Lempel ,Regular languages ,Automata ,Automaton ,Theoretical Computer Science ,Regular language ,Computational Theory and Mathematics ,Search algorithm ,Compressed pattern matching ,Discrete Mathematics and Combinatorics ,Bit parallelism ,Regular expression ,Bit-parallelism ,Mathematics - Abstract
We present a solution to the problem of regular expression searching on compressed text. The format we choose is the Ziv-Lempel family, specifically the LZ78 and LZW variants. Given a text of length u compressed into length n, and a pattern of length m, we report all the R occurrences of the pattern in the text in O(2m + mn + Rm log m) worst case time. On average this drops to O(m2 + (n + Rm) log m) or O(m2 + n + Ru/n) for most regular expressions. This is the first nontrivial result for this problem. The experimental results show that our compressed search algorithm needs half the time necessary for decompression plus searching, which is currently the only alternative.
- Published
- 2003
- Full Text
- View/download PDF
14. The implementation of bit-parallelism for DNA sequence alignment
- Author
-
Dwi H. Widyantoro, Kuspriyanto, Setyorini, and Adi Pancoro
- Subjects
History ,Computer science ,Sequence alignment ,Bit parallelism ,Parallel computing ,Computer Science Applications ,Education - Published
- 2017
- Full Text
- View/download PDF
15. Fast Longest Common Subsequence with General Integer Scoring Support on GPUs
- Author
-
Martin Swany, Adnan Ozsoy, and Arun Chauhan
- Subjects
Longest common subsequence problem ,Discrete mathematics ,CUDA ,Class (computer programming) ,Theoretical computer science ,Matching (graph theory) ,Computer science ,Benchmark (computing) ,Bit parallelism ,Longest increasing subsequence ,Parallel computing ,Massively parallel ,Integer (computer science) - Abstract
Graphic Processing Units (GPUs) have been gaining popularity among high-performance users. Certain classes of algorithms benefit greatly from the massive parallelism of GPUs. One such class of algorithms is longest common subsequence (LCS). Combined with bit parallelism, recent studies have been able to achieve terascale performance for LCS on GPUs. However, the reported results for the one-to-many matching problem lack correlation with weighted scoring algorithms. In this paper, we describe a novel technique to improve the score significance of the length of LCS algorithm for multiple matching. We extend the bit-vector algorithms for LCS to include integer scoring and parallelize them for hybrid CPU-GPU platforms. We benchmark our algorithm against the well-known sequence alignment algorithm on GPUs, CUDASW++, for accuracy and report performance on three different systems.
- Published
- 2014
- Full Text
- View/download PDF
16. On Pattern Matching With Swaps
- Author
-
Fouad B. Chedid
- Subjects
Dynamic programming ,Computational complexity theory ,Computer science ,Efficient algorithm ,Approximation algorithm ,Bit parallelism ,Variation (game tree) ,Pattern matching ,Disjoint sets ,Algorithm - Abstract
Pattern Matching with Swaps (PMS for short) is a variation of the classical pattern matching problem where a match is allowed to include disjoint local swaps. In 2009, Cantone and Faro devised a new dynamic programming algorithm for PMS, named Cross-Sampling, that runs in O(nm) time and uses O(m) space. More important, Cross-Sampling admits a linear-time implementation based on bit parallelism when the pattern's size is comparable to the word size of the machine. In this paper, we present improved dynamic programming formulations of the approach of Cantone and Faro for PMS which result in simpler algorithms that are much easier to be comprehended and implemented.
- Published
- 2013
- Full Text
- View/download PDF
17. Filtering Based Multiple String Matching Algorithm Combining q-Grams and BNDM
- Author
-
Guiran Chang, Xingwei Wang, and Changsheng Miao
- Subjects
Statistical classification ,Computer science ,Preprocessor ,Approximation algorithm ,Algorithm design ,Bit parallelism ,String searching algorithm ,Pattern matching ,Algorithm - Abstract
We present a new algorithm for exact multiple string matching. Our algorithm is based on filtration combining BNDM and q-grams. We have tested it with experiments and compared it with other algorithms, e.g. DFA, AC_BM and MWM. The preprocessing phase of our algorithms is fast, the memory usage is fairly small, and our algorithm is considerably faster for huge sets of several thousand patterns. The benefits are due to the bit parallelism of BNDM and the improved filtering efficiency by q-grams.
- Published
- 2010
- Full Text
- View/download PDF
18. Fast text searching
- Author
-
Udi Manber and Sun Wu
- Subjects
Data processing ,General Computer Science ,Bitap algorithm ,Search algorithm ,Computer science ,Bit parallelism ,Pattern matching ,Approximate string matching ,Algorithm ,Text searching ,Compressed pattern matching - Published
- 1992
- Full Text
- View/download PDF
19. BPBM: An Algorithm for String Matching with Wildcards and Length Constraints
- Author
-
Yingling Liu, Jun Gao, Xuegang Hu, Gongqing Wu, Xiao-Li Hong, and Xindong Wu
- Subjects
Sequence ,Matching (graph theory) ,business.industry ,Commentz-Walter algorithm ,Scale (descriptive set theory) ,Pattern recognition ,Wildcard character ,computer.file_format ,String searching algorithm ,Bit parallelism ,Pattern matching ,Artificial intelligence ,business ,Algorithm ,computer ,Mathematics - Abstract
Pattern matching with wildcards and length constraints under the one-off condition is a challenging topic. We propose an algorithm BPBM, based on bit parallelism and the Boyer-Moore algorithm, that outputs an occurrence of a given pattern P as soon as the pattern appears in the given sequence. The experimental results show that our BPBM algorithm has an improved time performance of over 50% with the same matching results when compared with SAIL, a state-of-the-art algorithm of this matching problem. The superiority is even more remarkable when the scale of the pattern increases.
- Published
- 2009
- Full Text
- View/download PDF
20. Multipattern string matching with q -grams
- Author
-
Leena Salmela, Jorma Tarhio, and Jari Kytöjoki
- Subjects
Trie ,Commentz-Walter algorithm ,Bit parallelism ,String searching algorithm ,Intrusion detection system ,Approximate string matching ,Algorithm ,Theoretical Computer Science ,Mathematics - Abstract
We present three algorithms for exact string matching of multiple patterns. Our algorithms are filtering methods, which apply q -grams and bit parallelism. We ran extensive experiments with them and compared them with various versions of earlier algorithms, e.g., different trie implementations of the Aho--Corasick algorithm. All of our algorithms appeared to be substantially faster than earlier solutions for sets of 1,000--10,000 patterns and the good performance of two of them continues to 100,000 patterns. The gain is because of the improved filtering efficiency caused by q -grams.
- Published
- 2007
- Full Text
- View/download PDF
21. Multi-Dimensional Packet Classification Using Tuple Space Based on Bit-Parallelism
- Author
-
Wang
- Subjects
Computer science ,Multi dimensional ,Tuple space ,Bit parallelism ,Parallel computing ,Packet classification - Published
- 2006
- Full Text
- View/download PDF
22. Efficient Bit-Parallel Algorithms for (δ,α)-Matching
- Author
-
Szymon Grabowski and Kimmo Fredriksson
- Subjects
Combinatorics ,Matching (graph theory) ,Parallel algorithm ,Bit parallelism ,String searching algorithm ,Pattern matching ,Approximate string matching ,Algorithm ,Word (group theory) ,Substring ,Mathematics - Abstract
We consider the following string matching problem. Pattern p0p1p2 ... pm−−1 (δ,α)-matches the text substring $t_{i_0} t_{i_1} t_{i_2} \ldots t_{i_{m-1}}$, if $|p_j-t_{i_j}|\leq\delta$ for j ∈ {0,..., m–1}, where 0 < ij+1 – ij ≤ α + 1. The task is then to find all text positions im−−1 that (δ,α)-match the pattern. For a text of length n, the best previously known algorithms for this string matching problem run in time O(nm) and in time O(n⌈mα/w⌉), where the former is based on dynamic programming, and the latter on bit-parallelism with w bits in computer word (32 or 64 typically). We improve these to take O(nδ+⌈n/w⌉m) and O(n ⌈m log(α)/w⌉), respectively, worst case time using bit-parallelism. On average the algorithms run in O(⌈n/w⌉⌈αδ/σ⌉+n)and O(n) time. Our experimental results show that the algorithms work extremely well in practice. Our algorithms handle general gaps as well, having important applications in computational biology.
- Published
- 2006
- Full Text
- View/download PDF
23. Increased bit-parallelism for approximate and multiple string matching
- Author
-
Kimmo Fredriksson, Heikki Hyyrö, and Gonzalo Navarro
- Subjects
Combinatorics ,Boosting (machine learning) ,Speedup ,Computation ,Commentz-Walter algorithm ,Bit parallelism ,Edit distance ,String searching algorithm ,Approximate string matching ,Theoretical Computer Science ,Mathematics - Abstract
Bit-parallelism permits executing several operations simultaneously over a set of bits or numbers stored in a single computer word. This technique permits searching for the approximate occurrences of a pattern of length m in a text of length n in time O (⌈ m / w ⌉ n ), where w is the number of bits in the computer word. Although this is asymptotically the optimal bit-parallel speedup over the basic O ( mn ) time algorithm, it wastes bit-parallelism's power in the common case where m is much smaller than w , since w − m bits in the computer words are unused. In this paper, we explore different ways to increase the bit-parallelism when the search pattern is short. First, we show how multiple patterns can be packed into a single computer word so as to search for all them simultaneously. Instead of spending O ( rn ) time to search for r patterns of length m ≤ w /2, we need O (⌈ rm / w ⌉ n ) time. Second, we show how the mechanism permits boosting the search for a single pattern of length m ≤ w /2, which can be searched for in O (⌈ n /⌊ w / m ⌋⌉) bit-parallel steps instead of O ( n ). Third, we show how to extend these algorithms so that the time bounds essentially depend on k instead of m , where k is the maximum number of differences permitted. Finally, we show how the ideas can be applied to other problems such as multiple exact string matching and one-against-all computation of edit distance and longest common subsequences. Our experimental results show that the new algorithms work well in practice, obtaining significant speedups over the best existing alternatives, especially on short patterns and moderate number of differences allowed. This work fills an important gap in the field, where little work has focused on very short patterns.
- Published
- 2005
- Full Text
- View/download PDF
24. Increased Bit-Parallelism for Approximate String Matching
- Author
-
Gonzalo Navarro, Kimmo Fredriksson, and Heikki Hyyrö
- Subjects
Set (abstract data type) ,Speedup ,Bit parallelism ,Approximate string matching ,Algorithm ,Word (computer architecture) ,Power (physics) ,Mathematics - Abstract
Bit-parallelism permits executing several operations simultaneously over a set of bits or numbers stored in a single computer word. This technique permits searching for the approximate occurrences of a pattern of length m in a text of length n in time O(⌈m/w⌉n), where w is the number of bits in the computer word. Although this is asymptotically the optimal speedup over the basic O(mn) time algorithm, it wastes bit-parallelism’s power in the common case where m is much smaller than w, since w-m bits in the computer words get unused.
- Published
- 2004
- Full Text
- View/download PDF
25. Fast and flexible string matching by combining bit-parallelism and suffix automata
- Author
-
Gonzalo Navarro and Mathieu Raffinot
- Subjects
Nondeterministic algorithm ,Theoretical computer science ,Computer science ,Suffix automaton ,Bit parallelism ,String searching algorithm ,Pattern matching ,Alphabet ,Suffix ,Algorithm ,Theoretical Computer Science ,Automaton - Abstract
The most important features of a string matching algorithm are its efficiency and its flexibility. Efficiency has traditionally received more attention, while flexibility in the search pattern is becoming a more and more important issue. Most classical string matching algorithms are aimed at quickly finding an exact pattern in a text, being Knuth-Morris-Pratt (KMP) and the Boyer-Moore (BM) family the most famous ones. A recent development uses deterministic "suffix automata" to design new optimal string matching algorithms, e.g. BDM and TurboBDM. Flexibility has been addressed quite separately by the use of "bit-parallelism", which simulates automata in their nondeterministic form by using bits and exploiting the intrinsic parallelism inside the computer word, e.g. the Shift-Or algorithm. Those algorithms are extended to handle classes of characters and errors in the pattern and/or in the text, their drawback being their inability to skip text characters. In this paper we merge bit-parallelism and suffix automata, so that a nondeterministic suffix automaton is simulated using bit-parallelism. The resulting algorithm, called BNDM, obtains the best from both worlds. It is much simpler to implement than BDM and nearly as simple as Shift-Or. It inherits from Shift-Or the ability to handle flexible patterns and from BDM the ability to skip characters. BNDM is 30%-40% faster than BDM and up to 7 times faster than Shift-Or. When compared to the fastest existing algorithms on exact patterns (which belong to the BM family), BNDM is from 20% slower to 3 times faster, depending on the alphabet size. With respect to flexible pattern searching, BNDM is by far the fastest technique to deal with classes of characters and is competitive to search allowing errors. In particular, BNDM seems very adequate for computational biology applications, since it is the fastest algorithm to search on DNA sequences and flexible searching is an important problem in that area. As a theoretical development related to flexible pattern matching, we introduce a new automaton to recognize suffixes of patterns with classes of characters. To the best of our knowledge, this automaton has not been studied before.
- Published
- 2000
- Full Text
- View/download PDF
Catalog
Discovery Service for Jio Institute Digital Library
For full access to our library's resources, please sign in.