47 results on '"bit parallelism"'
Search Results
2. Linear work generation of R-MAT graphs.
- Author
-
Hübschle-Schneider, Lorenz and Sanders, Peter
- Subjects
PARALLELISM (Linguistics) ,SUPERCOMPUTERS - Published
- 2020
- Full Text
- View/download PDF
3. Regular Expression Matching
- Author
-
Ilie, Lucian and Kao, Ming-Yang, editor
- Published
- 2016
- Full Text
- View/download PDF
4. High Performance Construction of RecSplit Based Minimal Perfect Hash Functions
- Author
-
Dominik Bez and Florian Kurpicz and Hans-Peter Lehmann and Peter Sanders, Bez, Dominik, Kurpicz, Florian, Lehmann, Hans-Peter, Sanders, Peter, Dominik Bez and Florian Kurpicz and Hans-Peter Lehmann and Peter Sanders, Bez, Dominik, Kurpicz, Florian, Lehmann, Hans-Peter, and Sanders, Peter
- Abstract
A minimal perfect hash function (MPHF) bijectively maps a set S of objects to the first |S| integers. It can be used as a building block in databases and data compression. RecSplit [Esposito et al., ALENEX'20] is currently the most space efficient practical minimal perfect hash function. It heavily relies on trying out hash functions in a brute force way. We introduce rotation fitting, a new technique that makes the search more efficient by drastically reducing the number of tried hash functions. Additionally, we greatly improve the construction time of RecSplit by harnessing parallelism on the level of bits, vectors, cores, and GPUs. In combination, the resulting improvements yield speedups up to 239 on an 8-core CPU and up to 5438 using a GPU. The original single-threaded RecSplit implementation needs 1.5 hours to construct an MPHF for 5 Million objects with 1.56 bits per object. On the GPU, we achieve the same space usage in just 5 seconds. Given that the speedups are larger than the increase in energy consumption, our implementation is more energy efficient than the original implementation.
- Published
- 2023
- Full Text
- View/download PDF
5. Bit-Parallel Multiple Pattern Matching
- Author
-
Tran, Tuan Tu, Giraud, Mathieu, Varré, Jean-Stéphane, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Wyrzykowski, Roman, editor, Dongarra, Jack, editor, Karczewski, Konrad, editor, and Waśniewski, Jerzy, editor
- Published
- 2012
- Full Text
- View/download PDF
6. A Multiple Sliding Windows Approach to Speed Up String Matching Algorithms
- Author
-
Faro, Simone, Lecroq, Thierry, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, and Klasing, Ralf, editor
- Published
- 2012
- Full Text
- View/download PDF
7. Bit-(Parallelism)2: Getting to the Next Level of Parallelism
- Author
-
Cantone, Domenico, Faro, Simone, Giaquinta, Emanuele, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Boldi, Paolo, editor, and Gargano, Luisa, editor
- Published
- 2010
- Full Text
- View/download PDF
8. An Efficient and Accurate Stochastic Number Generator Using Even-Distribution Coding.
- Author
-
Zhakatayev, Aidyn, Kim, Kyounghoon, Choi, Kiyoung, and Lee, Jongeun
- Subjects
- *
SYNTHETIC natural gas , *SHIFT registers , *LOGIC circuits , *BINARY sequences - Abstract
Stochastic computing (SC) is a promising approach for low-power and low-cost applications with the added benefit of high error tolerance. However, the high overhead of generating stochastic bitstreams can offset the advantages of SC especially when a large number of bitstreams are needed. In this paper, we propose a new stochastic number generator (SNG) that significantly reduces area and energy while improving accuracy. Experimental results show that the proposed SNG can reduce energy by more than 72% compared with the state-of-the-art designs. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
9. BitmapAligner: Bit-Parallelism String Matching with MapReduce and Hadoop
- Author
-
Mary Aksa, Toqeer Mahmood, Junaid Rashid, Muhammad Wasif Nisar, Amir Hussain, and Hyuk-Yoon Kwon
- Subjects
Biomaterials ,Mechanics of Materials ,Computer science ,Modeling and Simulation ,Bit parallelism ,Parallel computing ,String searching algorithm ,Electrical and Electronic Engineering ,Computer Science Applications - Published
- 2021
- Full Text
- View/download PDF
10. Bit-Parallelism Score Computation with Multi Integer Weight
- Author
-
Setyorini Setyorini, Dwi H. Widyantoro, Adi Pancoro, and Kuspriyanto Kuspriyanto
- Subjects
Computer science ,Computation ,General Engineering ,Bit parallelism ,Arithmetic ,Integer (computer science) - Published
- 2019
- Full Text
- View/download PDF
11. Internal shortest absent word queries
- Author
-
Badkobeh, Golnaz, Charalampopoulos, Panagiotis, Pissis, Solon P., Gawrychowski, Pawel, Starikovskaya, Tatiana, Gawrychowski, Pawel, Starikovskaya, Tatiana, Bioinformatics, AIMMS, and Bio Informatics (IBIVU)
- Subjects
String algorithms ,Shortest absent word ,Internal queries ,Bit parallelism ,Theory of computation → Pattern matching - Abstract
Given a string T of length n over an alphabet Σ ⊂ {1,2,…,n^{𝒪(1)}} of size σ, we are to preprocess T so that given a range [i,j], we can return a representation of a shortest string over Σ that is absent in the fragment T[i]⋯ T[j] of T. For any positive integer k ∈ [1,log log_σ n], we present an 𝒪((n/k)⋅ log log_σ n)-size data structure, which can be constructed in 𝒪(nlog_σ n) time, and answers queries in time 𝒪(log log_σ k)., LIPIcs, Vol. 191, 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021), pages 6:1-6:18
- Published
- 2021
- Full Text
- View/download PDF
12. Efficiently enumerating all maximal cliques with bit-parallelism
- Author
-
Pablo San Segundo, Jorge Artieda, and Darren Strash
- Subjects
FOS: Computer and information sciences ,F.2.2 ,G.2.2 ,Current (mathematics) ,General Computer Science ,Selection (relational algebra) ,0102 computer and information sciences ,02 engineering and technology ,Management Science and Operations Research ,Clique (graph theory) ,01 natural sciences ,Clique problem ,010201 computation theory & mathematics ,Simple (abstract algebra) ,Modeling and Simulation ,Computer Science - Data Structures and Algorithms ,0202 electrical engineering, electronic engineering, information engineering ,Enumeration ,Benchmark (computing) ,Data Structures and Algorithms (cs.DS) ,020201 artificial intelligence & image processing ,Bit parallelism ,Sociology ,Algorithm - Abstract
The maximal clique enumeration (MCE) problem has numerous applications in biology, chemistry, sociology, and graph modeling. Though this problem is well studied, most current research focuses on finding solutions in large sparse graphs or very dense graphs, while sacrificing efficiency on the most difficult medium-density benchmark instances that are representative of data sets often encountered in practice. We show that techniques that have been successfully applied to the maximum clique problem give significant speed gains over the state-of-the-art MCE algorithms on these instances. Specifically, we show that a simple greedy pivot selection based on a fixed maximum-degree first ordering of vertices, when combined with bit-parallelism, performs consistently better than the theoretical worst-case optimal pivoting of the state-of-the-art algorithms of Tomita et al. [Theoretical Computer Science, 2006] and Naud\'e [Theoretical Computer Science, 2016]. Experiments show that our algorithm is faster than the worst-case optimal algorithm of Tomita et al. on 60 out of 74 standard structured and random benchmark instances: we solve 48 instances 1.2 to 2.2 times faster, and solve the remaining 12 instances 3.6 to 47.6 times faster. We also see consistent speed improvements over the algorithm of Naud\'e: solving 61 instances 1.2 to 2.4 times faster. To the best of our knowledge, we are the first to achieve such speed-ups compared to these state-of-the-art algorithms on these standard benchmarks., Comment: 15 pages, 1 figure
- Published
- 2018
- Full Text
- View/download PDF
13. Detecting k-(Sub-)Cadences and Equidistant Subsequence Occurrences
- Author
-
Mitsuru Funakoshi and Yuto Nakashima and Shunsuke Inenaga and Hideo Bannai and Masayuki Takeda and Ayumi Shinohara, Funakoshi, Mitsuru, Nakashima, Yuto, Inenaga, Shunsuke, Bannai, Hideo, Takeda, Masayuki, Shinohara, Ayumi, Mitsuru Funakoshi and Yuto Nakashima and Shunsuke Inenaga and Hideo Bannai and Masayuki Takeda and Ayumi Shinohara, Funakoshi, Mitsuru, Nakashima, Yuto, Inenaga, Shunsuke, Bannai, Hideo, Takeda, Masayuki, and Shinohara, Ayumi
- Abstract
The equidistant subsequence pattern matching problem is considered. Given a pattern string P and a text string T, we say that P is an equidistant subsequence of T if P is a subsequence of the text such that consecutive symbols of P in the occurrence are equally spaced. We can consider the problem of equidistant subsequences as generalizations of (sub-)cadences. We give bit-parallel algorithms that yield o(n²) time algorithms for finding k-(sub-)cadences and equidistant subsequences. Furthermore, O(nlog² n) and O(nlog n) time algorithms, respectively for equidistant and Abelian equidistant matching for the case |P| = 3, are shown. The algorithms make use of a technique that was recently introduced which can efficiently compute convolutions with linear constraints.
- Published
- 2020
- Full Text
- View/download PDF
14. On finding k-cliques in k-partite graphs.
- Author
-
Mirghorbani, M. and Krokhmal, P.
- Abstract
In this paper, a branch-and-bound algorithm for finding all cliques of size k in a k-partite graph is proposed that improves upon the method of Grunert et al. (in Comput Oper Res 29(1):13-31, ). The new algorithm uses bit-vectors, or bitsets, as the main data structure in bit-parallel operations. Bitsets enable a new form of data representation that improves branching and backtracking of the branch-and-bound procedure. Numerical studies on randomly generated instances of k-partite graphs demonstrate competitiveness of the developed method. [ABSTRACT FROM AUTHOR]
- Published
- 2013
- Full Text
- View/download PDF
15. The wide window string matching algorithm
- Author
-
He, Longtao, Fang, Binxing, and Sui, Jie
- Subjects
- *
ROBOTS , *ALGORITHMS , *GAME theory , *MATHEMATICAL models - Abstract
Abstract: Generally, current string matching algorithms make use of a window whose size is equal to pattern length. In this paper, we present a novel string matching algorithm named WW (for Wide Window) algorithm, which divides the text into overlapping windows of size . In the windows, the algorithm attempts m possible occurrence positions in parallel. It firstly searches pattern suffixes from middle to right with a forward suffix automaton, shifts the window directly when it fails, otherwise, scans the corresponding prefixes backward with a reverse prefix automaton. Theoretical analysis shows that WW has optimal time complexity of in the worst, best and for average case. Experimental comparison of WW with existing algorithms validates our theoretical claims for searching long patterns. It further reveals that WW is also efficient for searching short patterns. [Copyright &y& Elsevier]
- Published
- 2005
- Full Text
- View/download PDF
16. Pattern Matching.
- Author
-
Navarro, Gonzalo
- Subjects
- *
AUTOMATION , *INDUSTRIAL engineering , *PATTERNMAKING , *MODELS & modelmaking , *APPROXIMATION theory , *FUNCTIONAL analysis - Abstract
An important subtask of the pattern discovery process is pattern matching, where the pattern sought is already known and we want to determine how often and where it occurs in a sequence. In this paper we review the most practical techniques to find patterns of different kinds. We show how regular expressions can be searched for with general techniques, and how simpler patterns can be dealt with more simply and efficiently. We consider exact as well as approximate pattern matching. Also we cover both sequential searching, where the sequence cannot be preprocessed, and indexed searching, where we have a data structure built over the sequence to speed up the search. [ABSTRACT FROM AUTHOR]
- Published
- 2004
- Full Text
- View/download PDF
17. Detecting $k$-(Sub-)Cadences and Equidistant Subsequence Occurrences
- Author
-
Funakoshi, Mitsuru, Nakashima, Yuto, Inenaga, Shunsuke, Bannai, Hideo, Takeda, Masayuki, and Shinohara, Ayumi
- Subjects
FOS: Computer and information sciences ,050101 languages & linguistics ,cadences ,05 social sciences ,02 engineering and technology ,string algorithms ,pattern matching ,Computer Science - Data Structures and Algorithms ,bit parallelism ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,0501 psychology and cognitive sciences ,Data Structures and Algorithms (cs.DS) ,subsequences - Abstract
The equidistant subsequence pattern matching problem is considered. Given a pattern string P and a text string T, we say that P is an equidistant subsequence of T if P is a subsequence of the text such that consecutive symbols of P in the occurrence are equally spaced. We can consider the problem of equidistant subsequences as generalizations of (sub-)cadences. We give bit-parallel algorithms that yield o(n²) time algorithms for finding k-(sub-)cadences and equidistant subsequences. Furthermore, O(nlog² n) and O(nlog n) time algorithms, respectively for equidistant and Abelian equidistant matching for the case |P| = 3, are shown. The algorithms make use of a technique that was recently introduced which can efficiently compute convolutions with linear constraints., LIPIcs, Vol. 161, 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020), pages 12:1-12:11
- Published
- 2020
- Full Text
- View/download PDF
18. Linear Work Generation of R-MAT Graphs
- Author
-
Lorenz Hübschle-Schneider and Peter Sanders
- Subjects
FOS: Computer and information sciences ,sampling ,Discrete Mathematics (cs.DM) ,Sociology and Political Science ,Social Psychology ,Logarithm ,Alias ,parallel processing ,Computer science ,0102 computer and information sciences ,02 engineering and technology ,Topology ,01 natural sciences ,large graphs ,Matrix (mathematics) ,graph generator ,Computer Science - Data Structures and Algorithms ,0202 electrical engineering, electronic engineering, information engineering ,Overhead (computing) ,Data Structures and Algorithms (cs.DS) ,Social and Information Networks (cs.SI) ,Communication ,DATA processing & computer science ,Computer Science - Social and Information Networks ,Data structure ,Computer Science - Distributed, Parallel, and Cluster Computing ,010201 computation theory & mathematics ,bit parallelism ,020201 artificial intelligence & image processing ,Node (circuits) ,Distributed, Parallel, and Cluster Computing (cs.DC) ,ddc:004 ,Constant (mathematics) ,Order of magnitude ,Computer Science - Discrete Mathematics - Abstract
R-MAT (for Recursive MATrix) is a simple, widely used model for generating graphs with a power law degree distribution, a small diameter, and communitys structure. It is particularly attractive for generating very large graphs because edges can be generated independently by an arbitrary number of processors. However, current R-MAT generators need time logarithmic in the number of nodes for generating an edge— constant time for generating one bit at a time for node IDs of the connected nodes. We achieve constant time per edge by precomputing pieces of node IDs of logarithmic length. Using an alias table data structure, these pieces can then be sampled in constant time. This simple technique leads to practical improvements by an order of magnitude. This further pushes the limits of attainable graph size and makes generation overhead negligible in most situations.
- Published
- 2019
19. Approximate search for big data with applications in information security: A survey
- Author
-
Slobodan Petrovic
- Subjects
Big Data ,information security ,Chemistry ,business.industry ,Computer science ,Digital forensics ,Big data ,approximate search ,bit-parallelism ,Intrusion detection system ,Information security ,computer.software_genre ,lcsh:HD72-88 ,lcsh:Economic growth, development, planning ,Search algorithm ,Malware ,Bit parallelism ,Data mining ,constraints ,business ,Business management ,computer - Abstract
This paper is a survey of approximate search techniques in very large data sets (so-called Big Data). After a short introduction, some techniques for speeding up approximate search in such data sets based on exploitation of inherent bit-parallelism in computers are described. It then reviews the applications in search related to information security problems (digital forensics, malware detection, intrusion detection) are reviewed. Finally, the need for constraints in approximate search regarding the number of so-called elementary edit operations and the run lengths of particular elementary edit operations is explained and the status of on-going research on efficient implementation of approximate search algorithms with various constraints is given.
- Published
- 2015
- Full Text
- View/download PDF
20. String matching with lookahead
- Author
-
Hannu Peltola and Jorma Tarhio
- Subjects
Theoretical computer science ,LOOP (programming language) ,Generalization ,Computer science ,Character (computing) ,Applied Mathematics ,Binary data ,Discrete Mathematics and Combinatorics ,Window (computing) ,Bit parallelism ,String searching algorithm ,Variation (game tree) ,Algorithm - Abstract
Forward-SBNDM is a recently introduced variation of the BNDM algorithm for exact string matching. Forward-SBNDM inspects a 2-gram in the text such that the first character is the last one of an alignment window of the pattern and the second one is then outside the window. We present a generalization of this idea by inspecting several lookahead characters beyond an alignment window and integrate it with SBNDMq, a q-gram variation of BNDM. As a result, we get several new variations of SBNDMq. In addition, we introduce a greedy skip loop for SBNDM2. We tune up our algorithms and the reference algorithms with 2-byte read. According to our experiments, the best of the new variations are faster than the winners of recent algorithm comparisons for English, DNA, and binary data.
- Published
- 2014
- Full Text
- View/download PDF
21. BIT-PARALLEL ALGORITHMS FOR THE MERGED LONGEST COMMON SUBSEQUENCE PROBLEM
- Author
-
Sebastian Deorowicz and Agnieszka Danek
- Subjects
Longest common subsequence problem ,Theoretical computer science ,Similarity (geometry) ,Longest alternating subsequence ,Computer Science (miscellaneous) ,Hunt–McIlroy algorithm ,Parallel algorithm ,Edit distance ,Bit parallelism ,Longest increasing subsequence ,Algorithm ,Mathematics - Abstract
It is often a necessity to compare some sequences to find out how similar they are. There are many similarity measures that can be used, e.g., longest common subsequence, edit distance, sequence alignment. Recently a merged longest common subsequence (MergedLCS) problem was formulated with applications in bioinformatics. We propose the bit-parallel algorithms for the MergedLCS problem and evaluate them in practice showing that they are usually tens times faster than the already published methods.
- Published
- 2013
- Full Text
- View/download PDF
22. A Simple Streaming Bit-Parallel Algorithm for Swap Pattern Matching
- Author
-
Ondřej Suchý, Tomáš Valla, and Václav Blažej
- Subjects
Deterministic finite automaton ,Swap (finance) ,Computer science ,Efficient algorithm ,Parallel algorithm ,Bit parallelism ,Pattern matching ,Algorithm ,Bitwise operation ,Blossom algorithm - Abstract
The pattern matching problem with swaps is to find all occurrences of a pattern in a text while allowing the pattern to swap adjacent symbols. The goal is to design fast matching algorithm that takes advantage of the bit parallelism of bitwise machine instructions and has only streaming access to the input. We introduce a new approach to solve this problem based on the graph theoretic model and compare its performance to previously known algorithms. We also show that an approach using deterministic finite automata cannot achieve similarly efficient algorithms. Furthermore, we describe a fatal flaw in some of the previously published algorithms based on the same model. Finally, we provide experimental evaluation of our algorithm on real-world data.
- Published
- 2017
- Full Text
- View/download PDF
23. Improving practical exact string matching
- Author
-
Branislav Ďurian, Jorma Tarhio, Hannu Peltola, and Jan Holub
- Subjects
ta113 ,State variable ,Theoretical computer science ,Parallel algorithm ,Information processing ,Process (computing) ,String searching algorithm ,Computer Science Applications ,Theoretical Computer Science ,Experimental comparison ,ta5141 ,Signal Processing ,String matching ,Point (geometry) ,Bit parallelism ,ta513 ,ta512 ,Bit-parallelism ,Algorithm ,ta515 ,Algorithms ,Information Systems ,Mathematics - Abstract
We present improved variations of the BNDM algorithm for exact string matching. At each alignment our bit-parallel algorithms process a q-gram before testing the state variable. In addition we apply reading a 2-gram in one instruction. Our point of view is practical efficiency of algorithms. Our experiments show that the new variations are faster than earlier algorithms in many cases.
- Published
- 2010
- Full Text
- View/download PDF
24. An efficient pattern matching scheme in LZW compressed sequences
- Author
-
Nai-Lun Huang and Tsern-Huei Lee
- Subjects
Computer Networks and Communications ,Computer science ,Data_CODINGANDINFORMATIONTHEORY ,String searching algorithm ,computer.file_format ,Uncompressed video ,Space requirements ,Bitmap ,Bit parallelism ,Pattern matching ,Algorithm ,computer ,Information Systems ,Compressed pattern matching - Abstract
Summary Compressed pattern matching (CPM) is an emerging research field addressing the problem: given a compressed sequence and a pattern, process the sequence with minimal (or no) decompression to find the pattern occurrence(s) in the uncompressed sequence. It can be applied to detect malwares and confidential information leakage in compressed files directly. In this paper, we report our work of CPM in Lempel–Ziv–Welch (LZW) compressed sequences. We propose an efficient bitmap-based realization of the Amir–Benson–Farach algorithm. We also generalize the algorithm to find all pattern occurrences and report their absolute positions in the uncompressed sequence. Experiments are conducted to test the space requirements of our proposed generalization and two related CPM schemes which can also be realized with bitmaps. Results show that our proposed generalization requires the least amount of storage for moderate and long patterns. We also conduct experiments to compare the throughput performance of our proposed generalization with these two related CPM schemes and the decompress-then-search scheme. Results show that our proposed generalization outperforms the decompress-then-search scheme significantly. When scanning a file with pattern occurrences, our proposed generalization performs slightly better than the two related CPM schemes. The difference is significant when scanning a file with no pattern occurrence. Copyright # 2008 John Wiley & Sons, Ltd.
- Published
- 2008
- Full Text
- View/download PDF
25. Word Size Removal (WSR) method for Bit Parallel String Matching
- Author
-
Kapil Kumar Soni and Vivek Kumar Sharma
- Subjects
Bit (horse) ,Pattern size ,Computer science ,Bit parallelism ,String searching algorithm ,Algorithm ,Word (computer architecture) - Abstract
Bit Parallel String Matching algorithms are the most efficient and latest class of String Matching Algorithms. These algorithms use the intrinsic property of computer known as Bit Parallelism for performing bit operations in parallel with-in single computer word. BNDM, TNDM, BNDMq are the popular single pattern bit parallel string matching algorithms whereas Shift-OR and Shift-OR with Q-grams algorithms are popular in multiple pattern string matching. All these algorithms are restricted by the limitation of pattern size. These all algorithms are not working on patterns whose sizes are longer than computer word. In this paper, we proposed the generic method named WSR (Word Size Removal) for bit parallel string matching. With the inclusion of WSR these bit parallel string matching algorithms can work on longer than computer word size patterns. This paper presented WSR method and all modified algorithms.
- Published
- 2015
- Full Text
- View/download PDF
26. Regular expression searching on compressed text
- Author
-
Gonzalo Navarro
- Subjects
Discrete mathematics ,Ziv–Lempel ,Regular languages ,Automata ,Automaton ,Theoretical Computer Science ,Regular language ,Computational Theory and Mathematics ,Search algorithm ,Compressed pattern matching ,Discrete Mathematics and Combinatorics ,Bit parallelism ,Regular expression ,Bit-parallelism ,Mathematics - Abstract
We present a solution to the problem of regular expression searching on compressed text. The format we choose is the Ziv-Lempel family, specifically the LZ78 and LZW variants. Given a text of length u compressed into length n, and a pattern of length m, we report all the R occurrences of the pattern in the text in O(2m + mn + Rm log m) worst case time. On average this drops to O(m2 + (n + Rm) log m) or O(m2 + n + Ru/n) for most regular expressions. This is the first nontrivial result for this problem. The experimental results show that our compressed search algorithm needs half the time necessary for decompression plus searching, which is currently the only alternative.
- Published
- 2003
- Full Text
- View/download PDF
27. The implementation of bit-parallelism for DNA sequence alignment
- Author
-
Dwi H. Widyantoro, Kuspriyanto, Setyorini, and Adi Pancoro
- Subjects
History ,Computer science ,Sequence alignment ,Bit parallelism ,Parallel computing ,Computer Science Applications ,Education - Published
- 2017
- Full Text
- View/download PDF
28. Fast Longest Common Subsequence with General Integer Scoring Support on GPUs
- Author
-
Martin Swany, Adnan Ozsoy, and Arun Chauhan
- Subjects
Longest common subsequence problem ,Discrete mathematics ,CUDA ,Class (computer programming) ,Theoretical computer science ,Matching (graph theory) ,Computer science ,Benchmark (computing) ,Bit parallelism ,Longest increasing subsequence ,Parallel computing ,Massively parallel ,Integer (computer science) - Abstract
Graphic Processing Units (GPUs) have been gaining popularity among high-performance users. Certain classes of algorithms benefit greatly from the massive parallelism of GPUs. One such class of algorithms is longest common subsequence (LCS). Combined with bit parallelism, recent studies have been able to achieve terascale performance for LCS on GPUs. However, the reported results for the one-to-many matching problem lack correlation with weighted scoring algorithms. In this paper, we describe a novel technique to improve the score significance of the length of LCS algorithm for multiple matching. We extend the bit-vector algorithms for LCS to include integer scoring and parallelize them for hybrid CPU-GPU platforms. We benchmark our algorithm against the well-known sequence alignment algorithm on GPUs, CUDASW++, for accuracy and report performance on three different systems.
- Published
- 2014
- Full Text
- View/download PDF
29. On Pattern Matching With Swaps
- Author
-
Fouad B. Chedid
- Subjects
Dynamic programming ,Computational complexity theory ,Computer science ,Efficient algorithm ,Approximation algorithm ,Bit parallelism ,Variation (game tree) ,Pattern matching ,Disjoint sets ,Algorithm - Abstract
Pattern Matching with Swaps (PMS for short) is a variation of the classical pattern matching problem where a match is allowed to include disjoint local swaps. In 2009, Cantone and Faro devised a new dynamic programming algorithm for PMS, named Cross-Sampling, that runs in O(nm) time and uses O(m) space. More important, Cross-Sampling admits a linear-time implementation based on bit parallelism when the pattern's size is comparable to the word size of the machine. In this paper, we present improved dynamic programming formulations of the approach of Cantone and Faro for PMS which result in simpler algorithms that are much easier to be comprehended and implemented.
- Published
- 2013
- Full Text
- View/download PDF
30. A Multiple Sliding Windows Approach to Speed Up String Matching Algorithms
- Author
-
Thierry Lecroq and Simone Faro
- Subjects
Nondeterministic automata ,Speedup ,Theoretical computer science ,Text processing ,Efficient algorithm ,Bit parallelism ,String searching algorithm ,Algorithm ,Natural language ,Mathematics - Abstract
In this paper we present a general approach to string matching based on multiple sliding text-windows, and show how it can be applied to some among the most efficient algorithms for the problem based on nondeterministic automata and comparison of characters. From our experimental results it turns out that the new multiple sliding windows approach leads to algorithms which obtain better results than the original ones when searching texts over relatively large alphabets. Best improvements are obtained especially for short patterns.
- Published
- 2012
31. Bit-Parallel Multiple Pattern Matching
- Author
-
Tuan Tu Tran, Jean-Stéphane Varré, Mathieu Giraud, Laboratoire d'Informatique Fondamentale de Lille (LIFL), Université de Lille, Sciences et Technologies-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lille, Sciences Humaines et Sociales-Centre National de la Recherche Scientifique (CNRS), Bioinformatics and Sequence Analysis (BONSAI), Université de Lille, Sciences et Technologies-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lille, Sciences Humaines et Sociales-Centre National de la Recherche Scientifique (CNRS)-Université de Lille, Sciences et Technologies-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lille, Sciences Humaines et Sociales-Centre National de la Recherche Scientifique (CNRS)-Inria Lille - Nord Europe, and Institut National de Recherche en Informatique et en Automatique (Inria)
- Subjects
0303 health sciences ,Multi-core processor ,sequence comparison ,OpenCL ,Computer science ,GPU ,02 engineering and technology ,Parallel computing ,Extension (predicate logic) ,[SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM] ,03 medical and health sciences ,Task (computing) ,pattern matching ,neighborhood indexing ,020204 information systems ,Text matching ,bit parallelism ,0202 electrical engineering, electronic engineering, information engineering ,Pattern matching ,Multicore cpu ,[INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM] ,Heuristics ,Massively parallel ,030304 developmental biology - Abstract
International audience; Text matching with errors is a regular task in computational biology. We present an extension of the bit-parallel Wu-Manber algorithm to combine several searches for a pattern into a collection of fixed-length words. We further present an OpenCL parallelization of a redundant index on massively parallel multicore processors, within a framework of searching for similarities with seed-based heuristics. We successfully implemented and ran our algorithms on GPU and multicore CPU. Some speedups obtained are more than 60x.
- Published
- 2011
32. Optimal Packed String Matching
- Author
-
Ben-Kiki, Oren, Bille, Philip, Breslauer, Dany, Gasieniec, Leszek, Grossi, Roberto, Weimann, Oren, Chakraborty, Supratik, and Kumar, Amit
- Subjects
Space efficiency ,Real time ,000 Computer science, knowledge, general works ,010201 computation theory & mathematics ,Computer Science ,0202 electrical engineering, electronic engineering, information engineering ,Bit parallelism ,020201 artificial intelligence & image processing ,String matching ,0102 computer and information sciences ,02 engineering and technology ,01 natural sciences - Abstract
In the packed string matching problem, each machine word accomodates alpha characters, thus an n-character text occupies n/alpha memory words. We extend the Crochemore-Perrin constant-space O(n)-time string matching algorithm to run in optimal O(n/alpha) time and even in real-time, achieving a factor alpha speedup over traditional algorithms that examine each character individually. Our solution can be efficiently implemented, unlike prior theoretical packed string matching work. We adapt the standard RAM model and only use its AC0 instructions (i.e. no multiplication) plus two specialized AC0 packed string instructions. The main string-matching instruction is available in commodity processors (i.e. Intel's SSE4.2 and AVX Advanced String Operations); the other maximal-suffix instruction is only required during pattern preprocessing. In the absence of these two specialized instructions, we propose theoretically-efficient emulation using integer multiplication (not AC0) and table lookup.
- Published
- 2011
- Full Text
- View/download PDF
33. Filtering Based Multiple String Matching Algorithm Combining q-Grams and BNDM
- Author
-
Guiran Chang, Xingwei Wang, and Changsheng Miao
- Subjects
Statistical classification ,Computer science ,Preprocessor ,Approximation algorithm ,Algorithm design ,Bit parallelism ,String searching algorithm ,Pattern matching ,Algorithm - Abstract
We present a new algorithm for exact multiple string matching. Our algorithm is based on filtration combining BNDM and q-grams. We have tested it with experiments and compared it with other algorithms, e.g. DFA, AC_BM and MWM. The preprocessing phase of our algorithms is fast, the memory usage is fairly small, and our algorithm is considerably faster for huge sets of several thousand patterns. The benefits are due to the bit parallelism of BNDM and the improved filtering efficiency by q-grams.
- Published
- 2010
- Full Text
- View/download PDF
34. Bit-Parallelism$^2$: Getting to the Next Level of Parallelism
- Author
-
Domenico Cantone, Emanuele Giaquinta, and Simone Faro
- Subjects
Data parallelism ,Computer science ,string matching ,design and analysis of algorithms ,Task parallelism ,String searching algorithm ,Parallel computing ,instruction-level parallelism ,Scalable parallelism ,Memory-level parallelism ,bit parallelism ,text processing ,Implicit parallelism ,Instruction-level parallelism - Abstract
We investigate the problem of getting to a higher instruction-level parallelism in string matching algorithms. In particular, starting from an algorithm based on bit-parallelism, we propose two flexible approaches for boosting it with a higher level of parallelism. These approaches are general enough to be applied to other bit-parallel algorithms. It turns out that higher levels of parallelism lead to more efficient solutions in practical cases, as demonstrated by an extensive experimentation.
- Published
- 2010
35. Fast text searching
- Author
-
Udi Manber and Sun Wu
- Subjects
Data processing ,General Computer Science ,Bitap algorithm ,Search algorithm ,Computer science ,Bit parallelism ,Pattern matching ,Approximate string matching ,Algorithm ,Text searching ,Compressed pattern matching - Published
- 1992
- Full Text
- View/download PDF
36. BPBM: An Algorithm for String Matching with Wildcards and Length Constraints
- Author
-
Yingling Liu, Jun Gao, Xuegang Hu, Gongqing Wu, Xiao-Li Hong, and Xindong Wu
- Subjects
Sequence ,Matching (graph theory) ,business.industry ,Commentz-Walter algorithm ,Scale (descriptive set theory) ,Pattern recognition ,Wildcard character ,computer.file_format ,String searching algorithm ,Bit parallelism ,Pattern matching ,Artificial intelligence ,business ,Algorithm ,computer ,Mathematics - Abstract
Pattern matching with wildcards and length constraints under the one-off condition is a challenging topic. We propose an algorithm BPBM, based on bit parallelism and the Boyer-Moore algorithm, that outputs an occurrence of a given pattern P as soon as the pattern appears in the given sequence. The experimental results show that our BPBM algorithm has an improved time performance of over 50% with the same matching results when compared with SAIL, a state-of-the-art algorithm of this matching problem. The superiority is even more remarkable when the scale of the pattern increases.
- Published
- 2009
- Full Text
- View/download PDF
37. Multipattern string matching with q -grams
- Author
-
Leena Salmela, Jorma Tarhio, and Jari Kytöjoki
- Subjects
Trie ,Commentz-Walter algorithm ,Bit parallelism ,String searching algorithm ,Intrusion detection system ,Approximate string matching ,Algorithm ,Theoretical Computer Science ,Mathematics - Abstract
We present three algorithms for exact string matching of multiple patterns. Our algorithms are filtering methods, which apply q -grams and bit parallelism. We ran extensive experiments with them and compared them with various versions of earlier algorithms, e.g., different trie implementations of the Aho--Corasick algorithm. All of our algorithms appeared to be substantially faster than earlier solutions for sets of 1,000--10,000 patterns and the good performance of two of them continues to 100,000 patterns. The gain is because of the improved filtering efficiency caused by q -grams.
- Published
- 2007
- Full Text
- View/download PDF
38. Optimal Packed String Matching
- Author
-
Chakraborty, Supratik, Kumar, Amit, Ben-Kiki, Oren, Bille, Philip, Breslauer, Dany, Gasieniec, Leszek, Grossi, Roberto, Weimann, Oren, Chakraborty, Supratik, Kumar, Amit, Ben-Kiki, Oren, Bille, Philip, Breslauer, Dany, Gasieniec, Leszek, Grossi, Roberto, and Weimann, Oren
- Abstract
In the packed string matching problem, each machine word accommodates – characters, thus an n-character text occupies n/– memory words. We extend the Crochemore-Perrin constantspace O(n)-time string matching algorithm to run in optimal O(n/–) time and even in real-time, achieving a factor – speedup over traditional algorithms that examine each character individually. Our solution can be efficiently implemented, unlike prior theoretical packed string matching work. We adapt the standard RAM model and only use its AC0 instructions (i.e., no multiplication) plus two specialized AC0 packed string instructions. The main string-matching instruction is available in commodity processors (i.e., Intel’s SSE4.2 and AVX Advanced String Operations); the other maximal-suffix instruction is only required during pattern preprocessing. In the absence of these two specialized instructions, we propose theoretically-efficient emulation using integer multiplication (not AC0) and table lookup.
- Published
- 2011
39. Optimal Packed String Matching
- Author
-
Oren Ben-Kiki and Philip Bille and Dany Breslauer and Leszek Gasieniec and Roberto Grossi and Oren Weimann, Ben-Kiki, Oren, Bille, Philip, Breslauer, Dany, Gasieniec, Leszek, Grossi, Roberto, Weimann, Oren, Oren Ben-Kiki and Philip Bille and Dany Breslauer and Leszek Gasieniec and Roberto Grossi and Oren Weimann, Ben-Kiki, Oren, Bille, Philip, Breslauer, Dany, Gasieniec, Leszek, Grossi, Roberto, and Weimann, Oren
- Abstract
In the packed string matching problem, each machine word accomodates alpha characters, thus an n-character text occupies n/alpha memory words. We extend the Crochemore-Perrin constant-space O(n)-time string matching algorithm to run in optimal O(n/alpha) time and even in real-time, achieving a factor alpha speedup over traditional algorithms that examine each character individually. Our solution can be efficiently implemented, unlike prior theoretical packed string matching work. We adapt the standard RAM model and only use its AC0 instructions (i.e. no multiplication) plus two specialized AC0 packed string instructions. The main string-matching instruction is available in commodity processors (i.e. Intel's SSE4.2 and AVX Advanced String Operations); the other maximal-suffix instruction is only required during pattern preprocessing. In the absence of these two specialized instructions, we propose theoretically-efficient emulation using integer multiplication (not AC0) and table lookup.
- Published
- 2011
- Full Text
- View/download PDF
40. Multi-Dimensional Packet Classification Using Tuple Space Based on Bit-Parallelism
- Author
-
Wang
- Subjects
Computer science ,Multi dimensional ,Tuple space ,Bit parallelism ,Parallel computing ,Packet classification - Published
- 2006
- Full Text
- View/download PDF
41. Efficient Bit-Parallel Algorithms for (δ,α)-Matching
- Author
-
Szymon Grabowski and Kimmo Fredriksson
- Subjects
Combinatorics ,Matching (graph theory) ,Parallel algorithm ,Bit parallelism ,String searching algorithm ,Pattern matching ,Approximate string matching ,Algorithm ,Word (group theory) ,Substring ,Mathematics - Abstract
We consider the following string matching problem. Pattern p0p1p2 ... pm−−1 (δ,α)-matches the text substring $t_{i_0} t_{i_1} t_{i_2} \ldots t_{i_{m-1}}$, if $|p_j-t_{i_j}|\leq\delta$ for j ∈ {0,..., m–1}, where 0 < ij+1 – ij ≤ α + 1. The task is then to find all text positions im−−1 that (δ,α)-match the pattern. For a text of length n, the best previously known algorithms for this string matching problem run in time O(nm) and in time O(n⌈mα/w⌉), where the former is based on dynamic programming, and the latter on bit-parallelism with w bits in computer word (32 or 64 typically). We improve these to take O(nδ+⌈n/w⌉m) and O(n ⌈m log(α)/w⌉), respectively, worst case time using bit-parallelism. On average the algorithms run in O(⌈n/w⌉⌈αδ/σ⌉+n)and O(n) time. Our experimental results show that the algorithms work extremely well in practice. Our algorithms handle general gaps as well, having important applications in computational biology.
- Published
- 2006
- Full Text
- View/download PDF
42. Increased bit-parallelism for approximate and multiple string matching
- Author
-
Kimmo Fredriksson, Heikki Hyyrö, and Gonzalo Navarro
- Subjects
Combinatorics ,Boosting (machine learning) ,Speedup ,Computation ,Commentz-Walter algorithm ,Bit parallelism ,Edit distance ,String searching algorithm ,Approximate string matching ,Theoretical Computer Science ,Mathematics - Abstract
Bit-parallelism permits executing several operations simultaneously over a set of bits or numbers stored in a single computer word. This technique permits searching for the approximate occurrences of a pattern of length m in a text of length n in time O (⌈ m / w ⌉ n ), where w is the number of bits in the computer word. Although this is asymptotically the optimal bit-parallel speedup over the basic O ( mn ) time algorithm, it wastes bit-parallelism's power in the common case where m is much smaller than w , since w − m bits in the computer words are unused. In this paper, we explore different ways to increase the bit-parallelism when the search pattern is short. First, we show how multiple patterns can be packed into a single computer word so as to search for all them simultaneously. Instead of spending O ( rn ) time to search for r patterns of length m ≤ w /2, we need O (⌈ rm / w ⌉ n ) time. Second, we show how the mechanism permits boosting the search for a single pattern of length m ≤ w /2, which can be searched for in O (⌈ n /⌊ w / m ⌋⌉) bit-parallel steps instead of O ( n ). Third, we show how to extend these algorithms so that the time bounds essentially depend on k instead of m , where k is the maximum number of differences permitted. Finally, we show how the ideas can be applied to other problems such as multiple exact string matching and one-against-all computation of edit distance and longest common subsequences. Our experimental results show that the new algorithms work well in practice, obtaining significant speedups over the best existing alternatives, especially on short patterns and moderate number of differences allowed. This work fills an important gap in the field, where little work has focused on very short patterns.
- Published
- 2005
- Full Text
- View/download PDF
43. Increased Bit-Parallelism for Approximate String Matching
- Author
-
Gonzalo Navarro, Kimmo Fredriksson, and Heikki Hyyrö
- Subjects
Set (abstract data type) ,Speedup ,Bit parallelism ,Approximate string matching ,Algorithm ,Word (computer architecture) ,Power (physics) ,Mathematics - Abstract
Bit-parallelism permits executing several operations simultaneously over a set of bits or numbers stored in a single computer word. This technique permits searching for the approximate occurrences of a pattern of length m in a text of length n in time O(⌈m/w⌉n), where w is the number of bits in the computer word. Although this is asymptotically the optimal speedup over the basic O(mn) time algorithm, it wastes bit-parallelism’s power in the common case where m is much smaller than w, since w-m bits in the computer words get unused.
- Published
- 2004
- Full Text
- View/download PDF
44. Transposition invariant pattern matching for multi-track strings
- Author
-
Kjell Lemström, Jorma Tarhio, and Department of Computer Science
- Subjects
String algorithms ,combinatorial pattern matching ,music retrieval ,education ,bit parallelism ,113 Computer and information sciences - Published
- 2003
45. Bit-parallel approach to approximate string matching in compressed texts
- Author
-
Takuya Kida, Masayuki Takeda, Setsuo Arikawa, Tetsuya Matsumoto, and Ayumi Shinohara
- Subjects
Combinatorics ,Computer science ,Compression (functional analysis) ,Bit parallelism ,String searching algorithm ,Approximate string matching ,Time complexity ,Text string ,Algorithm ,Word (group theory) - Abstract
Addresses the problem of approximate string matching on compressed text. We consider this problem for a text string described in terms of a collage system, which is a formal system proposed by T. Kida et al. (1999) that captures various dictionary-based compression methods. We present an algorithm that exploits bit-parallelism, assuming that our problem fits in a single machine word, e.g. (m-k)(k+1)/spl les/L, where m is the pattern length, k is the number of allowed errors and L is the length, in bits, of the machine word. For a class of simple collage systems, the algorithm runs in O(k/sup 2/(/spl par//spl Dscr//spl par/+|/spl Sscr/|)+km) time using O(k/sup 2//spl par//spl Dscr//spl par/) space, where /spl par//spl Dscr//spl par/ is the size of dictionary /spl Dscr/ and |/spl Sscr/| is the number of tokens in the sequence /spl Sscr/. The LZ78 (Lempel-Ziv, 1978) and the LZW (Lempel-Ziv-Welch, 1984) compression methods are covered by this class. Since we can regard n=/spl par//spl Dscr//spl par/+|/spl Sscr/| as the compressed length, the time and space complexities are O(k/sup 2/n+km) and O(k/sup 2/n), respectively. For general k and m, they become O(k/sup 3/mn/L+km) and O(k/sup 3/mn/L). Thus, our algorithm is competitive to the algorithm proposed by J. Ka/spl uml/rkka/spl uml/inen, et al. (2000), which runs in O(km) time using O(kmn) space, when k=O(/spl radic/L).
- Published
- 2000
46. Fast and flexible string matching by combining bit-parallelism and suffix automata
- Author
-
Gonzalo Navarro and Mathieu Raffinot
- Subjects
Nondeterministic algorithm ,Theoretical computer science ,Computer science ,Suffix automaton ,Bit parallelism ,String searching algorithm ,Pattern matching ,Alphabet ,Suffix ,Algorithm ,Theoretical Computer Science ,Automaton - Abstract
The most important features of a string matching algorithm are its efficiency and its flexibility. Efficiency has traditionally received more attention, while flexibility in the search pattern is becoming a more and more important issue. Most classical string matching algorithms are aimed at quickly finding an exact pattern in a text, being Knuth-Morris-Pratt (KMP) and the Boyer-Moore (BM) family the most famous ones. A recent development uses deterministic "suffix automata" to design new optimal string matching algorithms, e.g. BDM and TurboBDM. Flexibility has been addressed quite separately by the use of "bit-parallelism", which simulates automata in their nondeterministic form by using bits and exploiting the intrinsic parallelism inside the computer word, e.g. the Shift-Or algorithm. Those algorithms are extended to handle classes of characters and errors in the pattern and/or in the text, their drawback being their inability to skip text characters. In this paper we merge bit-parallelism and suffix automata, so that a nondeterministic suffix automaton is simulated using bit-parallelism. The resulting algorithm, called BNDM, obtains the best from both worlds. It is much simpler to implement than BDM and nearly as simple as Shift-Or. It inherits from Shift-Or the ability to handle flexible patterns and from BDM the ability to skip characters. BNDM is 30%-40% faster than BDM and up to 7 times faster than Shift-Or. When compared to the fastest existing algorithms on exact patterns (which belong to the BM family), BNDM is from 20% slower to 3 times faster, depending on the alphabet size. With respect to flexible pattern searching, BNDM is by far the fastest technique to deal with classes of characters and is competitive to search allowing errors. In particular, BNDM seems very adequate for computational biology applications, since it is the fastest algorithm to search on DNA sequences and flexible searching is an important problem in that area. As a theoretical development related to flexible pattern matching, we introduce a new automaton to recognize suffixes of patterns with classes of characters. To the best of our knowledge, this automaton has not been studied before.
- Published
- 2000
- Full Text
- View/download PDF
47. A compact representation of nondeterministic (suffix) automata for the bit-parallel approach
- Author
-
Emanuele Giaquinta, Simone Faro, and Domenico Cantone
- Subjects
Theoretical computer science ,TheoryofComputation_COMPUTATIONBYABSTRACTDEVICES ,Computer science ,Generalized suffix tree ,Timed automaton ,Bit parallelism ,Compact representation ,Nondeterministic automata ,Shift-and ,Suffix automata ,Theoretical Computer Science ,Nondeterministic finite automaton with ε-moves ,Deterministic automaton ,Nondeterministic finite automaton ,Generalized nondeterministic finite automaton ,Mathematics ,String (computer science) ,Automaton ,Computer Science Applications ,Nondeterministic algorithm ,Deterministic finite automaton ,Computational Theory and Mathematics ,Suffix automaton ,Suffix ,Algorithm ,Computer Science::Formal Languages and Automata Theory ,Information Systems - Abstract
We present a novel technique, suitable for bit-parallelism, for representing both the nondeterministic automaton and the nondeterministic suffix automaton of a given string in a more compact way. Our approach is based on a particular factorization of strings which on the average allows to pack in a machine word of w bits automata state configurations for strings of length greater than w. We adapted the Shift-And and BNDM algorithms using our encoding and compared them with the original algorithms. Experimental results show that the new variants are generally faster for long patterns.
- Full Text
- View/download PDF
Catalog
Discovery Service for Jio Institute Digital Library
For full access to our library's resources, please sign in.