50 results on '"Shortest Common Superstring"'
Search Results
2. All Instantiations of the Greedy Algorithm for the Shortest Common Superstring Problem are Equivalent
- Author
-
Nikolaev, Maksim S., Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Lecroq, Thierry, editor, and Touzet, Hélène, editor
- Published
- 2021
- Full Text
- View/download PDF
3. Greedy Shortest Common Superstring Approximation in Compact Space
- Author
-
Alanko, Jarno, Norri, Tuukka, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Fici, Gabriele, editor, Sciortino, Marinella, editor, and Venturini, Rossano, editor
- Published
- 2017
- Full Text
- View/download PDF
4. Optimization Problems in Molecular Biology
- Author
-
Jiang, Tao, Li, Ming, Pardalos, Panos, editor, Horst, Reiner, editor, Du, Ding-Zhu, editor, and Sun, Jie, editor
- Published
- 1994
- Full Text
- View/download PDF
5. A history of DNA sequence assembly.
- Author
-
Myers Jr, Eugene W.
- Subjects
NUCLEOTIDE sequencing ,SHOTGUN sequencing ,DE Bruijn graph ,HISTORY - Abstract
DNA sequence assembly is a rich combinatorial problem that arose with the first DNA sequencing projects in the early 80's. Here we give a short history of the progression of algorithmic ideas used to solve the de novo problem of inferring a genome given a large sampling of substrings covering it. This classic inverse problem is compounded by a variety of experimental features and artifacts that must be considered in any realistic solution. While current methods produce very good results, the perfect assembler has yet to be built. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
6. Restricted and Swap Common Superstring: A Multivariate Algorithmic Perspective.
- Author
-
Bonizzoni, Paola, Dondi, Riccardo, Mauri, Giancarlo, and Zoppis, Italo
- Subjects
- *
SUPERSTRING theories , *MULTIVARIATE analysis , *ALGORITHMS , *BIOINFORMATICS , *SET theory - Abstract
In several areas, for example in bioinformatics and in AI planning, the Shortest Common Superstring problem (SCS) and variants thereof have been successfully applied for string comparison. In this paper we consider two variants of SCS recently introduced, namely Restricted Common Superstring (RCS) and Swapped Common Superstring (SRCS). In RCS we are given a set $$S$$ of strings and a multiset $$\mathcal {M}$$ of symbols, and we look for an ordering $$\mathcal {M}_o$$ of $$\mathcal {M}$$ such that the number of input strings which are substrings of $$\mathcal {M}_o$$ is maximized. In SRCS we are given a set $$S$$ of strings and a text $$\mathcal {T}$$ , and we look for a swap ordering $$\mathcal {T}_o$$ of $$\mathcal {T}$$ (an ordering of $$\mathcal {T}$$ obtained by swapping only some pairs of adjacent symbols) such that the number of input strings which are substrings of $$\mathcal {T}_o$$ is maximized. In this paper we propose a multivariate algorithmic analysis of the complexity of the two problems, aiming at determining how different parameters influence the complexity of the two problems. We consider as interesting parameters the size of the solutions (that is the number of input strings contained in the computed superstring), the maximum length of the given input strings, the size of the alphabet over which the input strings range. First, we give two fixed-parameter algorithms, where the parameter is the size of the solution, for SRCS and lRCS (the RCS problem restricted to strings of length bounded by a parameter $$\ell $$ ). Furthermore, we complement these results by showing that SRCS and lRCS do not admit a polynomial kernel unless $$NP \subseteq coNP/Poly$$ . Then, we show that SRCS is APX-hard even when the input strings have length bounded by a constant (equal to $$10$$ ) or are over a binary alphabet. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
7. All Instantiations of the Greedy Algorithm for the Shortest Common Superstring Problem are Equivalent
- Author
-
Maksim S. Nikolaev
- Subjects
Set (abstract data type) ,Combinatorics ,Constant factor ,High Energy Physics::Theory ,Shortest common superstring ,Superstring theory ,Greedy algorithm ,Mathematics ,Merge (linguistics) - Abstract
In the Shortest Common Superstring problem (SCS), one needs to find the shortest superstring for a set of strings. While SCS is NP-hard and MAX-SNP-hard, the Greedy Algorithm “choose two strings with the largest overlap; merge them; repeat” achieves a constant factor approximation that is known to be at most 3.5 and conjectured to be equal to 2. The Greedy Algorithm is not deterministic, so its instantiations with different tie-breaking rules may have different approximation factors. In this paper, we show that it is not the case: all factors are equal. To prove this, we show how to transform a set of strings so that all overlaps are different whereas their ratios stay roughly the same.
- Published
- 2021
8. Local search for string problems: Brute-force is essentially optimal.
- Author
-
Guo, Jiong, Hermelin, Danny, and Komusiewicz, Christian
- Subjects
- *
SEARCH algorithms , *DATA encryption , *OPTIMAL control theory , *HAMMING codes , *PROBLEM solving , *COMPUTER algorithms - Abstract
Abstract: We address the problem of whether the brute-force procedure for the local improvement step in a local search algorithm can substantially be improved when applied to classical NP-hard string problems. We examine four of the more prominent problems in this domain: Closest String, Longest Common Subsequence, Shortest Common Supersequence, and Shortest Common Superstring. Herein, we consider arguably the most fundamental string distance measure, namely the Hamming distance, which has been applied in practical local search implementations for string problems. Our results indicate that for all four problems, the brute-force algorithm cannot be considerably improved. [Copyright &y& Elsevier]
- Published
- 2014
- Full Text
- View/download PDF
9. A probabilistic PTAS for shortest common superstring.
- Author
-
Plociennik, Kai
- Subjects
- *
SUPERSTRING theories , *APPROXIMATION algorithms , *MATHEMATICAL constants , *SCHEME programming language , *POLYNOMIAL time algorithms , *PROBABILITY theory - Abstract
Abstract: We consider approximation algorithms for the shortest common superstring problem (SCS). It is well known that there is a constant such that there is no efficient approximation algorithm for SCS achieving a factor of at most f in the worst case, unless . We study SCS on random inputs and present an approximation scheme that achieves, for every , a -approximation in expected polynomial time. We also show that the greedy algorithm achieves approximation ratio with probability exponentially close to 1. These results apply not only if the letters are chosen independently at random, but also to the more realistic mixing model, which allows dependencies among the letters of the random strings. Our results are based on a sharp tail bound on the optimal compression, which improves a previous result by Frieze and Szpankowski (1998). [Copyright &y& Elsevier]
- Published
- 2014
- Full Text
- View/download PDF
10. Collapsing Superstring Conjecture
- Author
-
Alexander Golovnev and Alexander S. Kulikov and Alexander Logunov and Ivan Mihajlin and Maksim Nikolaev, Golovnev, Alexander, Kulikov, Alexander S., Logunov, Alexander, Mihajlin, Ivan, Nikolaev, Maksim, Alexander Golovnev and Alexander S. Kulikov and Alexander Logunov and Ivan Mihajlin and Maksim Nikolaev, Golovnev, Alexander, Kulikov, Alexander S., Logunov, Alexander, Mihajlin, Ivan, and Nikolaev, Maksim
- Abstract
In the Shortest Common Superstring (SCS) problem, one is given a collection of strings, and needs to find a shortest string containing each of them as a substring. SCS admits 2 11/23-approximation in polynomial time (Mucha, SODA'13). While this algorithm and its analysis are technically involved, the 30 years old Greedy Conjecture claims that the trivial and efficient Greedy Algorithm gives a 2-approximation for SCS. We develop a graph-theoretic framework for studying approximation algorithms for SCS. The framework is reminiscent of the classical 2-approximation for Traveling Salesman: take two copies of an optimal solution, apply a trivial edge-collapsing procedure, and get an approximate solution. In this framework, we observe two surprising properties of SCS solutions, and we conjecture that they hold for all input instances. The first conjecture, that we call Collapsing Superstring conjecture, claims that there is an elementary way to transform any solution repeated twice into the same graph G. This conjecture would give an elementary 2-approximate algorithm for SCS. The second conjecture claims that not only the resulting graph G is the same for all solutions, but that G can be computed by an elementary greedy procedure called Greedy Hierarchical Algorithm. While the second conjecture clearly implies the first one, perhaps surprisingly we prove their equivalence. We support these equivalent conjectures by giving a proof for the special case where all input strings have length at most 3 (which until recently had been the only case where the Greedy Conjecture was proven). We also tested our conjectures on millions of instances of SCS. We prove that the standard Greedy Conjecture implies Greedy Hierarchical Conjecture, while the latter is sufficient for an efficient greedy 2-approximate approximation of SCS. Except for its (conjectured) good approximation ratio, the Greedy Hierarchical Algorithm provably finds a 3.5-approximation, and finds exact solutions for the
- Published
- 2019
- Full Text
- View/download PDF
11. Reoptimization of the Shortest Common Superstring Problem.
- Author
-
Bilò, Davide, Böckenhauer, Hans-Joachim, Komm, Dennis, Královič, Richard, Mömke, Tobias, Seibert, Sebastian, and Zych, Anna
- Subjects
- *
SUPERSTRING theories , *APPROXIMATION algorithms , *ITERATIVE methods (Mathematics) , *MATHEMATICAL optimization , *PERMUTATIONS - Abstract
A reoptimization problem describes the following scenario: given an instance of an optimization problem together with an optimal solution for it, we want to find a good solution for a locally modified instance. In this paper, we deal with reoptimization variants of the shortest common superstring problem (SCS) where the local modifications consist of adding or removing a single string. We show the NP-hardness of these reoptimization problems and design several approximation algorithms for them. First, we use a technique of iteratively using any SCS algorithm to design an approximation algorithm for the reoptimization variant of adding a string whose approximation ratio is arbitrarily close to 8/5 and another algorithm for deleting a string with a ratio tending to 13/7. Both algorithms significantly improve over the best currently known SCS approximation ratio of 2.5. Additionally, this iteration technique can be used to design an improved SCS approximation algorithm (without reoptimization) if the input instance contains a long string, which might be of independent interest. However, these iterative algorithms are relatively slow. Thus, we present another, faster approximation algorithm for inserting a string which is based on cutting the given optimal solution and achieves an approximation ratio of 11/6. Moreover, we give some lower bounds on the approximation ratio which can be achieved by algorithms that use such cutting strategies. [ABSTRACT FROM AUTHOR]
- Published
- 2011
- Full Text
- View/download PDF
12. Why greed works for shortest common superstring problem
- Author
-
Ma, Bin
- Subjects
- *
SUPERSTRING theories , *STRING models (Physics) , *NUCLEOTIDE sequence , *POLYNOMIALS , *APPROXIMATION theory , *SCHEMES (Algebraic geometry) , *PERTURBATION theory - Abstract
Abstract: The shortest common superstring problem (SCS) has been extensively studied for its applications in string compression and DNA sequence assembly. Although the problem is known to be Max-SNP hard, the simple greedy algorithm performs extremely well in practice. To explain the good performance, previous researchers proved that the greedy algorithm is asymptotically optimal on random instances. Unfortunately, the practical instances in DNA sequence assembly are very different from the random instances. In this paper we explain the good performance of the greedy algorithm by using the smoothed analysis. We show that, for any given instance of SCS, the average approximation ratio of the greedy algorithm on a small random perturbation of is . The perturbation defined in the paper is small and naturally represents the mutations of the DNA sequence during evolution. Due to the existence of the uncertain nucleotides in the output of a DNA sequencing machine, we also proposed the shortest common superstring with wildcards problem (SCSW). We prove that in the worst case SCSW cannot be approximated within the ratio , while the greedy algorithm still has smoothed approximation ratio. [Copyright &y& Elsevier]
- Published
- 2009
- Full Text
- View/download PDF
13. Coevolving solutions to the shortest common superstring problem
- Author
-
Zaritsky, Assaf and Sipper, Moshe
- Subjects
- *
ALGORITHMS , *COEVOLUTION , *SUPERSTRING theories , *STRING models (Physics) - Abstract
The shortest common superstring (SCS) problem, known to be NP-Complete, seeks the shortest string that contains all strings from a given set. In this paper we compare four approaches for finding solutions to the SCS problem: a standard genetic algorithm, a novel cooperative-coevolutionary algorithm, a benchmark greedy algorithm, and a parallel coevolutionary-greedy approach. We show the coevolutionary approach produces the best results, and discuss directions for future research. [Copyright &y& Elsevier]
- Published
- 2004
- Full Text
- View/download PDF
14. A history of DNA sequence assembly
- Author
-
Eugene W. Myers
- Subjects
0301 basic medicine ,General Computer Science ,Shotgun sequencing ,Computer science ,Nucleic acid sequence ,Computational biology ,De Bruijn graph ,DNA sequencing ,03 medical and health sciences ,symbols.namesake ,030104 developmental biology ,Shortest common superstring ,String graph ,symbols - Abstract
DNA sequence assembly is a rich combinatorial problem that arose with the first DNA sequencing projects in the early 80's. Here we give a short history of the progression of algorithmic ideas used to solve the de novo problem of inferring a genome given a large sampling of substrings covering it. This classic inverse problem is compounded by a variety of experimental features and artifacts that must be considered in any realistic solution. While current methods produce very good results, the perfect assembler has yet to be built.
- Published
- 2016
15. A linear-time algorithm for finding approximate shortest common superstrings.
- Author
-
Ukkonen, Esko
- Abstract
Approximate shortest common superstrings for a given set R of strings can be constructed by applying the greedy heuristics for finding a longest Hamiltonian path in the weighted graph that represents the pairwise overlaps between the strings in R. We develop an efficient implementation of this idea using a modified Aho-Corasick string-matching automaton. The resulting common superstring algorithm runs in time O( n) or in time O( n min(log m, log¦Σ¦)) depending on whether or not the goto transitions of the Aho-Corasick automaton can be implemented by direct indexing over the alphabet Σ. Here n is the total length of the strings in R and m is the number of such strings. The best previously known method requires time O( n log m) or O( n log n) depending on the availability of direct indexing. [ABSTRACT FROM AUTHOR]
- Published
- 1990
- Full Text
- View/download PDF
16. Greedy Shortest Common Superstring Approximation in Compact Space
- Author
-
Jarno Alanko, Tuukka Norri, Fici, Gabriele, Sciortino, Marinella, Venturini, Rosssano, Department of Computer Science, Genome-scale Algorithmics research group / Veli Mäkinen, and Algorithmic Bioinformatics
- Subjects
FOS: Computer and information sciences ,0301 basic medicine ,Discrete mathematics ,Burrows–Wheeler transform ,compact ,shortest common superstring ,Sigma ,Approximation algorithm ,Order (ring theory) ,16. Peace & justice ,Space (mathematics) ,113 Computer and information sciences ,String (physics) ,Combinatorics ,03 medical and health sciences ,High Energy Physics::Theory ,030104 developmental biology ,Compact space ,BWT ,Computer Science - Data Structures and Algorithms ,Data Structures and Algorithms (cs.DS) ,greedy ,Greedy algorithm ,approximation ,Mathematics - Abstract
Given a set of strings, the shortest common superstring problem is to find the shortest possible string that contains all the input strings. The problem is NP-hard, but a lot of work has gone into designing approximation algorithms for solving the problem. We present the first time and space efficient implementation of the classic greedy heuristic which merges strings in decreasing order of overlap length. Our implementation works in $O(n \log \sigma)$ time and bits of space, where $n$ is the total length of the input strings in characters, and $\sigma$ is the size of the alphabet. After index construction, a practical implementation of our algorithm uses roughly $5 n \log \sigma$ bits of space and reasonable time for a real dataset that consists of DNA fragments., Comment: 13 Pages, 3 figures, accepted to the 24th International Symposium on String Processing and Information Retrieval (SPIRE 2017)
- Published
- 2017
17. Solving SCS for bounded length strings in fewer than steps
- Author
-
Alexander Golovnev, Alexander S. Kulikov, and Ivan Mihajlin
- Subjects
Discrete mathematics ,Connected component ,Polynomial ,Travelling salesman problem ,Graph ,Computer Science Applications ,Theoretical Computer Science ,Combinatorics ,Exact algorithm ,Shortest common superstring ,Bounded function ,Signal Processing ,Constant (mathematics) ,Information Systems ,Mathematics - Abstract
It is still not known whether a shortest common superstring (SCS) of n input strings can be found faster than in O ⁎ ( 2 n ) time ( O ⁎ ( ⋅ ) suppresses polynomial factors of the input length). In this short note, we show that for any constant r, SCS for strings of length at most r can be solved in time O ⁎ ( 2 ( 1 − c ( r ) ) n ) where c ( r ) = ( 1 + 2 r 2 ) − 1 . For this, we introduce so-called hierarchical graphs that allow us to reduce SCS on strings of length at most r to the directed rural postman problem on a graph with at most k = ( 1 − c ( r ) ) n weakly connected components. One can then use a recent O ⁎ ( 2 k ) time algorithm by Gutin, Wahlstrom, and Yeo.
- Published
- 2014
18. On the Greedy Algorithm for the Shortest Common Superstring Problem with Reversals
- Author
-
Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter, Gabriele Fici, Tomasz Waleń, Fici, G., Kociumaka, T., Radoszewski, J., Rytter, W., and Waleń, T.
- Subjects
FOS: Computer and information sciences ,0102 computer and information sciences ,02 engineering and technology ,Information System ,01 natural sciences ,String (physics) ,Theoretical Computer Science ,Combinatorics ,High Energy Physics::Theory ,Analysis of algorithm ,Greedy algorithm ,Computer Science - Data Structures and Algorithms ,0202 electrical engineering, electronic engineering, information engineering ,Data Structures and Algorithms (cs.DS) ,Finite set ,Analysis of algorithms ,Mathematics ,Superstring theory ,Shortest Common Superstring ,Computer Science Applications1707 Computer Vision and Pattern Recognition ,Computer Science Applications ,Reversal ,Shortest Path Faster Algorithm ,010201 computation theory & mathematics ,Compression ratio ,Signal Processing ,020201 artificial intelligence & image processing ,K shortest path routing ,Information Systems - Abstract
We study a variation of the classical Shortest Common Superstring (SCS) problem in which a shortest superstring of a finite set of strings $S$ is sought containing as a factor every string of $S$ or its reversal. We call this problem Shortest Common Superstring with Reversals (SCS-R). This problem has been introduced by Jiang et al., who designed a greedy-like algorithm with length approximation ratio $4$. In this paper, we show that a natural adaptation of the classical greedy algorithm for SCS has (optimal) compression ratio $\frac12$, i.e., the sum of the overlaps in the output string is at least half the sum of the overlaps in an optimal solution. We also provide a linear-time implementation of our algorithm., Published in Information Processing Letters
- Published
- 2015
19. Why greed works for shortest common superstring problem
- Author
-
Bin Ma
- Subjects
General Computer Science ,Approximation algorithm ,Smoothed analysis ,020206 networking & telecommunications ,Wildcard character ,0102 computer and information sciences ,02 engineering and technology ,computer.file_format ,01 natural sciences ,Polynomial-time approximation scheme ,Theoretical Computer Science ,Combinatorics ,Asymptotically optimal algorithm ,010201 computation theory & mathematics ,0202 electrical engineering, electronic engineering, information engineering ,Polynomial time approximation scheme ,Shortest common superstring ,Greedy algorithm ,computer ,Time complexity ,Greedy randomized adaptive search procedure ,Mathematics ,Computer Science(all) - Abstract
The shortest common superstring problem (SCS) has been extensively studied for its applications in string compression and DNA sequence assembly. Although the problem is known to be Max-SNP hard, the simple greedy algorithm performs extremely well in practice. To explain the good performance, previous researchers proved that the greedy algorithm is asymptotically optimal on random instances. Unfortunately, the practical instances in DNA sequence assembly are very different from the random instances. In this paper we explain the good performance of the greedy algorithm by using the smoothed analysis. We show that, for any given instance I of SCS, the average approximation ratio of the greedy algorithm on a small random perturbation of I is 1+o(1). The perturbation defined in the paper is small and naturally represents the mutations of the DNA sequence during evolution. Due to the existence of the uncertain nucleotides in the output of a DNA sequencing machine, we also proposed the shortest common superstring with wildcards problem (SCSW). We prove that in the worst case SCSW cannot be approximated within the ratio n^1^/^7^-^@e, while the greedy algorithm still has 1+o(1) smoothed approximation ratio.
- Published
- 2009
- Full Text
- View/download PDF
20. Restricted and Swap Common Superstring: A Multivariate Algorithmic Perspective
- Author
-
Giancarlo Mauri, Riccardo Dondi, Paola Bonizzoni, Italo Zoppis, Bonizzoni, P, Dondi, R, Mauri, G, and Zoppis, I
- Subjects
Discrete mathematics ,String comparison ,Multiset ,Settore INF/01 - Informatica ,General Computer Science ,Applied Mathematics ,APX-hardne ,Superstring theory ,Parameterized complexity ,INF/01 - INFORMATICA ,Algorithms ,APX-hardness ,Shortest common superstring ,Substring ,Computer Science Applications ,Combinatorics ,Algorithm ,High Energy Physics::Theory ,Polynomial kernel ,Bounded function ,Theory of computation ,Swap (computer programming) ,Mathematics - Abstract
In several areas, for example in bioinformatics and in AI planning, the Shortest Common Superstring problem (SCS) and variants thereof have been successfully applied for string comparison. In this paper we consider two variants of SCS recently introduced, namely Restricted Common Superstring (RCS) and Swapped Common Superstring (SRCS). In RCS we are given a set $$S$$S of strings and a multiset $$\mathcal {M}$$M of symbols, and we look for an ordering $$\mathcal {M}_o$$Mo of $$\mathcal {M}$$M such that the number of input strings which are substrings of $$\mathcal {M}_o$$Mo is maximized. In SRCS we are given a set $$S$$S of strings and a text $$\mathcal {T}$$T, and we look for a swap ordering $$\mathcal {T}_o$$To of $$\mathcal {T}$$T (an ordering of $$\mathcal {T}$$T obtained by swapping only some pairs of adjacent symbols) such that the number of input strings which are substrings of $$\mathcal {T}_o$$To is maximized. In this paper we propose a multivariate algorithmic analysis of the complexity of the two problems, aiming at determining how different parameters influence the complexity of the two problems. We consider as interesting parameters the size of the solutions (that is the number of input strings contained in the computed superstring), the maximum length of the given input strings, the size of the alphabet over which the input strings range. First, we give two fixed-parameter algorithms, where the parameter is the size of the solution, for SRCS and lRCS (the RCS problem restricted to strings of length bounded by a parameter $$\ell $$l). Furthermore, we complement these results by showing that SRCS and lRCS do not admit a polynomial kernel unless $$NP \subseteq coNP/Poly$$NP⊆coNP/Poly. Then, we show that SRCS is APX-hard even when the input strings have length bounded by a constant (equal to $$10$$10) or are over a binary alphabet.
- Published
- 2015
21. Greedy Conjecture for Strings of Length 4
- Author
-
Alexander S. Kulikov, Sergey Savinov, and Evgeniy Sluzhaev
- Subjects
Combinatorics ,Discrete mathematics ,High Energy Physics::Theory ,Conjecture ,TheoryofComputation_ANALYSISOFALGORITHMSANDPROBLEMCOMPLEXITY ,Shortest common superstring ,Greedy algorithm ,Travelling salesman problem ,MathematicsofComputing_DISCRETEMATHEMATICS ,Collatz conjecture ,Mathematics - Abstract
In this short note, we prove that the greedy conjecture for the shortest common superstring problem is true for strings of length 4.
- Published
- 2015
22. On the Greedy Superstring Conjecture
- Author
-
Maik Weinard and Georg Schnitger
- Subjects
Combinatorics ,Discrete mathematics ,High Energy Physics::Theory ,Conjecture ,Cycle cover ,General Mathematics ,Shortest common superstring ,Superstring theory ,Greedy algorithm ,Merge (version control) ,Upper and lower bounds ,Greedy randomized adaptive search procedure ,Mathematics - Abstract
We investigate the greedy algorithm for the shortest common superstring problem. We show that the length of the greedy superstring is upper-bounded by the sum of the lengths of an optimal superstring and an optimal cycle cover, provided the greedy algorithm happens to merge the strings in a particular way. Thus, when restricting inputs correspondingly, we verify the well-known greedy conjecture, namely, that the approximation ratio of the greedy algorithm is within a factor of two of the optimum, and actually extend the conjecture considerably. We achieve this bound by systematically combining known conditional inequalities about overlaps and period- and string-lengths with a new familiy of string inequalities. We show that conventional systems of conditional inequalities, including the Monge inequalities, are insufficient to obtain our result.
- Published
- 2006
23. On reoptimization of the shortest common superstring problem
- Author
-
Vladimir Popov
- Subjects
Combinatorics ,Applied Mathematics ,Shortest common superstring ,COMMON SUPERSTRING ,Hamming distance ,REOPTIMIZATION ,HAMMING DISTANCE ,Mathematics - Abstract
In general, a reoptimization gives us a possibility to obtain a solution for a larger instance from a solution for a smaller instance. In this paper, we consider a possibility of usage of a reoptimization to solve the shortest common superstring problem.
- Published
- 2013
24. Restricted and Swap Common Superstring: A Multivariate Algorithmic Perspective
- Author
-
Bonizzoni, P, Dondi, R, Mauri, G, Zoppis, I, Bonizzoni, P, Dondi, R, Mauri, G, and Zoppis, I
- Abstract
In several areas, for example in bioinformatics and in AI planning, the Shortest Common Superstring problem (SCS) and variants thereof have been successfully applied for string comparison. In this paper we consider two variants of SCS recently introduced, namely Restricted Common Superstring (RCS) and Swapped Common Superstring (SRCS). In RCS we are given a set S of strings and a multiset M of symbols, and we look for an ordering Mo of M such that the number of input strings which are substrings of Mo is maximized. In SRCS we are given a set S of strings and a text T, and we look for a swap ordering To of T (an ordering of T obtained by swapping only some pairs of adjacent symbols) such that the number of input strings which are substrings of To is maximized. In this paper we propose a multivariate algorithmic analysis of the complexity of the two problems, aiming at determining how different parameters influence the complexity of the two problems. We consider as interesting parameters the size of the solutions (that is the number of input strings contained in the computed superstring), the maximum length of the given input strings, the size of the alphabet over which the input strings range. First, we give two fixed-parameter algorithms, where the parameter is the size of the solution, for SRCS and lRCS (the RCS problem restricted to strings of length bounded by a parameter l). Furthermore, we complement these results by showing that SRCS and lRCS do not admit a polynomial kernel unless NP ⊆ coNP/Poly. Then, we show that SRCS is APX-hard even when the input strings have length bounded by a constant (equal to 10) or are over a binary alphabet.
- Published
- 2015
25. TOWARDS A DNA SOLUTION TO THE SHORTEST COMMON SUPERSTRING PROBLEM
- Author
-
Greg Gloor, Lila Kari, Michelle Gaasenbeek, and Sheng Yu
- Subjects
Coding system ,Artificial Intelligence ,Computer science ,DNA computing ,law ,Shortest common superstring ,A-DNA ,Algorithm ,law.invention - Abstract
This paper proposes a DNA algorithm for solving an NP-complete problem (The Shortest Common Superstring Problem) by manipulation of biomolecules, and presents partial results of the experiment that implements our algorithm. We also discuss practical constraints that have to be taken into account when implementing the algorithm, propose a coding system as a solution to these practical restrictions, and discuss the control experiments performed for establishing the parameters controlling the specificity of the assay.
- Published
- 1999
26. Shortest common superstrings and scheduling with coordinated starting times
- Author
-
Martin Middendorf
- Subjects
Discrete mathematics ,General Computer Science ,Job shop ,Superstring theory ,Of the form ,Job-shop ,Flow shop scheduling ,Scheduling (computing) ,NP-completeness ,Theoretical Computer Science ,Combinatorics ,Flow-shop ,Shortest common superstring ,Superstrings ,Mathematics ,Computer Science(all) - Abstract
In the first part of this paper we show that the Shortest Common Superstring problem is NP-complete even if all strings are of the simple form 10p10q, p, q gE oN. This result closes the gap left between the polynomial cases where all strings are of the form 0p10q or all strings are of the form 10p1 and NP-complete cases when strings have a more complicated structure. In the second part of the paper we use the above result to investigate the complexity of 2-machine flow-shop and open-shop problems with machines that have to coordinate their starting times, i.e. when one machine starts an operation the other machine also starts an operation or has to be idle at that time.
- Published
- 1998
- Full Text
- View/download PDF
27. An Evolutionary Approach to Hard Test Case Generation for Shortest Common Superstring Problem
- Author
-
Fedor Tsarev and Maxim Buzdalov
- Subjects
Mathematical optimization ,Test case ,Theoretical computer science ,Heuristic ,media_common.quotation_subject ,Shortest common superstring ,Evolutionary algorithm ,Superstring theory ,Quality (business) ,Data compression ,media_common ,Mathematics ,Test (assessment) - Abstract
The shortest common superstring problem has important applications in computational biology (e.g. genome assembly) and data compression. This problem is NP-hard, but several heuristic algorithms proved to be efficient for this problem. For example, for the algorithm known as GREEDY it was shown that, if the optimal superstring has the length of N, it produces an answer with length not exceeding 3.5N. However, in practice, no test cases were found where the length of the answer is greater than or equal to 2N. For hard test case generation for such algorithms the traditional approach assumes creating such tests by hand. In this paper, we propose an evolutionary algorithm based framework for hard test case generation. We examine two approaches: single-objective and multi-objective. We introduce new test case quality measures and show that, according to these measures, automatically generated tests are better than any known ones.
- Published
- 2013
28. Approximating Shortest Superstring Problem Using de Bruijn Graphs
- Author
-
Alexander Golovnev, Alexander S. Kulikov, and Ivan Mihajlin
- Subjects
Discrete mathematics ,De Bruijn sequence ,Superstring theory ,Approximation algorithm ,Hamiltonian path ,Longest path problem ,Combinatorics ,High Energy Physics::Theory ,symbols.namesake ,Shortest Path Faster Algorithm ,Simple (abstract algebra) ,Shortest common superstring ,symbols ,Mathematics - Abstract
The best known approximation ratio for the shortest superstring problem is \(2\frac{11}{23}\) (Mucha, 2012). In this note, we improve this bound for the case when the length of all input strings is equal to r, for r ≤ 7. E.g., for strings of length 3 we get a \(1\frac{1}{3}\)-approximation. An advantage of the algorithm is that it is extremely simple both to implement and to analyze. Another advantage is that it is based on de Bruijn graphs. Such graphs are widely used in genome assembly (one of the most important practical applications of the shortest common superstring problem). At the same time these graphs have only a few applications in theoretical investigations of the shortest superstring problem.
- Published
- 2013
29. DNA sequencing and string learning
- Author
-
Tao Jiang and Ming Li
- Subjects
Discrete mathematics ,Sequence ,business.industry ,General Mathematics ,String (computer science) ,Superstring theory ,Machine learning ,computer.software_genre ,DNA sequencing ,Substring ,Theoretical Computer Science ,Set (abstract data type) ,Computational Theory and Mathematics ,Shortest common superstring ,Theory of computation ,Artificial intelligence ,business ,computer ,Mathematics - Abstract
In laboratories the majority of large-scale DNA sequencing is done following theshotgun strategy, which is to sequence large amount of relatively short fragments randomly and then heuristically find a shortest common superstring of the fragments [26]. We study mathematical frameworks, under plausible assumptions, suitable for massive automated DNA sequencing and for analyzing DNA sequencing algorithms. We model the DNA sequencing problem as learning a string from its randomly drawn substrings. Under certain restrictions, this may be viewed as string learning in Valiant's distribution-free learning model and in this case we give an efficient learning algorithm and a quantitative bound on how many examples suffice. One major obstacle to our approach turns out to be a quite well-known open question on how to approximate a shortest common superstring of a set of strings, raised by a number of authors in the last 10 years [9], [29], [30]. We give the firstprovably good algorithm which approximates a shortest superstring of lengthn by a superstring of lengthO(n logn). The algorithm works equally well even in the presence of negative examples, i.e., when merging of some strings is prohibited.
- Published
- 1996
30. Restricted and Swap Common Superstring: A Parameterized View
- Author
-
Paola Bonizzoni, Giancarlo Mauri, Riccardo Dondi, Italo Zoppis, Bonizzoni, P, Dondi, R, Mauri, G, and Zoppis, I
- Subjects
Discrete mathematics ,Multiset ,Settore INF/01 - Informatica ,Superstring theory ,Parameterized complexity ,Shortest Common Superstring ,INF/01 - INFORMATICA ,Substring ,Combinatorics ,Bounded function ,Shortest common superstring ,Computer Science::Data Structures and Algorithms ,Swap (computer programming) ,Mathematics ,parameterized complexity - Abstract
In several areas, in particular in bioinformatics and in AI planning, Shortest Common Superstring problem (SCS) and variants thereof have been successfully applied. In this paper we consider two variants of SCS recently introduced (Restricted Common Superstring, $\ensuremath{\text{\textsc{RCS}}}$) and (Swapped Common Superstring, $\ensuremath{\text{\textsc{SWCS}}}$). In $\ensuremath{\text{\textsc{RCS}}}$ we are given a set S of strings and a multiset, and we look for an ordering $\mathcal{M}_o$ of $\mathcal{M}$ such that the number of input strings which are substrings of $\mathcal{M}_o$ is maximized. In $\ensuremath{\text{\textsc{SWCS}}}$ we are given a set S of strings and a text $\mathcal{T}$, and we look for a swap ordering $\mathcal{T}_o$ of $\mathcal{T}$ (an ordering of $\mathcal{T}$ obtained by swapping only some pairs of adjacent characters) such that the number of input strings which are substrings of $\mathcal{T}_o$ is maximized. In this paper we investigate the parameterized complexity of the two problems. We give two fixed-parameter algorithms, where the parameter is the size of the solution, for $\ensuremath{\text{\textsc{SWCS}}}$ and $\ensuremath{\text{\textsc{$\ell$-RCS}}} $ (the $\ensuremath{\text{\textsc{RCS}}}$ problem restricted to strings of length bounded by a parameter l). Furthermore, we complement these results by showing that $\ensuremath{\text{\textsc{SWCS}}}$ and $\ensuremath{\text{\textsc{$\ell$-RCS}}} $ do not admit a polynomial kernel.
- Published
- 2012
31. Reoptimization of the Shortest Common Superstring Problem
- Author
-
Richard Královič, Dennis Komm, Sebastian Seibert, Anna Zych, Tobias Mömke, Hans-Joachim Böckenhauer, and Davide Bilò
- Subjects
Optimization problem ,General Computer Science ,shortest common superstring ,Reoptimization ,Shortest Common Superstring ,Approximation algorithms ,0102 computer and information sciences ,02 engineering and technology ,01 natural sciences ,Data processing, computer science ,Shortest common superstring ,reoptimization ,0202 electrical engineering, electronic engineering, information engineering ,approximation ,Mathematics ,Computer Sciences ,Applied Mathematics ,String (computer science) ,Approximation algorithm ,Computer Science Applications ,Datavetenskap (datalogi) ,010201 computation theory & mathematics ,Theory of computation ,020201 artificial intelligence & image processing ,ddc:004 ,Algorithm - Abstract
A reoptimization problem describes the following scenario: given an instance of an optimization problem together with an optimal solution for it, we want to find a good solution for a locally modified instance. In this paper, we deal with reoptimization variants of the shortest common superstring problem (SCS) where the local modifications consist of adding or removing a single string. We show the NP-hardness of these reoptimization problems and design several approximation algorithms for them. First, we use a technique of iteratively using any SCS algorithm to design an approximation algorithm for the reoptimization variant of adding a string whose approximation ratio is arbitrarily close to 8/5 and another algorithm for deleting a string with a ratio tending to 13/7. Both algorithms significantly improve over the best currently known SCS approximation ratio of 2.5. Additionally, this iteration technique can be used to design an improved SCS approximation algorithm (without reoptimization) if the input instance contains a long string, which might be of independent interest. However, these iterative algorithms are relatively slow. Thus, we present another, faster approximation algorithm for inserting a string which is based on cutting the given optimal solution and achieves an approximation ratio of 11/6. Moreover, we give some lower bounds on the approximation ratio which can be achieved by algorithms that use such cutting strategies., Algorithmica, 61 (2), ISSN:0178-4617, ISSN:1432-0541
- Published
- 2011
32. The shortest common superstring problem
- Author
-
Gorbenko, A., Popov, V., Gorbenko, A., and Popov, V.
- Abstract
We consider the problem of the shortest common superstring. We describe an approach to solve the problem. This approach is based on an explicit reduction from the problem to the satisfiability problem. © 2013 Anna Gorbenko and Vladimir Popov.
- Published
- 2013
33. On Shortest Common Superstring and Swap Permutations
- Author
-
Zvi Gotthilf, Moshe Lewenstein, and Alexandru Popa
- Subjects
Combinatorics ,Discrete mathematics ,Permutation ,Exact algorithm ,Shortest common superstring ,Superstring theory ,Pattern matching ,Swap (computer programming) ,Time complexity ,Substring ,Mathematics - Abstract
The Shortest Common Superstring (SCS) is a well studied problem, having a wide range of applications. In this paper we consider two problems closely related to it. First we define the Swapped Restricted Superstring(SRS) problem, where we are given a set S of n strings, s1, s2, . . . , sn, and a text T = t1t2 . . . tm, and our goal is to find a swap permutation π : {1, . . . ,m} → {1, . . . , m} to maximize the number of strings in S that are substrings of tπ(1)tπ(2) . . . tπ(m). We then show that the SRS problem is NP-Complete. Afterwards, we consider a similar variant denoted SRSR, where our goal is to find a swap permutation π : {1, . . . , m} → {1, . . . , m} to maximize the total number of times that the strings of S appear in tπ(1)tπ(2) . . . tπ(m) (we can count the same string si as a substring of tπ(1)tπ(2) . . . tπ(m) more than once). For this problem, we present a polynomial time exact algorithm.
- Published
- 2010
34. A note on shortest superstrings with flipping
- Author
-
Tao Jiang, Ming Li, and Ding-Zhu Du
- Subjects
Superstring theory ,Approximation algorithm ,Variation (game tree) ,Computer Science Applications ,Theoretical Computer Science ,Combinatorics ,Set (abstract data type) ,High Energy Physics::Theory ,Simple (abstract algebra) ,Shortest common superstring ,Signal Processing ,Greedy algorithm ,Information Systems ,Mathematics - Abstract
Approximation algorithms for the shortest common superstring problem have been studied recently. This paper considers an interesting variation of the problem: For a given set of strings S ={ s 1 ,…, s m }, find a shortest superstring that contains either s i or s R i for each i . The problem may have applications in DNA sequencing practice when orientations of the fragments in the target DNA molecule are unknown. We give a simple greedy algorithm and prove a 4 n approximation bound for it.
- Published
- 1992
35. Reoptimization of the Shortest Common Superstring Problem (Extended Abstract)
- Author
-
Sebastian Seibert, Davide Bilò, Dennis Komm, Tobias Mömke, Richard Královič, Anna Zych, and Hans-Joachim Böckenhauer
- Subjects
Algebra ,Optimization problem ,010201 computation theory & mathematics ,Shortest common superstring ,String (computer science) ,0202 electrical engineering, electronic engineering, information engineering ,Approximation algorithm ,020201 artificial intelligence & image processing ,0102 computer and information sciences ,02 engineering and technology ,01 natural sciences ,Algorithm ,Mathematics - Abstract
A reoptimization problem describes the following scenario: Given an instance of an optimization problem together with an optimal solution for it, we want to find a good solution for a locally modified instance. In this paper, we deal with reoptimization variants of the shortest common superstring problem where the local modifications consist of adding or removing a single string. We show NP-hardness of these reoptimization problems and design several approximation algorithms for them.
- Published
- 2009
36. A Probabilistic PTAS for Shortest Common Superstring
- Author
-
Kai Plociennik and Publica
- Subjects
Discrete mathematics ,General Computer Science ,Probabilistic logic ,Approximation algorithm ,Theoretical Computer Science ,Combinatorics ,Exponential growth ,Mixing (mathematics) ,Shortest common superstring ,Compression (functional analysis) ,Scheme (mathematics) ,Greedy algorithm ,Constant (mathematics) ,Time complexity ,Mathematics - Abstract
We consider approximation algorithms for the shortest common superstring problem (SCS). It is well known that there is a constant f>1f>1 such that there is no efficient approximation algorithm for SCS achieving a factor of at most f in the worst case, unless P=NPP=NP. We study SCS on random inputs and present an approximation scheme that achieves, for every e>0e>0, a (1+e)(1+e)-approximation in expected polynomial time. We also show that the greedy algorithm achieves approximation ratio 1+e1+e with probability exponentially close to 1. These results apply not only if the letters are chosen independently at random, but also to the more realistic mixing model, which allows dependencies among the letters of the random strings. Our results are based on a sharp tail bound on the optimal compression, which improves a previous result by Frieze and Szpankowski (1998).
- Published
- 2009
37. Why Greed Works for Shortest Common Superstring Problem
- Author
-
Bin Ma
- Subjects
Discrete mathematics ,Smoothed analysis ,Wildcard character ,Random perturbation ,0102 computer and information sciences ,02 engineering and technology ,computer.file_format ,01 natural sciences ,Combinatorics ,Asymptotically optimal algorithm ,010201 computation theory & mathematics ,Shortest common superstring ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Greedy algorithm ,computer ,Greedy randomized adaptive search procedure ,Mathematics - Abstract
The shortest common superstring problem (SCS) has been widely studied for its applications in string compression and DNA sequence assembly. Although it is known to be Max-SNP hard, the simple greedy algorithm works extremely well in practice. Previous researchers have proved that the greedy algorithm is asymptotically optimal on random instances. Unfortunately, the practical instances in DNA sequence assembly are very different from random instances. In this paper we explain the good performance of greedy algorithm by using the smoothed analysis. We show that, for anygiven instance Iof SCS, the average approximation ratio of the greedy algorithm on a small random perturbation of Iis 1 + o(1). The perturbation defined in the paper is small and naturally represents the mutations of the DNA sequence during evolution. Due to the existence of the uncertain nucleotides in the output of a DNA sequencing machine, we also proposed the shortest common superstring with wildcards problem (SCSW). We prove that in worst case SCSW cannot be approximated within ratio n1/7 i¾? i¾?, while the greedy algorithm still has 1 + o(1) smoothed approximation ratio.
- Published
- 2008
38. Restricted and Swap Common Superstring: A Parameterized View
- Author
-
Bonizzoni, P, Dondi, R, Mauri, G, Zoppis, I, BONIZZONI, PAOLA, MAURI, GIANCARLO, ZOPPIS, ITALO FRANCESCO, Bonizzoni, P, Dondi, R, Mauri, G, Zoppis, I, BONIZZONI, PAOLA, MAURI, GIANCARLO, and ZOPPIS, ITALO FRANCESCO
- Abstract
In several areas, in particular in bioinformatics and in AI planning, Shortest Common Superstring problem (SCS) and variants thereof have been successfully applied. In this paper we consider two variants of SCS recently introduced (Restricted Common Superstring, RCS) and (Swapped Common Superstring, SWCS). In RCS we are given a set S of strings and a multiset, and we look for an ordering Mo of M such that the number of input strings which are substrings of Mo is maximized. In SWCS we are given a set S of strings and a text T , and we look for a swap ordering To of T (an ordering of T obtained by swapping only some pairs of adjacent characters) such that the number of input strings which are substrings of To is maximized. In this paper we investigate the parameterized complexity of the two problems. We give two fixed-parameter algorithms, where the parameter is the size of the solution, for SWCS and -RCS (the RCS problem restricted to strings of length bounded by a parameter ). Furthermore, we complement these results by showing that SWCS and -RCS do not admit a polynomial kernel
- Published
- 2012
39. Viral Genome Compression
- Author
-
Lucian Ilie, Liviu Tinta, Kathleen A. Hill, and Cristian Popescu
- Subjects
Combinatorics ,Shortest common superstring ,Compression (functional analysis) ,Process (computing) ,Superstring theory ,Space (mathematics) ,Greedy algorithm ,Quantitative Biology::Genomics ,Genome ,Upper and lower bounds ,Mathematics - Abstract
Viruses compress their genome to reduce space. One of the main techniques is overlapping genes. We model this process by the shortest common superstring problem, that is, we look for the shortest genome which still contains all genes. We give an algorithm for computing optimal solutions which is slow in the number of strings but fast (linear) in their total length. This algorithm is used for a number of viruses with relatively few genes. When the number of genes is larger, we compute approximate solutions using the greedy algorithm which gives an upper bound for the optimal solution. We give also a lower bound for the shortest common superstring problem. The results obtained are then compared with what happens in nature. Remarkably, the compression obtained by viruses is quite high and also very close to the one achieved by modern computers.
- Published
- 2006
40. A Genetic Algorithm for the Shortest Common Superstring Problem
- Author
-
Heidi J. Romero, Luis C. González, and Carlos A. Brizuela
- Subjects
Dinic's algorithm ,Computer science ,Population-based incremental learning ,Crossover ,Superstring theory ,Operator (computer programming) ,Shortest Path Faster Algorithm ,Sequencing by hybridization ,Algorithmics ,Shortest common superstring ,Genetic algorithm ,Suurballe's algorithm ,Difference-map algorithm ,Algorithm ,Mathematics ,FSA-Red Algorithm ,Data compression - Abstract
This paper presents a genetic algorithm for the shortest common superstring problem. The problem appears in the computational part of a deoxyribonucleic acid (DNA) sequencing procedure as well as in data compression. The proposed genetic algorithm is based on a recently proposed algorithm for the sequencing by hybridization (SBH) problem. The algorithm exploits the crossover operator and modifies the objective function to make the algorithm both more effective and more efficient. Experimental results on real data show the effectiveness of the proposed algorithm.
- Published
- 2004
41. On the Greedy Superstring Conjecture
- Author
-
Maik Weinard and Georg Schnitger
- Subjects
Combinatorics ,High Energy Physics::Theory ,Class (set theory) ,Conjecture ,Performance ratio ,Cycle cover ,TheoryofComputation_ANALYSISOFALGORITHMSANDPROBLEMCOMPLEXITY ,Shortest common superstring ,Superstring theory ,Greedy algorithm ,Upper and lower bounds ,MathematicsofComputing_DISCRETEMATHEMATICS ,Mathematics - Abstract
We investigate the greedy algorithm for the shortest common superstring problem. For a restricted class of orders in which strings are merged, we show that the length of the greedy superstring is upper-bounded by the sum of the length of an optimal superstring and an optimal cycle cover. Thus in this restricted setting we verify the well known conjecture, that the performance ratio of the greedy algorithm is within a factor of two of the optimum and actually extend the conjecture considerably.
- Published
- 2003
42. Towards a DNA solution to the shortest common superstring problem
- Author
-
R. Gaasenbeek, Sheng Yu, Gregory B. Gloor, and Lila Kari
- Subjects
DNA computing ,law ,Computer science ,Shortest common superstring ,Control system ,Simulated annealing ,Parallel algorithm ,A-DNA ,String searching algorithm ,Algorithm ,law.invention ,Coding (social sciences) - Abstract
This paper proposes a DNA algorithm for solving an NP-complete problem (the shortest common superstring problem) by manipulation of biomolecules, and presents partial results of the experiment that implements our algorithm. We also discuss practical constraints that have to be taken into account when implementing the algorithm, propose a coding system as a solution to these practical restrictions, and discuss the control experiments performed for establishing the parameters controlling the specificity of the assay.
- Published
- 2002
43. Lower Bounds for Approximating Shortest Superstrings over an Alphabet of Size 2
- Author
-
Sascha Ott
- Subjects
Combinatorics ,Reduction (complexity) ,Arbitrarily large ,Shortest common superstring ,Algorithm complexity ,Combinatorial optimization ,Superstring theory ,Approximation algorithm ,Alphabet ,Mathematics - Abstract
The shortest common superstring problem (SCS) is known to be NP-hard and APX-hard. The APX-hardness was proved for the SCS in [BJLTY94], but the reduction used in that paper produces instances with arbitrarily large alphabets. We show that the problem is APX-hard even if the size of the alphabet is 2. A lot of results concerning approximation algorithms have been published. We use our result to establish the first explicit inapproximability results for the SCS.
- Published
- 1999
44. Reconstructing sequences from shotgun data
- Author
-
Paul Cull and Jim L. Holloway
- Subjects
Sequence ,Shotgun sequencing ,Computer science ,law ,Shortest common superstring ,String (computer science) ,Suffix array ,String searching algorithm ,Suffix ,Algorithm ,Substring ,law.invention - Abstract
One method of sequencing DNA produces a linear list of bases, {A, C, G, T}, for many short overlapping fragments of the original DNA. To find the sequence of the original piece of DNA, the many fragments must be reassembled. While this problem of reassembly is similar to the NP-complete shortest common superstring problem [GMS80], we believe that biologists are actually trying to solve a simpler problem. Biologists assume that short overlaps between substrings are insignificant. Further, they assume that there is a unique string from which substrings could have been produced. We consider a reconstruction problem with these restrictions. We devise algorithms for this problem both when the overlaps must be exact and when there may be errors in the overlaps. Our algorithms are based on Rabin-Karp string matching, and on suffix arrays. We investigate the running times of our algorithms, and show that in expected case they have running times proportional to the length of the reconstructed sequence. We give the timings of some test runs and note that the suffix array algorithms seem to be faster.
- Published
- 1993
45. ERRATUM: 'CONDITIONAL INEQUALITIES AND THE SHORTEST COMMON SUPERSTRING PROBLEM'
- Author
-
Maik Weinard and Uli Laube
- Subjects
Discrete mathematics ,Inequality ,Shortest common superstring ,media_common.quotation_subject ,Computer Science (miscellaneous) ,Mathematics ,media_common - Published
- 2006
46. On finding minimal length superstrings
- Author
-
David Maier, John K. Gallant, and James Astorer
- Subjects
Discrete mathematics ,Physics::General Physics ,Computer Networks and Communications ,Applied Mathematics ,High Energy Physics::Phenomenology ,Superstring theory ,String (physics) ,Substring ,Theoretical Computer Science ,Combinatorics ,Set (abstract data type) ,High Energy Physics::Theory ,Integer ,Computational Theory and Mathematics ,Shortest common superstring ,Time complexity ,Mathematics ,Data compression - Abstract
A superstring of a set of strings {s1,…, sn} is a string s containing each si, 1 ⩽ i ⩽ n, as a substring. The superstring problem is: Given a set S of strings and a positive integer K, does S have a superstring of length K? The superstring problem has applications to data storage; specifically, data compression. We consider the complexity of the superstring problem. NP-completeness results dealing with sets of strings over both finite and infinite alphabets are presented. Also, for a restricted version of the superstring problem, a linear time algorithm is given.
- Published
- 1980
- Full Text
- View/download PDF
47. A greedy approximation algorithm for constructing shortest common superstrings
- Author
-
Jorma Tarhio and Esko Ukkonen
- Subjects
Discrete mathematics ,General Computer Science ,Superstring theory ,Approximation algorithm ,Theoretical Computer Science ,Combinatorics ,symbols.namesake ,Shortest Path Faster Algorithm ,Greedy approximation ,Shortest common superstring ,symbols ,Heuristics ,Hamiltonian (quantum mechanics) ,Algorithm ,Mathematics ,Computer Science(all) - Abstract
An approximation algorithm for the shortest common superstring problem is developed, based on the Knuth-Morris-Pratt string-matching procedure and on the greedy heuristics for finding longest Hamiltonian paths in weighted graphs. Given a set R of strings, the algorithm constructs a common superstring for R in O(mn) steps where m is the number of strings in R and n is the total length of these strings. The performance of the algorithm is analysed in terms of the compression in the common superstrings constructed, that is, in terms of n−k where k is the length of the obtained superstring. We show that (n−k)⩾12(n−kmin) where kmin is the length of a shortest common superstring. Hence the compression achieved by the algorithm is at least half of the maximum compression. It also seems that the lengths always satisfy k⩽2·kmin but proving this remains open.
- Full Text
- View/download PDF
48. On multiple occurrences shortest common superstring problem
- Author
-
Vladimir Popov and Anna Gorbenko
- Subjects
Combinatorics ,Reduction (complexity) ,Discrete mathematics ,NP-COMPLETE ,MULTIPLE OCCURRENCES SHORTEST COMMON SUPERSTRING PROBLEM ,Applied Mathematics ,Shortest common superstring ,SATISFIABILITY PROBLEM ,NP-complete ,Boolean satisfiability problem ,Mathematics - Abstract
In this paper, we consider multiple occurrences shortest common superstring problem. In particular, we consider an approach to solve the problem. This approach is based on an explicit reduction from the problem to the satisfiability problem.
49. The shortest common superstring problem
- Author
-
Anna Gorbenko and Vladimir Popov
- Subjects
Reduction (complexity) ,Combinatorics ,Discrete mathematics ,NP-COMPLETE ,Applied Mathematics ,Shortest common superstring ,SATISFIABILITY ,SHORTEST COMMON SUPERSTRING ,NP-complete ,Boolean satisfiability problem ,Satisfiability ,Mathematics - Abstract
We consider the problem of the shortest common superstring. We describe an approach to solve the problem. This approach is based on an explicit reduction from the problem to the satisfiability problem. © 2013 Anna Gorbenko and Vladimir Popov.
50. Shortest Common Superstrings of Random Strings
- Author
-
Alexander, Kenneth S.
- Published
- 1996
- Full Text
- View/download PDF
Catalog
Discovery Service for Jio Institute Digital Library
For full access to our library's resources, please sign in.