6 results on '"Lipták, Zsuzsanna"'
Search Results
2. Suffix Sorting via Matching Statistics
- Author
-
Lipták, Zsuzsanna, Masillo, Francesco, Puglisi, Simon J., Boucher, Christina, Rahmann, Sven, Department of Computer Science, Bioinformatics, Algorithmic Bioinformatics, and Helsinki Institute for Information Technology
- Subjects
FOS: Computer and information sciences ,compressed representation ,data structures ,Theory of computation → Design and analysis of algorithms ,Computer Science - Data Structures and Algorithms ,Generalized suffix array ,string collections ,efficient algorithms ,Data Structures and Algorithms (cs.DS) ,Generalized suffix array, matching statistics, string collections, compressed representation, data structures, efficient algorithms ,matching statistics ,113 Computer and information sciences - Abstract
We introduce a new algorithm for constructing the generalized suffix array of a collection of highly similar strings. As a first step, we construct a compressed representation of the matching statistics of the collection with respect to a reference string. We then use this data structure to distribute suffixes into a partial order, and subsequently to speed up suffix comparisons to complete the generalized suffix array. Our experimental evidence with a prototype implementation (a tool we call sacamats) shows that on string collections with highly similar strings we can construct the suffix array in time competitive with or faster than the fastest available methods. Along the way, we describe a heuristic for fast computation of the matching statistics of two strings, which may be of independent interest., Comment: 16 pages, 4 figures; accepted at WABI 2022 (Workshop on Algorithms in Bioinformatics, Sept. 5-9, 2022, Potsdam, Germany)
- Published
- 2022
- Full Text
- View/download PDF
3. Decomposing Metabolomic Isotope Patterns.
- Author
-
Bücher, Philipp, Moret, Bernard M. E., Böcker, Sebastian, Letzel, Matthias C., Lipták, Zsuzsanna, and Pervukhin, Anton
- Abstract
We present a method for determining the sum formula of metabolites solely from their mass and isotope pattern. Metabolites, such as sugars or lipids, participate in almost all cellular processes, but the majority still remains uncharacterized. Our input is a measured isotope pattern from a high resolution mass spectrometer, and we want to find those molecules that best match this pattern. Determination of the sum formula is a crucial step in the identification of an unknown metabolite, as it reduces its possible structures to a hopefully manageable set. Our method is computationally efficient, and first results on experimental data indicate good identification rates for chemical compounds up to 700 Dalton. Above 1000 Dalton, the number of molecules with a certain mass increases rapidly. To efficiently analyze mass spectra of such molecules, we define several additive invariants extracted from the input and then propose to solve a joint decomposition problem. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
4. The Money Changing Problem Revisited: Computing the Frobenius Number in Time O(ka1).
- Author
-
Wang, Lusheng, Böcker, Sebastian, and Lipták, Zsuzsanna
- Abstract
The Money Changing Problem (also known as Equality Constrained Integer Knapsack Problem) is as follows: Let a1 < a2 < ... < ak be fixed positive integers with $\gcd(a_1, \dots, a_k) = 1$. Given some integer n, are there non-negative integers x1..., xk such that ∑iaixi = n? The Frobenius numberg(a1..., ak) is the largest integer n that has no decomposition of the above form. There exist algorithms that, for fixed k, compute the Frobenius number in time polynomial in log ak. For variable k, one can compute a residue table of a1 words which, in turn, allows to determine the Frobenius number. The best known algorithm for computing the residue table has runtime O(ka1 log a1) using binary heaps, and O(a1 (k+log a1)) using Fibonacci heaps. In both cases, O(a1) extra memory in addition to the residue table is needed. Here, we present an intriguingly simple algorithm to compute the residue table in time O(ka1) and extra memory O(1). In addition to computing the Frobenius number, we can use the residue table to solve the given instance of the Money Changing Problem in constant time, for any n. [ABSTRACT FROM AUTHOR]
- Published
- 2005
- Full Text
- View/download PDF
5. ALGORITHMS FOR JUMBLED PATTERN MATCHING IN STRINGS.
- Author
-
BURCSI, PÉTER, CICALESE, FERDINANDO, FICI, GABRIELE, LIPTÁK, ZSUZSANNA, and Holub, J.
- Subjects
ALGORITHMS ,DATA structures ,MATHEMATICAL analysis ,QUERY (Information retrieval system) ,BINARY number system ,MATCHING theory - Abstract
The Parikh vector p(s) of a string s over a finite ordered alphabet Σ = {a
1 , ..., aσ } is defined as the vector of multiplicities of the characters, p(s) = (p1 , ..., pσ ), where pi = |{j | sj = ai }|. Parikh vector q occurs in s if s has a substring t with p(t) = q. The problem of searching for a query q in a text s of length n can be solved simply and worst-case optimally with a sliding window approach in O(n) time. We present two novel algorithms for the case where the text is fixed and many queries arrive over time. The first algorithm only decides whether a given Parikh vector appears in a binary text. It uses a linear size data structure and decides each query in O(1) time. The preprocessing can be done trivially in Θ(n2 ) time. The second algorithm finds all occurrences of a given Parikh vector in a text over an arbitrary alphabet of size σ ≥ 2 and has sub-linear expected time complexity. More precisely, we present two variants of the algorithm, both using an O(n) size data structure, each of which can be constructed in O(n) time. The first solution is very simple and easy to implement and leads to an expected query time of $O(n{(\frac{\sigma }{{\log \,\sigma }})^{1/2}}\frac{{\log \,m}}{{\sqrt m }})$, where m = ∑i qi is the length of a string with Parikh vector q. The second uses wavelet trees and improves the expected runtime to $O(n{(\frac{\sigma }{{\log \,\sigma }})^{1/2}}\frac{1}{{\sqrt m }})$, i.e., by a factor of log m. [ABSTRACT FROM AUTHOR]- Published
- 2012
- Full Text
- View/download PDF
6. r-indexing the eBWT.
- Author
-
Boucher, Christina, Cenzato, Davide, Lipták, Zsuzsanna, Rossi, Massimiliano, and Sciortino, Marinella
- Subjects
- *
DATA structures , *BACTERIAL genomes - Abstract
The extended Burrows-Wheeler Transform (eBWT) [Mantaci et al. TCS 2007] is a variant of the BWT, introduced for collections of strings. In this paper, we present the extended r-index , an analogous data structure to the r -index [Gagie et al. JACM 2020]. It occupies O (r) words, with r the number of runs of the eBWT, and offers the same functionalities as the r -index. We also show how to efficiently support finding maximal exact matches (MEMs). We implemented the extended r -index and tested it on circular bacterial genomes and plasmids, comparing it to five state-of-the-art compressed text indexes. While our data structure maintains similar time and memory requirements for answering pattern matching queries as the original r -index, it is the only index in the literature that can naturally be used for both circular and linear input collections. This is an extended version of [Boucher et al., r-indexing the eBWT, SPIRE 2021]. • Introducing the extended r -index (r -index for the eBWT). • Full cyclic pattern matching functionality in O (r) space. • Efficient construction of the extended r -index. • Computation of MEMs with the extended r -index. • Experimental comparison with several compressed indexes on real biological data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
Catalog
Discovery Service for Jio Institute Digital Library
For full access to our library's resources, please sign in.