Author: "Ayad L.A.K." - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Ayad L.A.K."' showing total 9 results


1. Seedability: Optimizing alignment parameters for sensitive sequence comparison

Author: Ayad, L.A.K. (Lorraine), Chikhi, R. (Rayan), and Pissis, S. (Solon)
Abstract: Motivation: Most sequence alignment techniques make use of exact k-mer hits, called seeds, as anchors to optimize alignment speed. A large number of bioinformatics tools employing seed-based alignment techniques, such as Minimap2, use a single value of k per sequencing technology, without a strong guarantee that this is the best possible value. Given the ubiquity of sequence alignment, identifying values of k that lead to more sensitive alignments is thus an important task. To aid this, we present Seedability, a seed-based alignment framework designed for estimating an optimal seed k-mer length (as well as a minimal number of shared seeds) based on a given alignment identity threshold. In particular, we were motivated to make Minimap2 more sensitive in the pairwise alignment of short sequences. Results: The experimental results herein show improved alignments of short and divergent sequences when using the parameter values determined by Seedability in comparison to the default values of Minimap2. We also show several cases of pairs of real divergent sequences, where the default parameter values of Minimap2 yield no output alignments, but the values output by Seedability produce plausible alignments.
Published: 2023
Full Text: View/download PDF
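
To make the notion of exact k-mer seeds in the abstract above concrete, here is a minimal sketch that counts the k-mers shared by two sequences for several values of k. It is not Seedability's estimation procedure and does not call Minimap2; the function names and the two toy sequences are assumptions made purely for illustration.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_seeds(a: str, b: str, k: int) -> set[str]:
    """Exact k-mers occurring in both sequences (candidate seeds)."""
    return kmers(a, k) & kmers(b, k)


# Two hypothetical short sequences that differ by one substitution.
a = "ACGTACGTGA"
b = "ACGTTCGTGA"

for k in (3, 5, 7):
    # Larger k leaves fewer exact hits on divergent sequences.
    print(k, len(shared_seeds(a, b, k)))
```

On this toy pair the number of shared seeds drops to zero once k reaches 7, which is the kind of sensitivity loss that motivates choosing k per identity threshold.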

2. Constructing antidictionaries of long texts in output-sensitive space

Author: Ayad, L.A.K. (Lorraine), Badkobeh, G. (Golnaz), Fici, G. (Gabriele), Heliou, A. (Alice), and Pissis, S. (Solon)
Abstract: A word $x$ that is absent from a word $y$ is called minimal if all its proper factors occur in $y$. Given a collection of $k$ words $y_1, \ldots, y_k$ over an alphabet $\Sigma$, we are asked to compute the set $\mathrm{M}^{\ell}_{\{y_1,\ldots,y_k\}}$ of minimal absent words of length at most $\ell$ of the collection $\{y_1, \ldots, y_k\}$. The set $\mathrm{M}^{\ell}_{\{y_1,\ldots,y_k\}}$ contains all the words $x$ such that $x$ is absent from all the words of the collection while there exist $i,j$ such that the maximal proper suffix of $x$ is a factor of $y_i$ and the maximal proper prefix of $x$ is a factor of $y_j$. In data compression, this corresponds to computing the antidictionary of $k$ documents. In bioinformatics, it corresponds to computing words that are absent from a genome of $k$ chromosomes. Indeed, the set $\mathrm{M}^{\ell}_{y}$ of minimal absent words of a word $y$ is equal to $\mathrm{M}^{\ell}_{\{y_1,\ldots,y_k\}}$ for any decomposition of $y$ into a collection of words $y_1, \ldots, y_k$ such that there is an overlap of length at least $\ell - 1$ between any two consecutive words in the collection. This computation generally requires $\Omega(n)$ space for $n = |y|$ using any of the plenty available $\mathcal{O}(n)$-time algorithms. This is because an $\Omega(n)$-sized text index is constructed over $y$ which can be impractical for large $n$. We do the identical computation incrementally using output-sensitive space. This goal is reasonable when $\|\mathrm{M}^{\ell}_{\{y_1,\ldots,y_N\}}\| = o(n)$, for all $N \in [1,k]$, where $\|S\|$ denotes the sum of the lengths of words in set $S$. For instance, in the human genome, $n \approx 3 \times 10^9$ but $\|\mathrm{M}^{12}_{\{y_1,\ldots,y_k\}}\| \approx 10^6$. We consider a constant-sized alphabet for stating our results. We show that all $\mathrm{M}^{\ell}_{y_1},\ldots,\mathrm{M}^{\ell}_{\{y_1,\ldots,y_k\}}$ can be computed in $\mathcal{O}(kn+\sum_{N=1}^{k}\|\mathrm{M}^{\ell}_{\{y_1,\ldots,y_N\}}\|)$ total time using $\mathcal{O}(\mathrm{MaxIn}+\mathrm{MaxOut})$ space, where $\mathrm{MaxIn}$ is the length of the longest word in $\{y_1, \ldots, y_k\}$ and $\mathrm{MaxOut}=\max\{\|\mathrm{M}^{\ell}_{\{y_1,\ldots,y_N\}}\| : N\in[1,k]\}$. Proof-of-concept experimental results are also provided confirming our theoretical findings and justifying our contribution.
Published: 2021
Full Text: View/download PDF
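
The definition of a minimal absent word in the abstract above can be checked directly on a small input. The sketch below is a brute-force enumeration over a single word, assuming a tiny alphabet and a small length bound; it is not the output-sensitive algorithm the paper describes, and the function names are hypothetical.

```python
from itertools import product


def factors(y: str, max_len: int) -> set[str]:
    """All non-empty factors (substrings) of y of length at most max_len."""
    return {
        y[i:j]
        for i in range(len(y))
        for j in range(i + 1, min(len(y), i + max_len) + 1)
    }


def minimal_absent_words(y: str, alphabet: str, max_len: int) -> set[str]:
    """Brute force: x is a minimal absent word of y if x does not occur in y
    but its maximal proper prefix and maximal proper suffix both do."""
    occurring = factors(y, max_len)
    occurring.add("")  # the empty word is a factor of every word
    result = set()
    for length in range(1, max_len + 1):
        for letters in product(alphabet, repeat=length):
            x = "".join(letters)
            if x not in occurring and x[:-1] in occurring and x[1:] in occurring:
                result.add(x)
    return result


print(sorted(minimal_absent_words("abaab", "ab", 4)))
```

This enumeration is exponential in the length bound, so it only serves to illustrate the definition on toy inputs.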

3. Longest property-preserved common factor: A new string-processing framework

Author: Ayad, L.A.K. (Lorraine), Bernardini, G. (Giulia), Grossi, R. (Roberto), Iliopoulos, C.S. (Costas), Pisanti, N. (Nadia), Pissis, S. (Solon), and Rosone, G. (Giovanna)
Abstract: We introduce a new family of string processing problems. Given two or more strings, we are asked to compute a factor common to all strings that preserves a specific property and has maximal length. We consider three fundamental string properties: square-free factors, periodic factors, and palindromic factors under three different settings, one per property. In the first setting, we are given a string x and we are asked to construct a data structure over x answering the following type of online queries: given a string y, find a longest square-free factor common to x and y. In the second setting, we are given k strings and an integer 1
Published: 2020
Full Text: View/download PDF

4. SMART: SuperMaximal approximate repeats tool

Author: Ayad, L.A.K. (Lorraine), Charalampopoulos, P. (Panagiotis), and Pissis, S. (Solon)
Abstract: State-of-The-Art repeat analysis tools rely on extending maximal repeated pairs to enumerate maximal k-mismatch repeats. These pairs can be quadratic in n, the length of the input sequence, and thus greedy heuristics are applied to speed up the extension. Here, we introduce supermaximal k-mismatch repeats, which are linear in n and capture all maximal k-mismatch repeats: every maximal k-mismatch repeat is a substring of some supermaximal k-mismatch repeat. We present SMART, a tool based on recent algorithmic advances implemented in C++ to compute supermaximal k-mismatch repeats directly, and show that these elements are statistically much more significant than the output of t
Published: 2020
Full Text: View/download PDF

5. Comparing Degenerate Strings

Author: Alzamel, M. (Mai), Ayad, L.A.K. (Lorraine), Bernardini, G. (Giulia), Grossi, R. (Roberto), Iliopoulos, C.S. (Costas), Pisanti, N. (Nadia), Pissis, S. (Solon), and Rosone, G. (Giovanna)
Abstract: Uncertain sequences are compact representations of sets of similar strings. They highlight common segments by collapsing them, and explicitly represent varying segments by listing all possible options. A generalized degenerate string (GD string) is a type of uncertain sequence. Formally, a GD string $S$ is a sequence of $n$ sets of strings of total size $N$, where the $i$th set contains strings of the same length $k_i$ but this length can vary between different sets. We denote by $W$ the sum of these lengths $k_0, k_1, \ldots, k_{n-1}$. Our main result is an $\mathcal{O}(N + M)$-time algorithm for deciding whether two GD strings of total sizes $N$ and $M$, respectively, over an integer alphabet, have a non-empty intersection. This result is based on a combinatorial result of independent interest: although the intersection of two GD strings can be exponential in the total size of the two strings, it can be represented in linear space. We then apply our string comparison tool to devise a simple algorithm for computing all palindromes in $S$ in $\mathcal{O}(\min\{W, n^2\}N)$ time. We complement this upper bound by showing a similar conditional lower bound for computing maximal palindromes in $S$. We also show that a result, which is essentially the same as our string comparison linear-time algorithm, can be obtained by employing an automata-based approach.
Published: 2020
Full Text: View/download PDF
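
As an illustration of the GD string definition in the abstract above, the sketch below models a GD string as a list of sets of equal-length strings and tests two of them for a common string by expanding both languages. This expansion is exponential and is not the linear-time algorithm the paper reports; the representation, the function names, and the example data are assumptions.

```python
from itertools import product

# A GD string is modelled here as a list of sets; all strings within
# one set have the same length, but lengths may differ between sets.
GDString = list[set[str]]


def language(gd: GDString) -> set[str]:
    """All plain strings obtained by picking one option per position."""
    return {"".join(choice) for choice in product(*gd)}


def have_common_string(s: GDString, t: GDString) -> bool:
    """Brute-force test for a non-empty intersection of the two languages."""
    return bool(language(s) & language(t))


# Hypothetical example: both GD strings spell out words of length 5,
# but they are segmented differently.
S = [{"ab", "cd"}, {"a"}, {"bc", "ca"}]
T = [{"a", "c"}, {"ba", "da"}, {"c"}, {"a", "b"}]

print(sorted(language(S) & language(T)))
print(have_common_string(S, T))
```

Note that the two inputs need not be split at the same positions, which is what makes the comparison non-trivial.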

6. IsoXpressor: A Tool to Assess Transcriptional Activity within Isochores

Author: Ayad, L.A.K. (Lorraine), Dourou, A.-M. (Athanasia-Maria), Arhondakis, S. (Stilianos), and Pissis, S. (Solon)
Abstract: Genomes are characterized by large regions of homogeneous base compositions known as isochores. The latter are divided into GC-poor and GC-rich classes linked to distinct functional and structural properties. Several studies have addressed how isochores shape function and structure. To aid in this important subject, we present IsoXpressor, a tool designed for the analysis of the functional property of transcription within isochores. IsoXpressor allows users to process RNA-Seq data in relation to the isochores, and it can be employed to investigate any biological question of interest for any species. The results presented herein as proof of concept are focused on the preimplantation process in Homo sapiens (human) and Macaca mulatta (rhesus monkey).
Published: 2020
Full Text: View/download PDF

7. Constructing antidictionaries in output-sensitive space

Author: Ayad, L.A.K. (Lorraine), Badkobeh, G. (Golnaz), Fici, G. (Gabriele), Heliou, A. (Alice), and Pissis, S. (Solon)
Abstract: A word $x$ that is absent from a word $y$ is called minimal if all its proper factors occur in $y$. Given a collection of $k$ words $y_1,y_2,\ldots,y_k$ over an alphabet $\Sigma$, we are asked to compute the set $\mathrm{M}^{\ell}_{y_{1}\#\ldots\#y_{k}}$ of minimal absent words of length at most $\ell$ of word $y=y_1\#y_2\#\ldots\#y_k$, $\#\notin\Sigma$. In data compression, this corresponds to computing the antidictionary of $k$ documents. In bioinformatics, it corresponds to computing words that are absent from a genome of $k$ chromosomes. This computation generally requires $\Omega(n)$ space for $n=|y|$ using any of the plenty available $\mathcal{O}(n)$-time algorithms. This is because an $\Omega(n)$-sized text index is constructed over $y$ which can be impractical for large $n$. We do the identical computation incrementally using output-sensitive space. This goal is reasonable when $\|\mathrm{M}^{\ell}_{y_{1}\#\ldots\#y_{N}}\|=o(n)$, for all $N\in[1,k]$. For instance, in the human genome, $n \approx 3\times 10^9$ but $\|\mathrm{M}^{12}_{y_{1}\#\ldots\#y_{k}}\| \approx 10^6$. We consider a constant-sized alphabet for stating our results. We show that all $\mathrm{M}^{\ell}_{y_{1}},\ldots,\mathrm{M}^{\ell}_{y_{1}\#\ldots\#y_{k}}$ can be computed in $\mathcal{O}(kn+\sum_{N=1}^{k}\|\mathrm{M}^{\ell}_{y_{1}\#\ldots\#y_{N}}\|)$ total time using $\mathcal{O}(\mathrm{MaxIn}+\mathrm{MaxOut})$ space, where $\mathrm{MaxIn}$ is the length of the longest word in $\{y_1,\ldots,y_{k}\}$ and $\mathrm{MaxOut}=\max\{\|\mathrm{M}^{\ell}_{y_{1}\#\ldots\#y_{N}}\|:N\in[1,k]\}$. Proof-of-concept experimental results are also provided confirming our theoretical findings and justifying our contribution.
Published: 2019
Full Text: View/download PDF

8. Constructing Antidictionaries of Long Texts in Output-Sensitive Space

Author: Solon P. Pissis, Golnaz Badkobeh, Alice Héliou, Gabriele Fici, and Lorraine A.K. Ayad
Affiliations: Department of Informatics, King's College London; Goldsmiths, University of London; Dipartimento di Matematica e Informatica, Università degli studi di Palermo; Centrum Wiskunde & Informatica (CWI), Amsterdam, The Netherlands; Equipe de recherche européenne en algorithmique et biologie formelle et expérimentale (ERABLE), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria); Bioinformatics, AIMMS, Bio Informatics (IBIVU)
Subjects: 0301 basic medicine, Antidictionary, Settore INF/01 - Informatica, Output sensitive algorithm, 0102 computer and information sciences, Space (mathematics), 01 natural sciences, Theoretical Computer Science, String algorithm, Prefix, Set (abstract data type), Combinatorics, 03 medical and health sciences, 030104 developmental biology, Computational Theory and Mathematics, 010201 computation theory & mathematics, Data compression, Output-sensitive algorithm, [INFO]Computer Science [cs], Suffix, Alphabet, Absent word, Word (group theory), Mathematics
Abstract: A word $x$ that is absent from a word $y$ is called minimal if all its proper factors occur in $y$. Given a collection of $k$ words $y_1, \ldots, y_k$ over an alphabet $\Sigma$, we are asked to compute the set $\mathrm{M}^{\ell}_{\{y_1,\ldots,y_k\}}$ of minimal absent words of length at most $\ell$ of the collection $\{y_1, \ldots, y_k\}$. The set $\mathrm{M}^{\ell}_{\{y_1,\ldots,y_k\}}$ contains all the words $x$ such that $x$ is absent from all the words of the collection while there exist $i,j$ such that the maximal proper suffix of $x$ is a factor of $y_i$ and the maximal proper prefix of $x$ is a factor of $y_j$. In data compression, this corresponds to computing the antidictionary of $k$ documents. In bioinformatics, it corresponds to computing words that are absent from a genome of $k$ chromosomes. Indeed, the set $\mathrm{M}^{\ell}_{y}$ of minimal absent words of a word $y$ is equal to $\mathrm{M}^{\ell}_{\{y_1,\ldots,y_k\}}$ for any decomposition of $y$ into a collection of words $y_1, \ldots, y_k$ such that there is an overlap of length at least $\ell - 1$ between any two consecutive words in the collection. This computation generally requires $\Omega(n)$ space for $n = |y|$ using any of the plenty available $\mathcal{O}(n)$-time algorithms. This is because an $\Omega(n)$-sized text index is constructed over $y$ which can be impractical for large $n$. We do the identical computation incrementally using output-sensitive space. This goal is reasonable when $\|\mathrm{M}^{\ell}_{\{y_1,\ldots,y_N\}}\| = o(n)$, for all $N \in [1,k]$, where $\|S\|$ denotes the sum of the lengths of words in set $S$. For instance, in the human genome, $n \approx 3 \times 10^9$ but $\|\mathrm{M}^{12}_{\{y_1,\ldots,y_k\}}\| \approx 10^6$. We consider a constant-sized alphabet for stating our results. We show that all $\mathrm{M}^{\ell}_{y_1},\ldots,\mathrm{M}^{\ell}_{\{y_1,\ldots,y_k\}}$ can be computed in $\mathcal{O}(kn+\sum_{N=1}^{k}\|\mathrm{M}^{\ell}_{\{y_1,\ldots,y_N\}}\|)$ total time using $\mathcal{O}(\mathrm{MaxIn}+\mathrm{MaxOut})$ space, where $\mathrm{MaxIn}$ is the length of the longest word in $\{y_1, \ldots, y_k\}$ and $\mathrm{MaxOut}=\max\{\|\mathrm{M}^{\ell}_{\{y_1,\ldots,y_N\}}\| : N\in[1,k]\}$. Proof-of-concept experimental results are also provided confirming our theoretical findings and justifying our contribution.
Published: 2021
Full Text: View/download PDF

9. Constructing Antidictionaries in Output-Sensitive Space

Author: Golnaz Badkobeh, Alice Héliou, Gabriele Fici, Solon P. Pissis, and Lorraine A.K. Ayad
Affiliations: Department of Informatics, King's College London; Department of Computing, Goldsmiths, University of London; Dipartimento di Matematica e Informatica, Università degli Studi di Palermo, Palermo, Italy; Centrum Wiskunde & Informatica (CWI), Amsterdam, The Netherlands; Equipe de recherche européenne en algorithmique et biologie formelle et expérimentale (ERABLE), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)
Contributors: Storer, James A.; Bilgin, Ali; Serra-Sagrista, Joan; Marcellin, Michael W.
Subjects: FOS: Computer and information sciences, Settore ING-INF/05 - Sistemi Di Elaborazione Delle Informazioni, Output sensitive algorithms, String algorithms, Physics, Settore INF/01 - Informatica, Output sensitive algorithm, 0102 computer and information sciences, Absent words, Space (mathematics), 01 natural sciences, Antidictionaries, Combinatorics, 010201 computation theory & mathematics, TheoryofComputation_ANALYSISOFALGORITHMSANDPROBLEMCOMPLEXITY, Data compression, Computer Science - Data Structures and Algorithms, Data Structures and Algorithms (cs.DS), Computer Science::Symbolic Computation, [INFO]Computer Science [cs], Absent word, Alphabet, Word (group theory)
Abstract: A word $x$ that is absent from a word $y$ is called minimal if all its proper factors occur in $y$. Given a collection of $k$ words $y_1,y_2,\ldots,y_k$ over an alphabet $\Sigma$, we are asked to compute the set $\mathrm{M}^{\ell}_{y_{1}\#\ldots\#y_{k}}$ of minimal absent words of length at most $\ell$ of word $y=y_1\#y_2\#\ldots\#y_k$, $\#\notin\Sigma$. In data compression, this corresponds to computing the antidictionary of $k$ documents. In bioinformatics, it corresponds to computing words that are absent from a genome of $k$ chromosomes. This computation generally requires $\Omega(n)$ space for $n=|y|$ using any of the plenty available $\mathcal{O}(n)$-time algorithms. This is because an $\Omega(n)$-sized text index is constructed over $y$ which can be impractical for large $n$. We do the identical computation incrementally using output-sensitive space. This goal is reasonable when $||\mathrm{M}^{\ell}_{y_{1}\#\ldots\#y_{N}}||=o(n)$, for all $N\in[1,k]$. For instance, in the human genome, $n \approx 3\times 10^9$ but $||\mathrm{M}^{12}_{y_{1}\#\ldots\#y_{k}}|| \approx 10^6$. We consider a constant-sized alphabet for stating our results. We show that all $\mathrm{M}^{\ell}_{y_{1}},\ldots,\mathrm{M}^{\ell}_{y_{1}\#\ldots\#y_{k}}$ can be computed in $\mathcal{O}(kn+\sum^{k}_{N=1}||\mathrm{M}^{\ell}_{y_{1}\#\ldots\#y_{N}}||)$ total time using $\mathcal{O}(\mathrm{MaxIn}+\mathrm{MaxOut})$ space, where $\mathrm{MaxIn}$ is the length of the longest word in $\{y_1,\ldots,y_{k}\}$ and $\mathrm{MaxOut}=\max\{||\mathrm{M}^{\ell}_{y_{1}\#\ldots\#y_{N}}||:N\in[1,k]\}$. Proof-of-concept experimental results are also provided confirming our theoretical findings and justifying our contribution., Comment: Version accepted to DCC 2019
Published: 2019
Full Text: View/download PDF
