11 results on '"Farnoud Hassanzadeh, Farzad"'
Search Results
2. Duplication Distance to the Root for Binary Sequences.
- Author
-
Alon, Noga, Bruck, Jehoshua, Farnoud Hassanzadeh, Farzad, and Jain, Siddharth
- Subjects
CHROMOSOME replication ,NUCLEOTIDE sequencing ,BINARY sequences ,L systems ,ENTROPY - Abstract
We study the tandem duplication distance between binary sequences and their roots. In other words, the quantity of interest is the number of tandem duplication operations of the form x = {a} {b} {c} \to {y} = {a} {b} {b} {c} , where x and y are sequences and a , b , and c are their substrings, needed to generate a binary sequence of length $n$ starting from a square-free sequence from the set {0, 1, 01, 10, 010, 101}. This problem is a restricted case of finding the duplication/deduplication distance between two sequences, defined as the minimum number of duplication and deduplication operations required to transform one sequence to the other. We consider both exact and approximate tandem duplications. For exact duplication, denoting the maximum distance to the root of a sequence of length $n$ by $f(n)$ , we prove that $f(n)=\Theta (n)$ . For the case of approximate duplication, where a $\beta $ -fraction of symbols may be duplicated incorrectly, we show that the maximum distance has a sharp transition from linear in $n$ to logarithmic at $\beta =1/2$ . We also study the duplication distance to the root for the set of sequences arising from a given root and for special classes of sequences, namely, the De Bruijn sequences, the Thue–Morse sequence, and the Fibonacci words. The problem is motivated by genomic tandem duplication mutations and the smallest number of tandem duplication events required to generate a given biological sequence. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
3. Capacity and Expressiveness of Genomic Tandem Duplication.
- Author
-
Jain, Siddharth, Farnoud Hassanzadeh, Farzad, and Bruck, Jehoshua
- Subjects
- *
HUMAN genome , *TANDEM repeats , *REPEATED sequence (Genetics) , *NUCLEOTIDE sequence , *GENETIC code , *MATHEMATICAL models - Abstract
The majority of the human genome consists of repeated sequences. An important type of repeated sequences common in the human genome are tandem repeats, where identical copies appear next to each other. For example, in the sequence AGTC\underline TGTGC , $TGTG$ is a tandem repeat, that may be generated from $AGTCTGC$ by a tandem duplication of length 2. In this paper, we investigate the possibility of generating a large number of sequences from a seed, i.e. a small initial string, by tandem duplications of bounded length. We study the capacity of such a system, a notion that quantifies the system’s generating power. Our results include exact capacity values for certain tandem duplication string systems. In addition, motivated by the role of DNA sequences in expressing proteins via RNA and the genetic code, we define the notion of the expressiveness of a tandem duplication system as the capability of expressing arbitrary substrings. We then completely characterize the expressiveness of tandem duplication systems for general alphabet sizes and duplication lengths. In particular, based on a celebrated result by Axel Thue from 1906, presenting a construction for ternary squarefree sequences, we show that for alphabets of size 4 or larger, bounded tandem duplication systems, regardless of the seed and the bound on duplication length, are not fully expressive, i.e. they cannot generate all strings even as substrings of other strings. Note that the alphabet of size 4 is of particular interest as it pertains to the genomic alphabet. Building on this result, we also show that these systems do not have full capacity. In general, our results illustrate that duplication lengths play a more significant role than the seed in generating a large number of sequences for these systems. [ABSTRACT FROM PUBLISHER]
- Published
- 2017
- Full Text
- View/download PDF
4. Duplication-Correcting Codes for Data Storage in the DNA of Living Organisms.
- Author
-
Jain, Siddharth, Farnoud Hassanzadeh, Farzad, Schwartz, Moshe, and Bruck, Jehoshua
- Subjects
- *
INFORMATION retrieval , *DNA , *CODING theory , *DIGITAL watermarking , *DATA integrity , *ERROR-correcting codes - Abstract
The ability to store data in the DNA of a living organism has applications in a variety of areas including synthetic biology and watermarking of patented genetically modified organisms. Data stored in this medium are subject to errors arising from various mutations, such as point mutations, indels, and tandem duplication, which need to be corrected to maintain data integrity. In this paper, we provide error-correcting codes for errors caused by tandem duplications, which create a copy of a block of the sequence and insert it in a tandem manner, i.e., next to the original. In particular, we present two families of codes for correcting errors due to tandem duplications of a fixed length: the first family can correct any number of errors, while the second corrects a bounded number of errors. We also study codes for correcting tandem duplications of length up to a given constant $k$ , where we are primarily focused on the cases of $k=2,3$ . Finally, we provide a full classification of the sets of lengths allowed in tandem duplication that result in a unique root for all sequences. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
5. The Capacity of String-Duplication Systems.
- Author
-
Farnoud Hassanzadeh, Farzad, Schwartz, Moshe, and Bruck, Jehoshua
- Subjects
- *
NUCLEOTIDE sequencing , *FORMAL languages , *NUCLEOTIDE sequence , *CONSTRAINED optimization , *FORMALIZATION (Linguistics) - Abstract
It is known that the majority of the human genome consists of duplicated sequences. Furthermore, it is believed that a significant part of the rest of the genome also originated from duplicated sequences and has mutated to its current form. In this paper, we investigate the possibility of constructing an exponentially large number of sequences from a short initial sequence using simple duplication rules, including those resembling genomic-duplication processes. In other words, our goal is to find the capacity, or the expressive power, of these string-duplication systems. Our results include exact capacities, and bounds on the capacities, of four fundamental string-duplication systems. The study of these fundamental biologically inspired systems is an important step toward modeling and analyzing more complex biological processes. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
6. Bounds for Permutation Rate-Distortion.
- Author
-
Farnoud Hassanzadeh, Farzad, Schwartz, Moshe, and Bruck, Jehoshua
- Subjects
- *
PERMUTATIONS , *CODING theory , *RATE distortion theory , *CHEBYSHEV approximation , *RANK correlation (Statistics) - Abstract
We study the rate-distortion relationship in the set of permutations endowed with the Kendall $\tau $ -metric and the Chebyshev metric (the $\ell _\infty $ -metric). This paper is motivated by the application of permutation rate-distortion to the average-case and worst-case distortion analysis of algorithms for ranking with incomplete information and approximate sorting algorithms. For the Kendall $\tau $ -metric, we provide bounds for various distortion regimes, while for the Chebyshev metric, we present bounds that are valid for all distortions and are especially accurate for small distortions. In addition, for the Chebyshev metric, we provide a construction for covering codes. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
7. An Axiomatic Approach to Constructing Distances for Rank Comparison and Aggregation.
- Author
-
Farnoud Hassanzadeh, Farzad and Milenkovic, Olgica
- Subjects
- *
PROBABILITY theory , *MARKOV processes , *INFORMATION theory , *INFORMATION retrieval , *AGGREGATION (Statistics) - Abstract
We propose a new family of distance measures on rankings, derived through an axiomatic approach, that consider the nonuniform relevance of the top and bottom of ordered lists and similarities between candidates. The proposed distance functions include specialized weighted versions of the Kendall \(\tau \) distance and the Cayley distance, and are suitable for comparing rankings in a number of applications, including information retrieval and rank aggregation. In addition to proposing the distance measures and providing the theoretical underpinnings for their applications, we also analyze algorithmic and computational aspects of weighted distance-based rank aggregation. We present an aggregation method based on approximating weighted distance measures by a generalized version of Spearman’s footrule distance as well as a Markov chain method inspired by PageRank, where transition probabilities of the Markov chain reflect the chosen weighted distances. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
8. Multipermutation Codes in the Ulam Metric for Nonvolatile Memories.
- Author
-
Farnoud Hassanzadeh, Farzad and Milenkovic, Olgica
- Subjects
CODING theory ,NONVOLATILE memory ,COMPUTER storage devices ,FLASH memory ,SIGNAL processing - Abstract
We address the problem of multipermutation code design in the Ulam metric for novel storage applications. Multipermutation codes are suitable for flash memory where cell charges may share the same rank. Changes in the charges of cells manifest themselves as errors whose effects on the retrieved signal may be measured via the Ulam distance. As part of our analysis, we study multipermutation codes in the Hamming metric, known as constant composition codes. We then present bounds on the size of multipermutation codes and their capacity, for both the Ulam and the Hamming metrics. Finally, we present constructions and accompanying decoders for multipermutation codes in the Ulam metric. [ABSTRACT FROM PUBLISHER]
- Published
- 2014
- Full Text
- View/download PDF
9. Error-Correction in Flash Memories via Codes in the Ulam Metric.
- Author
-
Farnoud (Hassanzadeh), Farzad, Skachek, Vitaly, and Milenkovic, Olgica
- Subjects
- *
ERROR-correcting codes , *FLASH memory , *CODING theory , *GEOMETRICAL constructions , *PERMUTATIONS , *HAMMING codes - Abstract
We consider rank modulation codes for flash memories that allow for handling arbitrary charge-drop errors. Unlike classical rank modulation codes used for correcting errors that manifest themselves as swaps of two adjacently ranked elements, the proposed translocation rank codes account for more general forms of errors that arise in storage systems. Translocations represent a natural extension of the notion of adjacent transpositions and as such may be analyzed using related concepts in combinatorics and rank modulation coding. Our results include derivation of the asymptotic capacity of translocation rank codes, construction techniques for asymptotically good codes, as well as simple decoding methods for one class of constructed codes. As part of our exposition, we also highlight the close connections between the new code family and permutations with short common subsequences, deletion and insertion error-correcting codes for permutations, and permutation codes in the Hamming distance. [ABSTRACT FROM PUBLISHER]
- Published
- 2013
- Full Text
- View/download PDF
10. Sorting of Permutations by Cost-Constrained Transpositions.
- Author
-
Farnoud (Hassanzadeh), Farzad and Milenkovic, Olgica
- Subjects
- *
SORTING (Electronic computers) , *MATHEMATICAL combinations , *CONSTRAINT satisfaction , *HEURISTIC algorithms , *BIOINFORMATICS , *MATHEMATICAL decomposition , *POLYNOMIALS - Abstract
The problem of finding a minimum decomposition of a permutation in terms of transpositions with predetermined non-uniform and non-negative costs is addressed. Alternatively, computing the transposition distance between two permutations, where transpositions are endowed with arbitrary non-negative costs, is studied. For such cost functions, polynomial-time, constant-approximation decomposition algorithms are described. For metric-path costs, exact polynomial-time decomposition algorithms are presented. The algorithms in this paper represent a combination of Viterbi-type algorithms and graph-search techniques for minimizing the cost of individual transpositions, and dynamic programing algorithms for finding minimum cost decompositions of cycles. The presented algorithms have a myriad of applications in information theory, bioinformatics, and algebra. [ABSTRACT FROM PUBLISHER]
- Published
- 2012
- Full Text
- View/download PDF
11. On the Multimessage Capacity Region for Undirected Ring Networks.
- Author
-
Tabatabaei Yazdi, S. M. Sadegh, Savari, Serap A., Kramer, Gerhard, Carlson (Talaska), Kelli, and Farnoud (Hassanzadeh), Farzad
- Subjects
RING networks ,NETWORK routers ,LOCAL area networks ,CODING theory ,DATA compression ,INFORMATION theory - Abstract
The "Japanese" theorem is extended to multiple multicast sessions in an arbitrary network to characterize the routing capacity region by the intersection of an infinite collection of half-spaces. An elimination technique is developed to simplify this infinite description into a finite one based upon the shortest routing paths and trees in the network graph. This result is used as a step in providing the capacity regions for two multimessage multicast problems on undirected ring networks; in the first case only unicast and broadcast sessions are considered, and in the second case multicast sessions where the source and destination vertices form lines of adjacent vertices are studied. Network coding is generally necessary to achieve network capacity, but for our multimessage multicast problems, new arguments are used to demonstrate that routing can achieve network coding bounds. [ABSTRACT FROM AUTHOR]
- Published
- 2010
- Full Text
- View/download PDF
Catalog
Discovery Service for Jio Institute Digital Library
For full access to our library's resources, please sign in.