1. Joint String Complexity for Markov Sources: Small Data Matters *
- Author
-
Wojciech Szpankowski, Dimitris Milioris, Philippe Jacquet, inTeRnet BEyond the usual (TRiBE ), Inria Saclay - Ile de France, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Nokia Bell Labs [Nozay], and Purdue University [West Lafayette]
- Subjects
FOS: Computer and information sciences ,String complexity ,General Computer Science ,Computer science ,Computer Science - Information Theory ,saddle point methods ,0102 computer and information sciences ,02 engineering and technology ,01 natural sciences ,Theoretical Computer Science ,Set (abstract data type) ,High Energy Physics::Theory ,Markov sources ,Cardinality ,Saddle point ,source discrimination ,generating functions ,0202 electrical engineering, electronic engineering, information engineering ,[INFO]Computer Science [cs] ,Matrix analysis ,[MATH]Mathematics [math] ,Mellin transform ,Discrete mathematics ,Small data ,Markov chain ,Information Theory (cs.IT) ,String (computer science) ,suffix trees ,16. Peace & justice ,010201 computation theory & mathematics ,020201 artificial intelligence & image processing ,analytic information theory ,joint string complexity - Abstract
String complexity is defined as the cardinality of a set of all distinct words (factors) of a given string. For two strings, we introduce the joint string complexity as the cardinality of a set of words that are common to both strings. String complexity finds a number of applications from capturing the richness of a language to finding similarities between two genome sequences. In this paper we analyze the joint string complexity when both strings are generated by Markov sources. We prove that the joint string complexity grows linearly (in terms of the string lengths) when both sources are statistically indistinguishable and sublinearly when sources are statistically not the same. Precise analysis of the joint string complexity turns out to be quite challenging requiring subtle singularity analysis and saddle point method over infinity many saddle points leading to novel oscillatory phenomena with single and double periodicities. To overcome these challenges, we apply powerful analytic techniques such as multivariate generating functions, multivariate depoissonization and Mellin transform, spectral matrix analysis, and complex asymptotic methods.
- Published
- 2020