Asymptotic Analysis of the kth Subword Complexity
Thesis posted on 02.08.2019 by Lida Ahmadi
The Subword Complexity of a character string is the number of distinct substrings, of any length, that occur as contiguous patterns in the string. In particular, the kth Subword Complexity is the number of distinct substrings of length k in a string of length n. In this work, we evaluate the expected value and the second factorial moment of the kth Subword Complexity for binary strings generated by memoryless sources. We first take a combinatorial approach to derive a probability generating function for the number of occurrences of patterns in strings of finite length. This yields exact expressions for both moments in terms of the patterns' autocorrelation and correlation polynomials. We then investigate the asymptotic behavior for k = a log n. In the proof, we compare the distribution of the kth Subword Complexity of binary strings to the distribution of the number of distinct prefixes of independent strings stored in a trie.
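To make the definitions in the abstract concrete, here is a minimal brute-force sketch in Python. It is not the thesis's method: it computes the kth Subword Complexity of a single string, the autocorrelation vector of a pattern (whose entries are the coefficients of the autocorrelation polynomial), and the exact expected value for a binary memoryless source with P('1') = p by enumerating all 2^n strings. The function names are hypothetical, and the enumeration is only feasible for small n. It is meant as a sanity check for closed-form expressions, not as a substitute for them.

```python
from itertools import product


def kth_subword_complexity(s: str, k: int) -> int:
    """Number of distinct contiguous substrings of length k in s (hypothetical helper)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if k > len(s):
        return 0  # no substring of length k fits
    return len({s[i:i + k] for i in range(len(s) - k + 1)})


def autocorrelation(w: str) -> list[int]:
    """Autocorrelation vector c_0..c_{m-1} of pattern w (hypothetical helper).

    c_j = 1 iff w overlaps itself when shifted by j positions, i.e. w[j:] == w[:m-j].
    The autocorrelation polynomial is c(z) = sum_j c_j z^j.
    """
    m = len(w)
    return [1 if w[j:] == w[:m - j] else 0 for j in range(m)]


def expected_kth_complexity(n: int, k: int, p: float) -> float:
    """Exact E[number of distinct length-k substrings] for a binary memoryless
    source with P('1') = p, by brute-force enumeration of all 2^n strings."""
    q = 1.0 - p
    total = 0.0
    for bits in product("01", repeat=n):
        s = "".join(bits)
        ones = s.count("1")
        prob = p ** ones * q ** (n - ones)  # probability of this particular string
        total += prob * kth_subword_complexity(s, k)
    return total
```

For example, "0110" has the length-2 substrings 01, 11, and 10, so its 2nd Subword Complexity is 3. The pattern "aba" has autocorrelation vector [1, 0, 1], that is, c(z) = 1 + z^2.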
The methodology involves complex analysis, analytic Poissonization and depoissonization, the Mellin transform, and saddle point analysis.
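The trie comparison and the Poissonization step mentioned in the abstract can be illustrated with a small numerical sketch. The sketch assumes n independent binary strings, each from a memoryless source with P('1') = p. A pattern w of length k with j ones has probability P(w) = p^j q^(k-j), with q = 1 - p. The expected number of distinct length-k prefixes (nodes at level k of the trie) is the sum over patterns of 1 - (1 - P(w))^n. In the Poissonized model the number of strings is Poisson with mean λ, each pattern's count is Poisson(λ P(w)), and the expectation becomes the sum over patterns of 1 - exp(-λ P(w)). The function names below are hypothetical, and the code only shows numerically that the two models agree for large n. It does not reproduce the thesis's analytic depoissonization argument.

```python
import math


def expected_trie_level_k(n: int, k: int, p: float) -> float:
    """Expected number of distinct length-k prefixes among n independent
    binary memoryless strings (hypothetical helper; fixed-n model)."""
    q = 1.0 - p
    total = 0.0
    for j in range(k + 1):
        pw = p ** j * q ** (k - j)  # probability of one pattern with j ones
        # math.comb(k, j) patterns share this probability
        total += math.comb(k, j) * (1.0 - (1.0 - pw) ** n)
    return total


def poissonized_trie_level_k(lam: float, k: int, p: float) -> float:
    """Same quantity when the number of strings is Poisson(lam)
    (hypothetical helper; Poissonized model)."""
    q = 1.0 - p
    total = 0.0
    for j in range(k + 1):
        pw = p ** j * q ** (k - j)
        # P(pattern appears at least once) = 1 - exp(-lam * P(w))
        total += math.comb(k, j) * (1.0 - math.exp(-lam * pw))
    return total
```

With n = 1 the fixed-n expectation equals 1, because there is exactly one prefix. For large n, evaluating the Poissonized version at λ = n gives a value close to the fixed-n one. Depoissonization makes that closeness precise.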