Asymptotic Analysis of the kth Subword Complexity

posted on 02.08.2019 by Lida Ahmadi
The Subword Complexity of a character string is the number of distinct substrings of any length that occur as contiguous patterns in the string. The kth Subword Complexity, in particular, is the number of distinct substrings of length k in a string of length n. In this work, we evaluate the expected value and the second factorial moment of the kth Subword Complexity for binary strings over memoryless sources. We first take a combinatorial approach to derive a probability generating function for the number of occurrences of patterns in strings of finite length. This yields exact expressions for the two moments in terms of the patterns' autocorrelation and correlation polynomials. We then investigate the asymptotic behavior for k = a log n. In the proof, we compare the distribution of the kth Subword Complexity of binary strings to the distribution of distinct prefixes of independent strings stored in a trie.
The methodology that we use involves complex analysis, analytic poissonization and depoissonization, the Mellin transform, and saddle point analysis.
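
As an illustration of the definition above, the following is a minimal Python sketch that counts the distinct substrings of length k in a string and estimates the expected value by simulation for a binary memoryless source. The function names, the parameter p (probability of emitting "1"), and the Monte Carlo estimate are assumptions made for illustration only; the thesis derives the moments analytically, not by simulation.

```python
import random


def kth_subword_complexity(s: str, k: int) -> int:
    """Count the distinct contiguous substrings of length k in s."""
    if k <= 0 or k > len(s):
        return 0
    # A set keeps one copy of each length-k window.
    return len({s[i:i + k] for i in range(len(s) - k + 1)})


def estimate_mean_complexity(n: int, k: int, p: float = 0.5,
                             trials: int = 1000, seed: int = 0) -> float:
    """Monte Carlo estimate of the expected kth subword complexity.

    Each string has length n; symbols are drawn independently, with
    "1" having probability p (a memoryless binary source). This is a
    numerical illustration, not the analytic method of the thesis.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        s = "".join("1" if rng.random() < p else "0" for _ in range(n))
        total += kth_subword_complexity(s, k)
    return total / trials


if __name__ == "__main__":
    # "0110" contains the length-2 substrings "01", "11", "10".
    print(kth_subword_complexity("0110", 2))
    print(estimate_mean_complexity(n=100, k=5, p=0.5))
```

For a binary string, the kth subword complexity can never exceed min(2^k, n - k + 1), which gives a quick sanity check on any estimate.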


Degree Type

Doctor of Philosophy



Campus location

West Lafayette

Advisor/Supervisor/Committee Chair

Dr. Mark Daniel Ward

Advisor/Supervisor/Committee co-chair

Dr. Steve Bell

Additional Committee Member 2

Dr. Wojciech Szpankowski

Additional Committee Member 3

Dr. Aaron Yip