Statistical Learning and Model Criticism for Networks and Point Processes
Thesis posted on 16.08.2019 by Jiasen Yang
Networks and point processes provide flexible tools for representing and modeling complex dependencies in data arising from various social and physical domains. Graphs, or networks, encode relational dependencies between entities, while point processes characterize temporal or spatial interactions among events.
In the first part of this dissertation, we consider dynamic network data (such as communication networks) in which links connecting pairs of nodes appear continuously over time. We propose latent space point process models to capture two different aspects of the data: (i) communication occurs at a higher rate between individuals with similar latent attributes (i.e., homophily); and (ii) individuals tend to reciprocate communications from others, but in a varied manner. Our framework marries ideas from point process models, including Poisson and Hawkes processes, with ideas from latent space models of static networks. We evaluate our models on several real-world datasets and show that a dual latent space model, which accounts for heterogeneity in both homophily and reciprocity, significantly improves performance in various link prediction and network embedding tasks.
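As a rough illustration of how homophily and reciprocity could enter one conditional intensity, the sketch below combines a baseline rate that decays with latent distance (a Poisson-style term) with an exponentially decaying boost after each earlier message in the reverse direction (a Hawkes-style term). The functional form, the function name `pair_intensity`, and the parameters `base`, `jump`, and `decay` are assumptions made for this example; the abstract does not state the exact parameterization used in the dissertation.

```python
import math

def pair_intensity(t, z_u, z_v, reverse_times, base=0.0, jump=0.5, decay=1.0):
    """Illustrative intensity of u -> v events at time t (hypothetical form).

    z_u, z_v      : latent positions of the two nodes (sequences of floats)
    reverse_times : times of earlier v -> u events
    base, jump, decay : assumed parameters, not taken from the thesis
    """
    # Homophily: the baseline rate shrinks as the latent positions move apart.
    sq_dist = sum((a - b) ** 2 for a, b in zip(z_u, z_v))
    baseline = math.exp(base - sq_dist)
    # Reciprocity: each earlier v -> u event adds a bump that decays over time.
    excitation = sum(
        jump * math.exp(-decay * (t - s)) for s in reverse_times if s < t
    )
    return baseline + excitation
```

Under this form, letting `jump` and `decay` vary by node or by pair would be one way to express reciprocity that differs across individuals, which is the kind of heterogeneity the dual latent space model is described as capturing.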
In the second part of this dissertation, we develop nonparametric goodness-of-fit tests for discrete distributions and point processes that contain intractable normalization constants, providing the first generally applicable and computationally feasible approaches under those circumstances. Specifically, we propose and characterize Stein operators for discrete distributions, and construct a general Stein operator for point processes using the Papangelou conditional intensity function. Based on the proposed Stein operators, we establish kernelized Stein discrepancy measures for discrete distributions and point processes, which enable us to develop nonparametric goodness-of-fit tests for un-normalized density/intensity functions. We apply the kernelized Stein discrepancy tests to discrete distributions (including network models) as well as temporal and spatial point processes. Our experiments demonstrate that the proposed tests typically outperform two-sample tests based on the maximum mean discrepancy, which, unlike our goodness-of-fit tests, assume the availability of exact samples from the null model.
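To make the idea of a Stein discrepancy for an un-normalized discrete model concrete, here is a minimal sketch on a toy space of three binary variables. It assumes a difference-based score built from probability ratios under single-coordinate flips, so the normalization constant cancels, together with an exponentiated Hamming kernel. The toy model `p_tilde`, the kernel choice, and all function names are assumptions for illustration and may differ from the constructions in the dissertation. The discrepancy is computed by exact enumeration rather than estimated from samples, which is feasible only because the state space is tiny.

```python
import itertools
import math

D = 3  # number of binary variables; small enough to enumerate every state
STATES = list(itertools.product((0, 1), repeat=D))

def flip(x, i):
    """Return a copy of x with coordinate i flipped."""
    return x[:i] + (1 - x[i],) + x[i + 1:]

def p_tilde(x, theta=1.0):
    """Hypothetical un-normalized model: pairwise interactions along a chain."""
    return math.exp(theta * sum(x[i] * x[i + 1] for i in range(D - 1)))

def score(x, i):
    """Difference score 1 - p(flip_i x) / p(x); the normalizer cancels."""
    return 1.0 - p_tilde(flip(x, i)) / p_tilde(x)

def kernel(x, y):
    """Exponentiated Hamming kernel (an assumed choice of base kernel)."""
    return math.exp(-sum(a != b for a, b in zip(x, y)) / D)

def stein_kernel(x, y):
    """Base kernel with the difference Stein operator applied to both arguments."""
    total = 0.0
    for i in range(D):
        fx, fy = flip(x, i), flip(y, i)
        sx, sy = score(x, i), score(y, i)
        k = kernel(x, y)
        total += (
            sx * sy * k
            - sx * (k - kernel(x, fy))
            - sy * (k - kernel(fx, y))
            + (k - kernel(fx, y) - kernel(x, fy) + kernel(fx, fy))
        )
    return total

def ksd(q):
    """Exact E_{x,y ~ q}[stein_kernel(x, y)] for q given as a dict state -> prob."""
    return sum(q[x] * q[y] * stein_kernel(x, y) for x in STATES for y in STATES)

# The normalizer is computed here only to build the reference case q = p.
Z = sum(p_tilde(x) for x in STATES)
model = {x: p_tilde(x) / Z for x in STATES}
uniform = {x: 1.0 / len(STATES) for x in STATES}
```

Evaluating `ksd(model)` gives zero up to floating-point error, while `ksd(uniform)` is strictly positive. A goodness-of-fit test would build a sample-based estimate of the same quantity and compare it against a null distribution.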