Exploring Node Attributes for Data Mining in Attributed Graphs
Thesis posted on 10.06.2019 by Jihwan Lee
Graphs have attracted researchers in many fields because many kinds of real-world entities, and the relationships between them, can be represented and analyzed effectively and efficiently as graphs. In particular, researchers in data mining and machine learning have developed algorithms and models to better understand complex graph data and to perform a variety of data mining tasks. Although a large body of work exists on graph mining, most of it does not fully exploit the attributes attached to graph nodes or edges.
In this dissertation, we exploit node attributes to obtain better solutions to several graph data mining problems studied in the literature. First, we introduce the notion of statistically significant attribute associations in attributed graphs and propose an effective and efficient algorithm to discover them. An analysis of the results shows that the proposed algorithm reveals insightful attribute associations that earlier methods, which rely solely on frequency, cannot identify. Second, we build a probabilistic generative model for observed attributed graphs. Assuming that hidden communities underlie the nodes of a graph, we adopt the idea of latent topic distributions to model the generation of both node attribute values and link structure more precisely. This model can be used to detect hidden communities and to profile missing attribute values. Lastly, we investigate how to use node attributes to learn latent representations of nodes in lower-dimensional embedding spaces, and we use the learned representations to improve the performance of data mining tasks over attributed graphs.
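The contrast between frequency and statistical significance can be illustrated with a small sketch. The abstract does not specify the dissertation's association definition, significance measure, or null model, so everything here is an assumption: an "association" is taken to be a pair of attributes (a, b) appearing at the two ends of an edge, and significance is estimated with a simple permutation test that shuffles attribute sets among nodes while keeping the links fixed. This is a swapped-in textbook technique, not the author's algorithm, and the function names and the toy graph are hypothetical.

```python
import random


def cross_edge_count(edges, attrs, a, b):
    """Count undirected edges whose endpoints carry attribute a on one side
    and attribute b on the other (either orientation)."""
    count = 0
    for u, v in edges:
        if (a in attrs[u] and b in attrs[v]) or (b in attrs[u] and a in attrs[v]):
            count += 1
    return count


def permutation_p_value(edges, attrs, a, b, trials=1000, seed=0):
    """One-sided permutation test (assumed null model): how often does randomly
    reassigning the nodes' attribute sets, with the graph structure fixed,
    give at least as many a-b edges as observed? Returns (observed, p_value)."""
    rng = random.Random(seed)
    observed = cross_edge_count(edges, attrs, a, b)
    nodes = list(attrs)
    attr_sets = [attrs[n] for n in nodes]
    hits = 0
    for _ in range(trials):
        rng.shuffle(attr_sets)  # random relabeling; links are untouched
        shuffled = dict(zip(nodes, attr_sets))
        if cross_edge_count(edges, shuffled, a, b) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (trials + 1)


def toy_graph():
    """Hypothetical 10-node example: nodes 0-4 tagged 'ml', nodes 5-9 tagged
    'db', and every node also tagged 'python'. The edges form a single
    10-cycle that alternates between the two groups."""
    attrs = {n: {"ml", "python"} for n in range(5)}
    attrs.update({n: {"db", "python"} for n in range(5, 10)})
    edges = [(i, i + 5) for i in range(5)] + [(i, (i + 1) % 5 + 5) for i in range(5)]
    return edges, attrs


if __name__ == "__main__":
    edges, attrs = toy_graph()
    for a, b in [("python", "python"), ("ml", "db")]:
        count, p = permutation_p_value(edges, attrs, a, b)
        print(f"{a}-{b}: {count} edges, p ~ {p:.3f}")
```

Both pairs occur on all 10 edges, so a frequency threshold alone cannot tell them apart. The permutation test gives the python-python pair p = 1.0, because every shuffle reproduces it, while the ml-db pair gets a small p-value close to 2/252 ≈ 0.008: only 2 of the C(10, 5) = 252 ways to place the "ml" tags make every edge an ml-db edge.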