Optimization Frameworks for Graph Clustering
2019-05-15T16:31:31Z (GMT) by
In graph theory and network analysis, communities or clusters are sets of nodes in a graph that share many internal connections with each other, but are only sparsely connected to nodes outside the set. Graph clustering, the computational task of detecting these communities, has been studied extensively due to its widespread applications and its theoretical richness as a mathematical problem. This thesis presents novel optimization tools for addressing two major challenges associated with graph clustering.
The first major challenge is that a plethora of algorithms and objective functions for graph clustering already exists. The relationship between different methods is often unclear, and in practice it can be very difficult to determine which approach is best for a specific application. To address this challenge, we introduce a generalized discrete optimization framework for graph clustering called LambdaCC, which relies on a single tunable parameter. The value of this parameter controls the balance between the internal density and external sparsity of clusters that are formed by optimizing an underlying objective function. LambdaCC unifies the landscape of graph clustering techniques: a large number of previously developed approaches can be recovered as special cases by fixing the LambdaCC input parameter to a specific value.
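To make the role of the single parameter concrete, the sketch below scores a few clusterings of a toy graph under a LambdaCC-style objective. It assumes the unit-node-weight formulation (every edge split between clusters costs 1 - λ, and every non-adjacent pair placed in the same cluster costs λ). The helper name `lambdacc_cost` and the example graph are hypothetical illustrations, not code from the thesis.

```python
from itertools import combinations


def lambdacc_cost(nodes, edges, clusters, lam):
    """Hypothetical helper: LambdaCC cost of a clustering, assuming the
    unit-node-weight formulation. Each edge split between clusters costs
    (1 - lam); each non-adjacent pair in the same cluster costs lam."""
    edge_set = {frozenset(e) for e in edges}
    label = {v: i for i, cluster in enumerate(clusters) for v in cluster}
    cost = 0.0
    for u, v in combinations(nodes, 2):
        together = label[u] == label[v]
        if frozenset((u, v)) in edge_set:
            if not together:
                cost += 1 - lam  # external edge: penalized by 1 - lam
        elif together:
            cost += lam  # missing internal edge: penalized by lam
    return cost


# Toy graph: triangles {0,1,2} and {3,4,5} joined by the single edge (2, 3).
nodes = range(6)
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
candidates = {
    "two triangles": [[0, 1, 2], [3, 4, 5]],
    "one cluster": [list(nodes)],
    "singletons": [[v] for v in nodes],
}

for lam in (0.5, 0.1):
    costs = {name: round(lambdacc_cost(nodes, edges, c, lam), 3)
             for name, c in candidates.items()}
    print(lam, costs)
# 0.5 {'two triangles': 0.5, 'one cluster': 4.0, 'singletons': 3.5}
# 0.1 {'two triangles': 0.9, 'one cluster': 0.8, 'singletons': 6.3}
```

Under these assumptions, a moderate λ favors the two dense triangles, while a small λ makes missing internal edges cheap enough that merging everything into one cluster wins. Sweeping λ thus trades internal density against external sparsity.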
The second major challenge of graph clustering is the computational intractability of finding the best way to cluster a graph with respect to a given NP-hard objective function. To address this intractability, we present new optimization tools and results that apply to LambdaCC as well as to a broader class of graph clustering problems. In particular, we develop polynomial-time approximation algorithms for LambdaCC and other, more general clustering objectives. For example, we show how to obtain a polynomial-time 2-approximation for cluster deletion, which improves upon the previous best approximation factor of 3. We also present a new optimization framework for solving convex relaxations of NP-hard graph clustering problems, which are frequently used in the design of approximation algorithms. Finally, we develop a new framework for efficiently setting tunable parameters for graph clustering objective functions, so that practitioners can work with graph clustering techniques that are especially well suited to their application.
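To illustrate why exact optimization does not scale, the sketch below solves cluster deletion (remove as few edges as possible so that every cluster is a clique) by brute force on a toy graph. It enumerates every set partition, whose count grows as the Bell numbers (203 for just 6 nodes), so it is only usable on tiny inputs. This is a hypothetical exhaustive baseline written for illustration; it is not the 2-approximation algorithm described above, and the function names are assumptions.

```python
from itertools import combinations


def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ...or into a new block of its own.
        yield [[first]] + partition


def cluster_deletion_cost(edges, partition):
    """Number of edges cut by `partition`, or None if a block is not a clique."""
    edge_set = {frozenset(e) for e in edges}
    for block in partition:
        for u, v in combinations(block, 2):
            if frozenset((u, v)) not in edge_set:
                return None  # infeasible: block is missing an internal edge
    label = {v: i for i, block in enumerate(partition) for v in block}
    return sum(1 for u, v in edges if label[u] != label[v])


def exact_cluster_deletion(nodes, edges):
    """Exhaustive search over all partitions (exponential time)."""
    best = None
    for partition in set_partitions(list(nodes)):
        cost = cluster_deletion_cost(edges, partition)
        if cost is not None and (best is None or cost < best[0]):
            best = (cost, partition)
    return best  # never None: all-singletons is always feasible


# Toy graph: triangles {0,1,2} and {3,4,5} joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
best_cost, best_clusters = exact_cluster_deletion(range(6), edges)
print(best_cost)                                   # 1
print(sorted(sorted(b) for b in best_clusters))    # [[0, 1, 2], [3, 4, 5]]
```

Deleting the bridge (2, 3) leaves two cliques, so the optimum is one deleted edge. Approximation algorithms replace this exponential search with polynomial-time procedures that trade exactness for a provable bound on the cost.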