## SCALABLE REPRESENTATION LEARNING WITH INVARIANCES

#### thesis

In many complex domains, the input data are not well suited to the vector representations typically used in deep learning models. For example, in knowledge representation, relational learning, and some computer vision tasks, the data are better represented as graphs or sets. In these cases, a key challenge is to learn a representation function that is invariant to permutations of set elements or to graph isomorphisms.
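To make the notion of permutation invariance concrete, the following is a minimal sketch of a sum-pooling set function, contrasted with an order-sensitive one. It is not the architecture proposed in this thesis; the names `phi`, `rho`, `sum_pool_set_function`, `order_sensitive_function`, and `is_permutation_invariant` are hypothetical and chosen only for illustration.

```python
import itertools
import math


def phi(x):
    """Per-element feature map (hypothetical choice for illustration)."""
    return (x, x * x)


def rho(pooled):
    """Readout applied to the pooled features (hypothetical choice)."""
    total, total_sq = pooled
    return math.tanh(0.1 * total) + 0.01 * total_sq


def sum_pool_set_function(elements):
    """Compute rho(sum_i phi(x_i)); the sum discards element order."""
    feats = [phi(x) for x in elements]
    pooled = tuple(sum(col) for col in zip(*feats))
    return rho(pooled)


def order_sensitive_function(elements):
    """Position-weighted sum: depends on the order, so it is not invariant."""
    return sum((i + 1) * x for i, x in enumerate(elements))


def is_permutation_invariant(f, elements, tol=1e-9):
    """Brute-force check over all orderings of a small, non-empty input."""
    base = f(elements)
    return all(
        abs(f(list(p)) - base) <= tol
        for p in itertools.permutations(elements)
    )
```

Calling `is_permutation_invariant(sum_pool_set_function, [3, 1, 2, 5])` succeeds, while the same check on `order_sensitive_function` fails, because reordering the input changes the position weights.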

To handle graph isomorphism, this thesis proposes a subgraph pattern neural network that is invariant to graph isomorphisms and to varying local neighborhood sizes. The key insight is to incorporate the unavoidable dependencies among the training observations of induced subgraphs into both the input features and the model architecture itself via high-order dependencies, while still taking node and edge labels into account and facilitating inductive reasoning.
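As a small illustration of what invariance to graph isomorphism means for features built from induced subgraphs, the sketch below counts induced 3-node subgraphs by their number of edges and checks that the counts do not change when the nodes are relabeled. This is a hand-crafted feature, not the subgraph pattern neural network itself; `induced_triple_counts` and `relabel` are hypothetical helper names, and the graph is assumed to be undirected and unlabeled.

```python
import itertools


def induced_triple_counts(num_nodes, edges):
    """Count induced 3-node subgraphs with 0, 1, 2, or 3 edges."""
    edge_set = {frozenset(e) for e in edges}
    counts = [0, 0, 0, 0]
    for triple in itertools.combinations(range(num_nodes), 3):
        k = sum(
            frozenset(pair) in edge_set
            for pair in itertools.combinations(triple, 2)
        )
        counts[k] += 1
    return tuple(counts)


def relabel(edges, perm):
    """Apply a node relabeling; perm[u] is the new label of node u."""
    return [(perm[u], perm[v]) for u, v in edges]


# Path graph on 4 nodes: 0 - 1 - 2 - 3
path_edges = [(0, 1), (1, 2), (2, 3)]
path_counts = induced_triple_counts(4, path_edges)

# Every relabeling of the nodes yields an isomorphic graph,
# so the pattern counts should be identical.
invariant_under_relabeling = all(
    induced_triple_counts(4, relabel(path_edges, perm)) == path_counts
    for perm in itertools.permutations(range(4))
)
```

Pattern counts of this kind are invariant by construction, but they are not a complete isomorphism test: two non-isomorphic graphs can share the same counts.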

To learn permutation-invariant set functions, this thesis shows how the characteristics of an architecture’s computation graph affect its ability to learn in contexts with complex set dependencies, and demonstrates the limitations of current methods with respect to one or more of these complexity dimensions. I also propose a new Self-Attention GRU architecture whose computation graph is built automatically via self-attention to minimize the average interaction path length between set elements, so that complex dependencies between set elements can be captured effectively.

Beyond the standard set problem, this thesis introduces the new problem of representing sets-of-sets (SoS), in which multi-level dependence and multi-level permutation invariance must be handled jointly. To address this, I propose a hierarchical sequence attention framework (HATS) for inductive set-of-sets embeddings, and develop the stochastic optimization and inference methods required for efficient learning.
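The two levels of invariance in the sets-of-sets problem can be illustrated with a minimal nested-pooling sketch. This is not HATS; it is a deliberately simple stand-in, and `embed_set`, `embed_set_of_sets`, and `is_two_level_invariant` are hypothetical names introduced here for illustration.

```python
import itertools


def embed_set(elements):
    """Inner-level embedding: sums are unaffected by element order."""
    return (sum(elements), sum(x * x for x in elements))


def embed_set_of_sets(sets):
    """Outer-level embedding: sum a function of each inner embedding."""
    inner = [embed_set(s) for s in sets]
    return (
        sum(a * b for a, b in inner),  # nonlinear term retains membership info
        sum(a + b for a, b in inner),
        len(inner),
    )


def is_two_level_invariant(sets):
    """Check every reordering of the sets, with each inner set reversed."""
    base = embed_set_of_sets(sets)
    for outer in itertools.permutations(sets):
        shuffled = [list(reversed(s)) for s in outer]
        if embed_set_of_sets(shuffled) != base:
            return False
    return True
```

In this sketch, `[[1, 2], [3]]` and `[[1], [2, 3]]` receive different embeddings even though they contain the same elements once flattened, which is why a sets-of-sets representation has to respect set membership and cannot simply pool over all elements.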