On Transfer Learning Techniques for Machine Learning
Recent progress in machine learning has been driven mainly by the availability of large amounts of annotated data for training complex models with deep architectures. Annotating this training data is burdensome and creates a major bottleneck in maintaining machine-learning databases. Moreover, the trained models fail to generalize to new categories or to new varieties of the same categories, because the data distribution of these new categories or varieties differs from the training data distribution. To tackle these problems, this thesis proposes a family of transfer-learning techniques that handle different training (source) and testing (target) distributions under the assumption that annotated data is limited in the testing domain. Useful knowledge is transferred from an auxiliary, data-abundant source domain and applied to the data-scarce target domain. This transferable knowledge serves as a prior that biases target-domain predictions and prevents the target-domain model from overfitting. Specifically, we explore structural priors that encode relational knowledge between different data entities and provide a more informative bias than traditional priors. The choice of structural prior depends on the information available and on the similarity between the two domains. Based on these two factors, we divide the transfer-learning problem into four major categories and propose a different structural prior for each sub-problem.
This thesis first focuses on the unsupervised domain-adaptation problem, where we propose to minimize domain discrepancy by transforming labeled source-domain data to be close to unlabeled target-domain data. In this problem the categories remain the same across the two domains, so we assume that the structural relationship between source-domain samples carries over to the target domain. A graph or hyper-graph is therefore constructed from both domains as the structural prior, and a graph/hyper-graph matching formulation is used to transform samples in the source domain to be closer to samples in the target domain. An efficient optimization scheme is then proposed to tackle the time and memory inefficiencies associated with the matching problem. The few-shot learning problem is studied next, where we propose to transfer knowledge from source-domain categories with abundant labeled data to novel categories in the target domain that have only a few labeled samples. The knowledge transfer biases the novel-category predictions and prevents the model from overfitting. The knowledge is encoded using a neural-network-based prior that transforms a data sample to its corresponding class prototype. This neural network is trained on the source-domain data and applied to the target-domain data, where it transforms the few-shot samples to the novel-class prototypes for better recognition performance. The few-shot learning problem is then extended to the setting where we have access only to the source-domain class prototypes and not to the source-domain data. In this limited-information setting, parametric neural-network-based priors would overfit to the source-class prototypes, so we seek a non-parametric prior using manifolds. A piecewise linear manifold is used as a structural prior to fit the source-domain class prototypes.
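As a rough illustration of the manifold prior, the sketch below treats the source-class prototypes as points on a piecewise linear manifold. It approximates the manifold near a query point by the affine patch through the k nearest prototypes and projects the mean of the few-shot samples onto that patch. The local-patch construction, the function names and the parameter k are assumptions made for illustration; the abstract does not specify how the manifold is actually built.

```python
import numpy as np

def project_onto_local_patch(x, prototypes, k=3):
    """Project x onto the affine patch through its k nearest prototypes.

    Hypothetical helper: one simple way to realise a piecewise linear
    manifold, not necessarily the construction used in the thesis.
    """
    # Pick the k source-class prototypes closest to the query point.
    dists = np.linalg.norm(prototypes - x, axis=1)
    nearest = prototypes[np.argsort(dists)[:k]]
    # Describe the patch as origin + span(basis).
    origin = nearest[0]
    basis = (nearest[1:] - origin).T  # shape (d, k - 1)
    # Least-squares coefficients give the orthogonal projection.
    coeffs, *_ = np.linalg.lstsq(basis, x - origin, rcond=None)
    return origin + basis @ coeffs

def novel_class_prototype(few_shot_samples, prototypes, k=3):
    """Estimate a novel-class prototype from a handful of samples."""
    mean = few_shot_samples.mean(axis=0)
    return project_onto_local_patch(mean, prototypes, k)
```

With a single few-shot sample the mean is the sample itself, so the estimate is simply that sample pulled onto the nearest linear piece spanned by known prototypes.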
The manifold structure is then extended to the target domain, where the novel-class prototypes are found by projecting the few-shot samples onto the manifold. Finally, the zero-shot learning problem is addressed, which is an extreme case of the few-shot learning problem in which we have no labeled data in the target domain. However, we have high-level information for both the source-domain and target-domain categories in the form of semantic descriptors. We learn the relation between the sample space and the semantic space using a regularized neural network, so that classification of the novel categories can be carried out in a common representation space. This same neural network is then used in the target domain to relate the two spaces. If we want to generate data for the novel categories in the target domain, we can use a constrained generative adversarial network instead of a traditional neural network. Thus, we use structural priors such as graphs, neural networks and manifolds to relate data entities such as samples, prototypes and semantics across these transfer-learning sub-problems. We also explore additional post-processing steps such as pseudo-labeling, domain adaptation and calibration, and enforce algorithmic and architectural constraints to further improve recognition performance. Experiments on standard transfer-learning image-recognition datasets produced results competitive with previous work. Further experimentation and analysis of these methods also provided a better understanding of machine learning.
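The zero-shot idea of relating the sample space to the semantic space can be sketched in a few lines. The example below is a simplified stand-in, not the thesis's model: it replaces the regularized neural network with a ridge-regularized linear map, fits that map on source-domain features and their class descriptors, and labels a target-domain sample with the novel class whose semantic descriptor is most similar to the mapped sample. All function names, the regularization weight lam and the cosine-similarity rule are assumptions made for illustration.

```python
import numpy as np

def fit_ridge_map(X, S, lam=1.0):
    """Fit W minimizing ||X W - S||^2 + lam * ||W||^2.

    X: (n, d) source-domain features.
    S: (n, m) semantic descriptor of each sample's class.
    A linear stand-in for the regularized network (assumption).
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ S)

def zero_shot_predict(X_new, W, target_descriptors):
    """Assign each sample to the novel class with the closest descriptor."""
    projected = X_new @ W  # map samples into the semantic space
    # Compare by cosine similarity in the shared semantic space.
    p = projected / np.linalg.norm(projected, axis=1, keepdims=True)
    t = target_descriptors / np.linalg.norm(
        target_descriptors, axis=1, keepdims=True
    )
    return np.argmax(p @ t.T, axis=1)
```

No labeled target-domain sample is used anywhere: the novel classes enter only through their semantic descriptors, which is what distinguishes this setting from the few-shot case.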