IMBALANCED HIGH DIMENSIONAL CLASSIFICATION AND APPLICATIONS IN PRECISION MEDICINE
Thesis posted on 14.05.2019 by Hui Sun
Classification is an important supervised learning technique with numerous applications. This dissertation addresses two research problems in this area. The first concerns multicategory classification methods for high-dimensional data. To handle high-dimension, low-sample-size (HDLSS) data with uneven group sizes (i.e., imbalanced data), we develop a new classification method called the angle-based multicategory distance-weighted support vector machine (MDWSVM). It is motivated by its binary counterpart and combines the merits of the support vector machine (SVM) and distance-weighted discrimination (DWD) while alleviating both the data-piling issue of SVM and the sensitivity of DWD to imbalanced data. Theoretical results and numerical studies demonstrate the advantages of MDWSVM over existing methods.
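As a rough illustration of the angle-based idea, the sketch below builds the regular-simplex class coding that is commonly used in the angle-based multicategory literature and predicts the class whose vertex makes the smallest angle with the fitted function value. It is a minimal sketch under stated assumptions, not the MDWSVM implementation: the function names `simplex_vertices` and `predict_class` are hypothetical, and the simplex construction is assumed here because the abstract does not specify the coding used.

```python
import math


def simplex_vertices(k):
    """Return k unit vectors in R^(k-1) that form a regular simplex
    centered at the origin (assumed class coding, one vertex per class)."""
    if k < 2:
        raise ValueError("need at least two classes")
    d = k - 1
    # First vertex: all coordinates equal, scaled to unit length.
    vertices = [[1.0 / math.sqrt(d)] * d]
    shift = -(1.0 + math.sqrt(k)) / d ** 1.5
    scale = math.sqrt(k / d)
    # Remaining vertices: a common shift plus a scaled coordinate axis.
    for j in range(1, k):
        v = [shift] * d
        v[j - 1] += scale
        vertices.append(v)
    return vertices


def predict_class(f_x, vertices):
    """Predict the class whose vertex has the largest inner product with
    f(x); for unit-length vertices this is the smallest angle."""
    scores = [sum(a * b for a, b in zip(f_x, w)) for w in vertices]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    W = simplex_vertices(3)
    # A fitted value lying close to the second vertex maps to class index 1.
    print(predict_class([0.3, -0.9], W))
```

With k = 2 the construction reduces to the usual binary coding of +1 and -1, which is one reason this coding is a natural way to carry a binary margin-based method over to the multicategory case.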
The second part of the dissertation applies classification methods to precision medicine problems. Because one-stage precision medicine problems can be reformulated as weighted classification problems, subtle differences between classification methods can lead to different performance in this setting. Among margin-based classification methods, we propose the distance-weighted discrimination outcome weighted learning (DWD-OWL) method. We also extend the model to handle negative rewards for greater generality and apply the angle-based idea to handle multiple treatments. Proofs of Fisher consistency for DWD-OWL are provided in both the binary and multicategory cases. Under mild conditions, DWD-OWL is also shown to be insensitive to imbalanced settings.
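The reformulation as weighted classification can be sketched in a few lines for the binary-treatment case. In outcome weighted learning, each subject's observed treatment acts as the class label and the reward divided by the treatment propensity acts as the observation weight, so that a rule with small weighted misclassification error has a large estimated value. The sketch below is an illustration under stated assumptions rather than the DWD-OWL method itself: the function names are hypothetical, treatments are assumed to be coded as -1 and +1, and negative rewards are handled by flipping the label and using the absolute reward, a common device that changes the weighted 0-1 objective only by a constant and may differ from the extension developed in the dissertation.

```python
import math


def owl_weighted_sample(treatments, rewards, propensities):
    """Convert (treatment, reward, propensity) triples into (label, weight)
    pairs for a weighted binary classifier. Treatments are coded -1 / +1."""
    labels, weights = [], []
    for a, r, p in zip(treatments, rewards, propensities):
        if r < 0:
            # Assumption: a negative reward is evidence for the opposite
            # treatment, so flip the label and keep a nonnegative weight.
            a, r = -a, -r
        labels.append(a)
        weights.append(r / p)
    return labels, weights


def weighted_error(decisions, labels, weights):
    """Average weighted 0-1 loss of a treatment rule's decisions."""
    loss = sum(w for d, y, w in zip(decisions, labels, weights) if d != y)
    return loss / len(labels)


if __name__ == "__main__":
    labels, weights = owl_weighted_sample(
        treatments=[1, -1, 1],
        rewards=[2.0, -1.0, 0.5],
        propensities=[0.5, 0.5, 0.5],
    )
    print(labels, weights)
    print(weighted_error([1, 1, -1], labels, weights))
```

A margin-based method then replaces the 0-1 loss in `weighted_error` with a convex surrogate (hinge loss for SVM-based OWL, the DWD loss for DWD-OWL), which is where the differences between classification methods enter.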