10.25394/PGS.7499378.v1 Min Ren Model-Based High-Dimensional Network Inference: Theory & Methods Purdue University Graduate School 2019 differential network gene regulatory network structural equation model high-dimensional data Statistics 2019-01-03 18:57:43 Thesis https://hammer.purdue.edu/articles/thesis/Model-Based_High-Dimensional_Network_Inference_Theory_Methods/7499378 <div>In the past several decades, the advent of high-throughput biotechnologies for genomics studies has provided appealing opportunities to understand the complex gene interactions within biological systems, and has attracted much research on constructing gene regulatory networks (GRNs). Motivated by the promise of genetical genomics studies, our research group has recently focused on representing gene regulatory networks with structural equation models and further revealing system-wide gene regulation. This dissertation presents two recent works in this direction.</div><div><br></div><div><div>Firstly, we conducted a thorough theoretical analysis of the recently proposed Two-Stage Penalized Least Squares (2SPLS) method for constructing large systems of structural equation models. We establish estimation and prediction error bounds for both stages of 2SPLS, as well as its variable selection consistency. Specifically, a bounded eigenvalue assumption is imposed to ensure the consistency of the ℓ<sub>2</sub>-penalized regressions at the first stage. At the second stage, the estimation and variable selection consistency of the ℓ<sub>1</sub>-penalized regressions are obtained under a restricted eigenvalue condition and a variant of the irrepresentable condition, both of which are commonly employed in the current literature. 
We show that the 2SPLS estimator works not only for fixed dimensions but also for diverging dimensions that grow to infinity with the sample size at an appropriate rate.</div></div><div><br></div><div><div>Secondly, we developed a novel statistical method to identify structural differences between two cognate networks characterized by structural equation models. We propose to reparameterize the model so as to separate the differential structures from the common structures, and then design an algorithm with calibration and construction stages that identifies these differential structures directly. The calibration stage obtains consistent predictions by fitting an ℓ<sub>2</sub>-regularized regression of each endogenous variable on pre-screened exogenous variables, which corrects for potential endogeneity. The construction stage consistently selects and estimates both common and differential effects by fitting an ℓ<sub>1</sub>-regularized regression of each endogenous variable on the predicted values of the other endogenous variables together with its anchoring exogenous variables. Because these regressions are fitted separately for each endogenous variable, the method allows for easy parallel computation. We establish non-asymptotic error bounds for the predictions and estimates at both stages. Simulation studies demonstrate that the proposed method performs much better than constructing the two networks independently. A real data set is analyzed to illustrate the applicability of the method.</div></div>
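<div>The two stages of 2SPLS can be illustrated with a minimal sketch. It assumes the system Y = YB + XF + E, in which each endogenous variable k has a known set of anchoring exogenous variables, <code>anchors[k]</code>. Stage 1 fits a ridge (ℓ<sub>2</sub>-penalized) regression of each endogenous variable on all exogenous variables. Stage 2 fits a lasso (ℓ<sub>1</sub>-penalized) regression of each endogenous variable on the stage-1 predictions of the other endogenous variables. The anchors are projected out first, which leaves their effects unpenalized. Several choices here are illustrative assumptions, not the thesis's implementation: the function names, the tuning values <code>tau</code> and <code>lam</code>, the use of a plain rather than weighted lasso, and the coordinate-descent solver.</div>

```python
import numpy as np


def ridge_predict(X, y, tau):
    """Stage 1: l2-penalized (ridge) regression of y on X; returns fitted values."""
    q = X.shape[1]
    coef = np.linalg.solve(X.T @ X + tau * np.eye(q), X.T @ y)
    return X @ coef


def residualize(A, Xa):
    """Remove from A its least-squares projection onto the columns of Xa."""
    coef, *_ = np.linalg.lstsq(Xa, A, rcond=None)
    return A - Xa @ coef


def lasso_cd(Z, y, lam, n_iter=1000, tol=1e-10):
    """l1-penalized regression by cyclic coordinate descent.

    Minimizes (1 / (2n)) * ||y - Z b||^2 + lam * ||b||_1.
    """
    n, m = Z.shape
    b = np.zeros(m)
    col_sq = (Z ** 2).sum(axis=0) / n
    r = np.array(y, dtype=float)  # current residual y - Z b
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(m):
            if col_sq[j] == 0.0:
                continue
            rho = Z[:, j] @ r / n + col_sq[j] * b[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
            step = new - b[j]
            if step != 0.0:
                r -= step * Z[:, j]
                b[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return b


def two_spls(Y, X, anchors, tau=1.0, lam=0.05):
    """Hypothetical 2SPLS sketch for Y = Y B + X F + E.

    anchors[k] lists the columns of X assumed to affect Y[:, k] directly.
    Returns B_hat with B_hat[j, k] = estimated effect of Y_j on Y_k.
    """
    n, p = Y.shape
    # Stage 1: ridge-predict every endogenous variable from all exogenous ones.
    Y_hat = np.column_stack([ridge_predict(X, Y[:, k], tau) for k in range(p)])
    B_hat = np.zeros((p, p))
    for k in range(p):
        others = [j for j in range(p) if j != k]
        Xa = X[:, anchors[k]]
        # Stage 2: lasso of Y_k on the predictions of the other endogenous
        # variables; projecting out the anchors leaves their effects unpenalized.
        Z = residualize(Y_hat[:, others], Xa)
        y = residualize(Y[:, k], Xa)
        B_hat[others, k] = lasso_cd(Z, y, lam)
    return B_hat


# Usage on simulated data: Y_0 -> Y_1 with effect 0.8; X_k anchors Y_k.
rng = np.random.default_rng(0)
n, p = 2000, 3
X = rng.standard_normal((n, p))
E = rng.standard_normal((n, p))
B_true = np.zeros((p, p))
B_true[0, 1] = 0.8
Y = (X + E) @ np.linalg.inv(np.eye(p) - B_true)  # solves Y = Y B + X + E
B_hat = two_spls(Y, X, anchors=[[0], [1], [2]])
print(np.round(B_hat, 2))
```

<div>In this sketch, each pass of the stage-2 loop depends only on the shared stage-1 predictions, so the loop over endogenous variables could be distributed across workers.</div>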
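<div>For a single endogenous variable, the reparameterization and the construction stage can be illustrated with an ℓ<sub>1</sub>-penalized fit on a stacked design. This sketch rests on several assumptions:
<ul>
<li>It uses the parameterization β<sup>(1)</sup> = γ + δ and β<sup>(2)</sup> = γ - δ, where γ is the common part and δ the differential part. The thesis's exact reparameterization may differ.</li>
<li>The calibration-stage predictions <code>Z1</code> and <code>Z2</code> are taken as given and already adjusted for the anchoring exogenous variables.</li>
<li>Proximal gradient descent (ISTA) serves as the ℓ<sub>1</sub> solver.</li>
</ul>
Function names and the tuning value are hypothetical.</div>

```python
import numpy as np


def ista_lasso(D, y, lam, n_iter=2000):
    """l1-penalized least squares by proximal gradient descent (ISTA).

    Minimizes (1 / (2n)) * ||y - D theta||^2 + lam * ||theta||_1.
    """
    n = D.shape[0]
    L = np.linalg.norm(D, 2) ** 2 / n  # Lipschitz constant of the gradient
    theta = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ theta - y) / n
        z = theta - grad / L
        theta = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return theta


def fit_common_differential(Z1, y1, Z2, y2, lam):
    """Hypothetical construction-stage fit for one endogenous variable.

    Z1, Z2: predicted values of the other endogenous variables in each network
    (assumed already adjusted for the anchoring exogenous variables).
    Assumed parameterization: beta1 = gamma + delta, beta2 = gamma - delta.
    """
    D = np.vstack([np.hstack([Z1, Z1]), np.hstack([Z2, -Z2])])
    y = np.concatenate([y1, y2])
    theta = ista_lasso(D, y, lam)
    m = Z1.shape[1]
    gamma, delta = theta[:m], theta[m:]
    return gamma, delta


# Simulated example: only the second effect differs between the two networks.
rng = np.random.default_rng(1)
n1 = n2 = 1000
beta1 = np.array([0.8, 0.5, 0.0])
beta2 = np.array([0.8, 0.0, 0.0])
Z1 = rng.standard_normal((n1, 3))
Z2 = rng.standard_normal((n2, 3))
y1 = Z1 @ beta1 + 0.5 * rng.standard_normal(n1)
y2 = Z2 @ beta2 + 0.5 * rng.standard_normal(n2)
gamma, delta = fit_common_differential(Z1, y1, Z2, y2, lam=0.05)
print("common:", np.round(gamma, 2), "differential:", np.round(delta, 2))
print("network 1:", np.round(gamma + delta, 2), "network 2:", np.round(gamma - delta, 2))
```

<div>Because δ carries its own ℓ<sub>1</sub> penalty, an effect shared by both networks tends to get δ = 0. A nonzero entry of δ therefore flags a differential edge directly, rather than through a comparison of two separately estimated networks.</div>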