10.25394/PGS.11319932.v1
Evidence S Matangi
Handling Complexity via Statistical Methods
2019
Purdue University Graduate School
complexity
bundled interventions
compliance
subsampling
simulation
atmospheric data
time series models
2019-12-05 14:53:02
article
https://hammer.figshare.com/articles/thesis/Handling_Complexity_via_Statistical_Methods/11319932
<p>Phenomena arising from complex systems are
characteristically dynamic, multi-dimensional, and nonlinear. Their traits can be captured through data
generating mechanisms (<i>DGM</i>) that
explain the interactions among the systems’ components. Measurement is fundamental to advancing science,
and handling complexity requires departing from linear thinking. Simplifying the measurement of complex and
heterogeneous data in statistical methodology can compromise measurement accuracy. In particular, conventional statistical methods
make assumptions about the DGM that are rarely met in the real world, which can make inference
inaccurate. We posit that causal
inference for phenomena in complex systems requires, at a minimum, incorporating
subject-matter knowledge and using dynamic metrics in statistical methods to improve
its accuracy.</p>
<p>This thesis consists of two separate topics on handling complexity in data
and data generating mechanisms: the evaluation of bundled
nutrition interventions and the modeling of atmospheric data.</p>
<p>Firstly, when a public health problem requires multiple approaches
to address its contributing factors, bundling those approaches can be cost-effective. Scaling up bundled interventions geographically
requires a hierarchical implementation structure, with central coordination
and supervision of multiple sites and of the staff delivering the bundled intervention. The experimental design for evaluating such an
intervention becomes complex because it must accommodate the multiple intervention
components and the hierarchical implementation structure. The components of a bundled intervention may
affect targeted outcomes additively or synergistically. However, noncompliance
and protocol deviations can limit this potential impact and introduce data
complexities. We identify several statistical considerations and recommendations
for the implementation and evaluation of bundled interventions.</p>
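<p>To make "additively or synergistically" concrete, the minimal sketch below computes the interaction contrast for two bundle components, A and B, arranged in a hypothetical 2x2 factorial layout. The cell means are invented for illustration and are not ATONU results. The function name <code>interaction_contrast</code> is also hypothetical. A contrast of zero means the components combine additively, and a positive value indicates synergy on the chosen outcome scale.</p>

```python
def interaction_contrast(cell_means):
    """Interaction contrast for two bundle components A and B in a 2x2 layout.

    cell_means[a][b] is the mean outcome with component A at level a and
    component B at level b (0 = absent, 1 = present). Zero means the two
    components act additively; a positive value means synergy on this scale.
    """
    effect_a_without_b = cell_means[1][0] - cell_means[0][0]
    effect_a_with_b = cell_means[1][1] - cell_means[0][1]
    return effect_a_with_b - effect_a_without_b

# Hypothetical mean outcome scores, invented for illustration only
additive = [[4.0, 5.0], [5.5, 6.5]]
synergistic = [[4.0, 5.0], [5.5, 7.5]]
print(interaction_contrast(additive))     # 0.0
print(interaction_contrast(synergistic))  # 1.0
```

<p>Whether effects look additive depends on the scale. A bundle whose components are additive on the mean scale can still show an interaction on the log-odds scale, and this matters when the outcome is modeled with a logistic link.</p>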
<p>The simple aggregate metrics used in cluster randomized
controlled trials do not use all available information, and findings based on them are
prone to the ecological fallacy, in which inference at the aggregate
level may not hold at the disaggregated level.
Further, implementation heterogeneity reduces statistical power and
consequently the accuracy of inference from a conventional comparison with a control
arm. Intention-to-treat analysis can therefore be inadequate for bundled
interventions. We developed novel process-driven,
disaggregated participation metrics to examine the mechanisms of impact of the
Agriculture to Nutrition (ATONU) bundled intervention (ClinicalTrials.gov
Identifier: NCT03152227). Logistic and beta-logistic hierarchical models were
used to characterize these metrics, and generalized mixed models were employed
to identify determinants of the study outcome, dietary diversity among women of
reproductive age. Mediation analysis was
applied to explore the pathways through which the intervention affects
the outcome via the process metrics. The determinants of greater participation
should be targeted to improve the implementation of future bundled interventions.</p>
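<p>The mediation idea can be sketched with the linear product-of-coefficients approach. First, regress the process metric (the mediator) on the treatment arm to get path <i>a</i>. Next, regress the outcome on both to get the direct effect <i>c'</i> and path <i>b</i>. The product <i>a·b</i> is then the indirect effect. This is a simplified stand-in for illustration, not the hierarchical logistic, beta-logistic, or generalized mixed models used in the study. The data are simulated, and all variable and function names are hypothetical.</p>

```python
import numpy as np

def ols_coefs(y, *covariates):
    """Least-squares coefficients of y on an intercept plus the covariates."""
    X = np.column_stack([np.ones_like(y)] + list(covariates))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def product_of_coefficients(treat, mediator, outcome):
    """Linear mediation decomposition: total = direct + indirect (a * b)."""
    a = ols_coefs(mediator, treat)[1]                      # treatment -> mediator
    direct, b = ols_coefs(outcome, treat, mediator)[1:3]  # direct effect, mediator -> outcome
    total = ols_coefs(outcome, treat)[1]                   # treatment -> outcome, ignoring mediator
    return {"indirect": a * b, "direct": direct, "total": total}

# Simulated, hypothetical data: arm indicator, participation metric, outcome score
rng = np.random.default_rng(0)
n = 5000
treat = rng.integers(0, 2, size=n).astype(float)
participation = 0.3 + 0.4 * treat + rng.normal(0.0, 0.3, size=n)
outcome = 3.0 + 0.5 * treat + 2.0 * participation + rng.normal(0.0, 1.0, size=n)

effects = product_of_coefficients(treat, participation, outcome)
print({k: round(float(v), 3) for k, v in effects.items()})
```

<p>With ordinary least squares on the same sample, total = direct + indirect holds exactly, which makes a useful sanity check. With nonlinear links such as the logistic, this decomposition no longer holds in general, so dedicated mediation methods are needed.</p>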
<p>Secondly, observed atmospheric records are often
prohibitively short, and typically only one record is available for study. Classical
nonlinear time series models applied to explain the nonlinear DGM reproduce some
statistical properties of the phenomena being investigated but are unrelated to
their physical properties. The data’s complex dependence structure
invalidates inference from classical time series models, which involve strong
statistical assumptions rarely met in real atmospheric and climate data. The subsampling method may yield valid statistical
inference. Atmospheric records, however, are typically too short to satisfy the asymptotic conditions for the method’s
validity. This makes it necessary to enhance subsampling with
approximating models (models that share statistical properties with the series
under study).</p>
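<p>The following is a minimal sketch of subsampling for the mean of a dependent series. It assumes stationarity and a √n convergence rate. The procedure computes the statistic on every overlapping block of length <i>b</i>, uses the spread of √b(θ̂<sub>b</sub> − θ̂<sub>n</sub>) to approximate the sampling distribution, and inverts that distribution into a confidence interval. The AR(1) series, the block size, and the function names are illustrative assumptions. Validity requires <i>b</i> → ∞ and <i>b</i>/<i>n</i> → 0, which is exactly what a short record cannot guarantee.</p>

```python
import numpy as np

def subsampling_ci(x, b, alpha=0.05):
    """Equal-tailed subsampling CI for the mean of a stationary series.

    Uses all n - b + 1 overlapping blocks of length b and assumes the
    sample mean converges at the sqrt(n) rate.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if not 1 <= b < n:
        raise ValueError("block size must satisfy 1 <= b < n")
    theta_n = x.mean()
    # Mean of every overlapping block x[i:i + b], computed via cumulative sums
    csum = np.concatenate(([0.0], np.cumsum(x)))
    block_means = (csum[b:] - csum[:-b]) / b
    # Subsampling approximation to the law of sqrt(n) * (theta_n - theta)
    roots = np.sqrt(b) * (block_means - theta_n)
    q_lo, q_hi = np.quantile(roots, [alpha / 2, 1 - alpha / 2])
    return theta_n - q_hi / np.sqrt(n), theta_n - q_lo / np.sqrt(n)

def simulate_ar1(n, phi, rng, burn=200):
    """Zero-mean Gaussian AR(1) series; the burn-in discards start-up effects."""
    e = rng.normal(size=n + burn)
    x = np.empty(n + burn)
    x[0] = e[0]
    for t in range(1, n + burn):
        x[t] = phi * x[t - 1] + e[t]
    return x[burn:]

rng = np.random.default_rng(1)
x = simulate_ar1(300, phi=0.6, rng=rng)   # hypothetical "short record"
lo, hi = subsampling_ci(x, b=20)
print(f"mean = {x.mean():.3f}, 95% subsampling CI = ({lo:.3f}, {hi:.3f})")
```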
<p>Gyrostat models (<i>G-models</i>)
are physically sound low-order models derived from the governing equations
of atmospheric dynamics, and they thus retain some of the fundamental statistical
and physical properties of those dynamics. We have demonstrated that using G-models as
approximating models in place of traditional time series models yields more
precise subsampling confidence intervals with improved coverage probabilities.
Future work will explore other types of G-models as approximating models for
inference on atmospheric data. We will also apply this technique to inference on phenomena
in astrostatistics and pharmacokinetics.</p>
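<p>One way approximating models can enhance subsampling is block-size calibration, sketched here under stated assumptions. The approach simulates many series of the observed length from the approximating model, whose true mean is known. It then records the coverage of subsampling intervals for each candidate block size and applies the best-covering block size to the observed record. A G-model simulator could be passed in as <code>simulate</code>. Here, a zero-mean AR(1) fitted by lag-1 autocorrelation serves as a hypothetical stand-in; it is not a G-model. All names and parameter values are illustrative.</p>

```python
import numpy as np

def subsampling_ci(x, b, alpha=0.05):
    """Equal-tailed subsampling CI for the mean (overlapping blocks, sqrt-n rate)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    theta_n = x.mean()
    csum = np.concatenate(([0.0], np.cumsum(x)))
    roots = np.sqrt(b) * ((csum[b:] - csum[:-b]) / b - theta_n)
    q_lo, q_hi = np.quantile(roots, [alpha / 2, 1 - alpha / 2])
    return theta_n - q_hi / np.sqrt(n), theta_n - q_lo / np.sqrt(n)

def calibrate_block_size(simulate, true_mean, n, candidates, alpha=0.05, reps=200, seed=0):
    """Pick the block size whose simulated coverage is closest to 1 - alpha."""
    rng = np.random.default_rng(seed)
    coverage = {}
    for b in candidates:
        hits = 0
        for _ in range(reps):
            lo, hi = subsampling_ci(simulate(n, rng), b, alpha)
            hits += int(lo <= true_mean <= hi)
        coverage[b] = hits / reps
    best = min(candidates, key=lambda b: abs(coverage[b] - (1 - alpha)))
    return best, coverage

def make_ar1_simulator(phi, burn=200):
    """Hypothetical stand-in approximating model: a zero-mean Gaussian AR(1)."""
    def simulate(n, rng):
        e = rng.normal(size=n + burn)
        x = np.empty(n + burn)
        x[0] = e[0]
        for t in range(1, n + burn):
            x[t] = phi * x[t - 1] + e[t]
        return x[burn:]
    return simulate

# Hypothetical "observed" record: a synthetic short series standing in for real data
observed = make_ar1_simulator(0.6)(150, np.random.default_rng(42))
# Fit the stand-in approximating model by lag-1 autocorrelation
phi_hat = float(np.corrcoef(observed[:-1], observed[1:])[0, 1])
b_best, coverage = calibrate_block_size(
    make_ar1_simulator(phi_hat), true_mean=0.0, n=observed.size, candidates=[5, 10, 20, 40]
)
ci = subsampling_ci(observed, b_best)
print(b_best, coverage, ci)
```

<p>Swapping the AR(1) simulator for a physically based model changes only the <code>simulate</code> argument. This is the design point: the calibration loop does not depend on where the approximating series come from.</p>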