EFFECTIVE AND EFFICIENT COMPUTATION SYSTEM PROVENANCE TRACKING
In order to distinguish essays and pre-prints from academic theses, we have a separate category. These are often much longer text based documents than a paper.
Provenance collection and analysis is one of the most important techniques used in analyzing computation system behaviors. For forensic analysis in enterprise environment, existing provenance systems are limited. On one hand, they tend to log many redundant and irrelevant events causing high runtime and space overhead as well as long investigation time. On the other hand, they lack the application specific provenance data, leading to ineffective investigation process. Moreover, emerging machine learning especially deep learning based artificial intelligence systems are hard to interpret and vulnerable to adversarial attacks. Using provenance information to analyze such systems and defend adversarial attacks is potentially very promising but not well-studied yet.
In this dissertation, I try to address the aforementioned challenges. I present an effective and efficient operating system level provenance data collector, ProTracer. It features the idea of alternating between logging and tainting to perform on-the-fly log filtering and reduction to achieve low runtime and storage overhead. Tainting is used to track the dependence relationships between system call events, and logging is performed only when useful dependencies are detected. I also develop MPI, an LLVM based analysis and instrumentation framework which automatically transfers existing applications to be provenance-aware. It requires the programmers to annotate the desired data structures used for partitioning, and then instruments the program to actively emit application specific semantics to provenance collectors which can be used for multiple perspective attack investigation. In the end, I propose a new technique named NIC, a provenance collection and analysis technique for deep learning systems. It analyzes deep learning system internal variables to generate system invariants as provenance for such systems, which can be then used to as a general way to detect adversarial attacks.