Graph-structured data is pervasive. Modeling large-scale network-structured datasets requires graph processing and management systems such as graph databases. Further, the analysis of graph-structured data often necessitates bulk downloads from, and uploads to, the cloud or edge nodes. Unfortunately, experience has shown that malicious actors can compromise the confidentiality of highly sensitive data stored in the cloud or on shared nodes, even when that data is encrypted. For particular use cases (multi-modal knowledge graphs, electronic health records, finance), network-structured datasets can be highly sensitive and require auditability, authentication, integrity protection, and privacy-preserving computation in a controlled and trusted environment; traditional cloud computation is not suitable for these use cases. Similarly, many modern applications utilize a "shared, replicated database" approach to provide accountability and traceability. Those applications often suffer from significant privacy issues because every node in the network can access a copy of the relevant contract code and data in order to guarantee the integrity of transactions and reach consensus, even in the presence of malicious actors.
This dissertation proposes breaking from the traditional cloud computation model and instead shipping certified, pre-approved trusted code closer to the data to protect the confidentiality of graph-structured data. Further, our technique runs in a controlled environment on a trusted data owner node and provides proof of correct code execution. This computation can be audited in the future and provides a building block for automating a variety of real use cases that require preserving data ownership. This project utilizes trusted execution environments (TEEs) but does not rely solely on the TEE architecture to provide privacy for data and code. We thoughtfully examine the drawbacks of using trusted execution environments in cloud environments. Similarly, we analyze the privacy challenges exposed by the use of blockchain technologies to provide accountability and traceability.
First, we propose AGAPECert, an Auditable, Generalized, Automated, Privacy-Enabling Certification framework capable of performing auditable computation on private graph-structured data and reporting real-time aggregate certification status without disclosing the underlying private data. AGAPECert utilizes a novel mix of trusted execution environments, blockchain technologies, and a real-time graph-based API standard to provide automated, oblivious, and auditable certification. This dissertation includes the invention of two core concepts that provide accountability, data provenance, and automation for the certification process: Oblivious Smart Contracts and Private Automated Certifications. Second, we contribute an auditable and integrity-preserving graph processing model called AuditGraph.io. AuditGraph.io utilizes a unique block-based layout and a multi-modal knowledge graph, potentially improving access locality, encryption, and integrity of highly sensitive graph-structured data. Third, we contribute TruenoDB, a unique data store and compute engine that facilitates the analysis and presentation of graph-structured data. TruenoDB offers better throughput than the state of the art. Finally, this dissertation proposes integrity-preserving streaming frameworks at the edge of the network with a personalized graph-based object lookup.
Degree Type: Doctor of Philosophy
Campus location: West Lafayette
Advisor/Supervisor/Committee Chair: Bharat Bhargava
Additional Committee Member 2: Jeremiah Blocki
Additional Committee Member 3: Chunyi Peng
Additional Committee Member 4: Xiangyu Zhang