Architecting Query Compilers for Diverse Workloads
thesisposted on 10.06.2019 by Ruby Y Tahboub
In order to distinguish essays and pre-prints from academic theses, we have a separate category. These are often much longer text based documents than a paper.
To leverage modern hardware platforms to their fullest, more and more database systems embrace compilation of query plans to native code. In the research community, there is an ongoing debate about the best way to architect such query compilers. This is perceived to be a difficult task, requiring techniques fundamentally different from traditional interpreted query execution. In this dissertation, we contribute to this discussion by drawing attention to an old but underappreciated idea known as Futamura projections, which fundamentally link interpreters and compilers. Guided by this idea, we demonstrate that efficient query compilation can actually be very simple, using techniques that are no more difficult than writing a query interpreter in a high-level language. We first develop LB2: a high-level query compiler implemented in this style that is competitive with the best compiled query engines both in sequential and parallel execution on the standard TPC-H benchmark.
Query engines process a variety of data types and structures including text, spatial, graphs, etc. Several spatial and graph engines are implemented as extensions to relational query engines to leverage optimized memory, storage, and evaluation. Still, the performance of these extensions is often stymied by the interpretive nature of the underlying data management, generic data structures, and the need to execute domain-specific external libraries. On that basis, compiling spatial and graph queries to native code is a desirable avenue to mitigate existing limitations and improve performance. To support compiling spatial queries, we extend the LB2 main-memory query compiler with spatial predicates, indexing structures, and spatial operators. To support compiling graph queries, we extend LB2 with graph data structures and operators. The spatial extension matches the performance of hand-written code and outperforms relational query engines and map-reduce extensions. Similarly, the graph extension matches, and sometimes outperforms, low-level graph engines.