EVALUATING SPATIAL QUERIES OVER DECLUSTERED SPATIAL DATA

2019-08-02T19:06:39Z (GMT) by Eslam A Almorshdy

Because of the large volumes of spatial data, the data is stored on clusters of machines that communicate with one another to complete a task. In such a distributed environment, communicating intermediate results among computing nodes dominates execution time. Communication overhead is even more dominant when processing is done in memory. Moreover, the way spatial data is partitioned affects the overall processing cost, because different partitioning strategies influence the size of the intermediate results. Spatial data poses the following additional challenges: (1) storage load balancing, because of the skewed distribution of spatial data over the underlying space; (2) query load imbalance, due to skewed query workloads and query hotspots over both time and space; and (3) lack of effective utilization of the computing resources. We introduce a new kNN query evaluation technique, termed BCDB, for evaluating nearest-neighbor queries (NN-queries, for short). In contrast to clustered partitioning of spatial data, BCDB explores the use of declustered partitioning of data to address data and query skew. BCDB uses summaries of the underlying data and a coarse-grained index to localize processing of the NN-query on each local node as much as possible. The coarse-grained index is locally traversed using a new uncertain version of classical distance browsing, resulting in a minimal O(√k) elements to be communicated across all processing nodes.
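
For readers unfamiliar with the classical distance browsing that BCDB builds on, the following is a minimal single-node sketch in Python. It is not the BCDB algorithm: it shows only the classical best-first traversal, with no data summaries, no uncertainty handling, and no communication between nodes. The uniform grid standing in for the coarse-grained index, the cell size, and the names mindist, build_grid, and knn_distance_browsing are all assumptions made for illustration.

```python
import heapq
import math
from collections import defaultdict


def mindist(q, cell, size):
    """Smallest distance from query point q to the square grid cell (cx, cy)."""
    x0, y0 = cell[0] * size, cell[1] * size
    dx = max(x0 - q[0], 0.0, q[0] - (x0 + size))
    dy = max(y0 - q[1], 0.0, q[1] - (y0 + size))
    return math.hypot(dx, dy)


def build_grid(points, size):
    """Hypothetical coarse index: bucket 2-D points into square cells."""
    grid = defaultdict(list)
    for p in points:
        cell = (math.floor(p[0] / size), math.floor(p[1] / size))
        grid[cell].append(p)
    return grid


def knn_distance_browsing(grid, size, q, k):
    """Classical best-first kNN: visit cells and points in distance order.

    Heap entries are (distance, kind, payload), where kind 0 is a cell and
    kind 1 is a point, so a cell is expanded before a point at equal distance.
    """
    heap = [(mindist(q, cell, size), 0, cell) for cell in grid]
    heapq.heapify(heap)
    result = []
    while heap and len(result) < k:
        dist, kind, item = heapq.heappop(heap)
        if kind == 1:
            # A point reached the front of the queue: nothing unseen is closer.
            result.append((dist, item))
        else:
            # Expand the cell: enqueue its points with their exact distances.
            for p in grid[item]:
                d = math.hypot(p[0] - q[0], p[1] - q[1])
                heapq.heappush(heap, (d, 1, p))
    return result


# Example usage with made-up points and an assumed cell size of 4.0.
points = [(1.0, 1.0), (2.0, 2.0), (9.0, 9.0), (5.0, 5.0), (0.5, 0.5)]
grid = build_grid(points, 4.0)
nearest = knn_distance_browsing(grid, 4.0, (0.0, 0.0), 2)
```

Because cells are expanded lazily in order of their minimum possible distance, the traversal stops as soon as k points have been reported, and cells that cannot contain a closer point are never opened.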