Attack-Resilient Adaptive Load-Balancing in Distributed Spatial Data Streaming Systems
Thesis posted on 05.08.2020 by Anas Hazim Daghistani
The proliferation of GPS-enabled devices has led to the development of numerous location-based services. These services need to process massive amounts of spatial data in real time with high throughput and low response time. The current scale of spatial data cannot be handled by centralized systems, which has led to the development of distributed spatial streaming systems. The performance of distributed streaming systems relies on how evenly the workload is distributed among their machines. However, real-time streamed spatial data and queries follow non-uniform spatial distributions that change continuously over time. Therefore, distributed spatial streaming systems need to track the changes in the distribution of spatial data and queries and redistribute their workload accordingly. This thesis addresses the challenges of adapting to workload changes in distributed spatial streaming systems to improve performance while preserving the system's security.
The thesis proposes TrioStat, an online workload estimation technique that relies on a probabilistic model for estimating the cost of partitions and machines in distributed spatial streaming systems. TrioStat uses a decentralized technique to collect and maintain the required statistics in real time with minimal overhead. In addition, this thesis introduces SWARM, a lightweight adaptive load-balancing protocol that continuously monitors the data and query workloads across the distributed processes of spatial data streaming systems and redistributes the workloads as soon as performance bottlenecks are detected. SWARM uses TrioStat to estimate the workload of the system's machines. Although adaptive load-balancing techniques significantly improve the performance of distributed streaming systems, they also make the system vulnerable to attacks. In this thesis, we introduce a novel attack model that targets the adaptive load-balancing mechanisms of distributed streaming systems. The attack reduces the throughput and the availability of the system by keeping it in a continuous state of rebalancing. The thesis proposes Guard, a component that detects and blocks attacks that target the adaptive load balancing of distributed streaming systems. Guard is deployed in SWARM to develop an attack-resilient adaptive load-balancing mechanism for distributed spatial streaming systems.
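To make the rebalancing attack concrete, the following is a minimal illustrative sketch, not the thesis's SWARM or Guard implementation. It assumes a hypothetical imbalance metric (maximum machine load divided by mean load), a hypothetical trigger threshold of 1.5, and a simple cooldown between rebalances as a stand-in defense; all function names, parameters, and numbers are assumptions introduced for illustration only.

```python
from typing import Dict, List


def imbalance(loads: Dict[str, float]) -> float:
    """Ratio of the most loaded machine's cost to the mean cost (assumed metric)."""
    mean = sum(loads.values()) / len(loads)
    return max(loads.values()) / mean if mean > 0 else 1.0


def should_rebalance(loads: Dict[str, float], threshold: float = 1.5) -> bool:
    """Trigger a rebalance when the imbalance exceeds a (hypothetical) threshold."""
    return imbalance(loads) > threshold


def count_rebalances(rounds: List[Dict[str, float]],
                     threshold: float = 1.5,
                     cooldown: int = 0) -> int:
    """Count rebalances over a sequence of per-round load snapshots.

    `cooldown` is the number of rounds after a rebalance during which no new
    rebalance may start. It is a generic rate limit used here only as a
    stand-in defense, not the detection logic of Guard.
    """
    count = 0
    last = None  # index of the most recent rebalance, if any
    for i, loads in enumerate(rounds):
        if last is not None and i - last <= cooldown:
            continue  # still cooling down; ignore this snapshot
        if should_rebalance(loads, threshold):
            count += 1
            last = i
    return count


# An adversarial workload that moves the hotspot back and forth every round.
attack = [{"m1": 90.0, "m2": 10.0}, {"m1": 10.0, "m2": 90.0}] * 3
print(count_rebalances(attack))              # 6: every round triggers a rebalance
print(count_rebalances(attack, cooldown=2))  # 2: the cooldown caps the rebalance rate
```

In this toy setting, a workload whose hotspot shifts every round forces a naive balancer to rebalance in every round, which mirrors the continuous-rebalancing state described above. The cooldown merely bounds how often rebalancing can occur; it does not distinguish malicious from legitimate workload shifts.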