Overview

Many organizations analyze large datasets from multiple viewpoints to derive insights and build predictive models that drive application logic. The landscape of modern data analysis is changing: new compute substrates are emerging, and the data to be analyzed is now spread across many geographic locations. Unfortunately, existing analysis stacks’ rigid layering and inflexible APIs prevent them from effectively supporting analytics in the face of these changes. For new computing substrates, e.g., edge computing, geo-distributed clusters, and spot markets, we argue that analytics stacks are most effective when their APIs intrinsically accommodate the unique resource-variability properties of these settings.

Iridium

Iridium is an execution framework for running geo-distributed analytics (GDA) jobs. It primarily controls task placement in a network-aware fashion, placing tasks across sites so that no single WAN link becomes the bottleneck.
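To make the placement idea concrete, here is a minimal sketch of bottleneck-minimizing task placement in the spirit of Iridium, formulated as a linear program over the fraction of reduce tasks assigned to each site. The function and variable names, the bandwidth units, and the single-shuffle simplification are illustrative assumptions, not Iridium's actual implementation.

```python
# Sketch: choose the fraction r_i of reduce tasks at each site so the
# slowest (bottleneck) WAN transfer is minimized. Solved as an LP.
import numpy as np
from scipy.optimize import linprog

def place_tasks(data, uplink, downlink):
    """data[i]: intermediate data at site i (MB);
    uplink/downlink[i]: WAN bandwidth of site i (MB/s).
    Returns (task fractions r, bottleneck transfer time Z)."""
    n = len(data)
    total = sum(data)
    # Decision variables: x = [r_0, ..., r_{n-1}, Z]; minimize Z.
    c = np.zeros(n + 1)
    c[-1] = 1.0
    A_ub, b_ub = [], []
    for i in range(n):
        # Upload at site i: data[i] * (1 - r_i) / uplink[i] <= Z
        row = np.zeros(n + 1)
        row[i] = -data[i] / uplink[i]
        row[-1] = -1.0
        A_ub.append(row)
        b_ub.append(-data[i] / uplink[i])
        # Download at site i: r_i * (total - data[i]) / downlink[i] <= Z
        row = np.zeros(n + 1)
        row[i] = (total - data[i]) / downlink[i]
        row[-1] = -1.0
        A_ub.append(row)
        b_ub.append(0.0)
    A_eq = [np.append(np.ones(n), 0.0)]  # task fractions sum to 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, 1)] * n + [(0, None)])
    return res.x[:n], res.x[-1]

# Three sites with skewed data and one poorly connected site: the LP
# shifts tasks away from the 1 MB/s site to avoid a WAN bottleneck.
r, t = place_tasks(data=[300, 240, 60],
                   uplink=[10, 10, 1], downlink=[10, 10, 1])
print(r, t)
```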

Paper
Iridium: Low Latency Geo-distributed Data Analytics, SIGCOMM 2015
Paper Abstract

Low latency analytics on geographically distributed datasets (across datacenters, edge clusters) is an upcoming and increasingly important challenge. The dominant approach of aggregating all the data to a single datacenter significantly inflates the timeliness of analytics. At the same time, running queries over geo-distributed inputs using the current intra-DC analytics frameworks also leads to high query response times because these frameworks cannot cope with the relatively low and variable capacity of WAN links. We present Iridium, a system for low latency geo-distributed analytics. Iridium achieves low query response times by optimizing placement of both data and tasks of the queries. The joint data and task placement optimization, however, is intractable. Therefore, Iridium uses an online heuristic to redistribute datasets among the sites prior to queries' arrivals, and places the tasks to reduce network bottlenecks during the query's execution. Finally, it also contains a knob to budget WAN usage. Evaluation across eight worldwide EC2 regions using production queries shows that Iridium speeds up queries by 3x - 19x and lowers WAN usage by 15% - 64% compared to existing baselines.

Clarinet

Clarinet builds on Iridium by extending network awareness to the query-optimizer layer of SQL-style analytics frameworks. Existing query optimizers (e.g., Calcite, Catalyst) do not account for the heterogeneous nature of inter-DC WAN bandwidth. Clarinet generates WAN-bandwidth-aware query plans for multiple jobs in a GDA framework.
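The following is a minimal sketch of WAN-aware plan selection in the spirit of Clarinet: each candidate query execution plan (QEP) is scored by the time its inter-site transfers take on heterogeneous WAN links, and the fastest plan wins. The plan structure, the bandwidth matrix, and the serial-stage cost model are illustrative assumptions; Clarinet additionally makes task-placement and scheduling decisions jointly across multiple queries.

```python
# Sketch: pick the QEP whose inter-site data movement finishes fastest
# on heterogeneous WAN links (a WAN-oblivious optimizer would ignore BW).

# WAN bandwidth (MB/s) between example sites; intra-site moves are free.
BW = {("us", "eu"): 5.0, ("eu", "us"): 5.0,
      ("us", "ap"): 1.0, ("ap", "us"): 1.0,
      ("eu", "ap"): 2.0, ("ap", "eu"): 2.0}

def stage_time(src, dst, mb):
    return 0.0 if src == dst else mb / BW[(src, dst)]

def plan_time(stages):
    # A plan is a list of dependent stages; each stage ships `mb`
    # megabytes from `src` to `dst` before the next stage can run.
    return sum(stage_time(s, d, mb) for s, d, mb in stages)

# Two QEPs for the same 3-way join, differing in join order and hence
# in which intermediate results cross which WAN links.
candidates = {
    "(A JOIN B) JOIN C": [("ap", "us", 200), ("eu", "us", 50)],
    "(A JOIN C) JOIN B": [("eu", "ap", 50), ("ap", "us", 400)],
}
best = min(candidates, key=lambda p: plan_time(candidates[p]))
print(best, plan_time(candidates[best]))
```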

Paper
Clarinet: WAN-Aware Optimization for Analytics Queries, OSDI 2016
Paper Abstract

Recent work has made the case for geo-distributed analytics, where data collected and stored at multiple datacenters and edge sites world-wide is analyzed in situ to drive operational and management decisions. A key issue in such systems is ensuring low response times for analytics queries issued against geo-distributed data. A central determinant of response time is the query execution plan (QEP). Current query optimizers do not consider the network when deriving QEPs, which is a key drawback as the geo-distributed sites are connected via WAN links with heterogeneous and modest bandwidths, unlike intra-datacenter networks. We propose CLARINET, a novel WAN-aware query optimizer. Deriving a WAN-aware QEP requires working jointly with the execution layer of analytics frameworks that places tasks to sites and performs scheduling. We design efficient heuristic solutions in CLARINET to make such a joint decision on the QEP. Our experiments with a real prototype deployed across EC2 datacenters, and large-scale simulations using production workloads show that CLARINET improves query response times by ≥ 50% compared to state-of-the-art WAN-aware task placement and scheduling.

Talks

Slides and a recording of the talk are available on the conference page.

Code

Please e-mail the first author for access to the code.

QOOP

QOOP builds on Clarinet and argues for dynamic query re-planning. Clarinet uses a fixed Query Execution Plan (QEP) that is optimal at the start of query execution, but that plan can cease to be optimal when the resources available to the query change significantly. To enable re-planning, we make changes to the query planner, the execution engine, and the scheduler. With QOOP, we re-evaluate and re-plan the job's QEP in a greedy manner during execution; in doing so, we push complexity up the stack into the execution engine and the query planner instead of pushing more complexity into the scheduler.
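Here is a minimal sketch of greedy re-planning in the spirit of QOOP: on every resource-change event from the scheduler, re-estimate the remaining run time of each candidate QEP (crediting stages that have already finished) and greedily switch to the cheapest. The cost model, stage structure, and slot-based capacity are illustrative assumptions, not QOOP's Hive/Tez implementation.

```python
# Sketch: greedy QEP re-planning triggered by a resource change.

def remaining_time(plan, done, slots):
    """plan: list of (stage, work_units, max_parallelism); stages run
    serially here for simplicity, each using at most `max_parallelism`
    of the available slots. `done` is the set of finished stages."""
    return sum(w / min(slots, p) for name, w, p in plan if name not in done)

def replan(candidates, current, done, slots):
    # Greedy step: pick the plan with the least estimated remaining
    # time under the *new* capacity; ties keep the current plan.
    best = min(candidates,
               key=lambda p: remaining_time(candidates[p], done, slots))
    if remaining_time(candidates[best], done, slots) < \
       remaining_time(candidates[current], done, slots):
        return best
    return current

candidates = {
    # More total work, but scales out to many slots.
    "parallel": [("shuffle_join", 100, 10)],
    # Less total work, but caps at 2 slots (e.g., a broadcast join).
    "frugal":   [("broadcast_join", 40, 2)],
}
current, done = "parallel", set()
# The fair scheduler shrinks our share from 10 slots to 2, so the
# highly parallel plan loses its advantage mid-query.
current = replan(candidates, current, done, slots=2)
print(current)  # -> "frugal"
```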

Paper
Dynamic Query Re-planning Using QOOP, OSDI 2018
Paper Abstract

Modern data processing clusters are highly dynamic – both in terms of the number of concurrently running jobs and their resource usage. To improve job performance, recent works have focused on optimizing the cluster scheduler and the jobs’ query planner with a focus on picking the right query execution plan (QEP) – represented as a directed acyclic graph – for a job in a resource-aware manner, and scheduling jobs in a QEP-aware manner. However, because existing solutions use a fixed QEP throughout the entire execution, the inability to adapt a QEP in reaction to resource changes often leads to large performance inefficiencies. This paper argues for dynamic query re-planning, wherein we re-evaluate and re-plan a job’s QEP during its execution. We show that designing for re-planning requires fundamental changes to the interfaces between key layers of data analytics stacks today, i.e., the query planner, the execution engine, and the cluster scheduler. Instead of pushing more complexity into the scheduler or the query planner, we argue for a redistribution of responsibilities between the three components to simplify their designs. Under this redesign, we analytically show that a greedy algorithm for re-planning and execution alongside a simple max-min fair scheduler can offer provably competitive behavior even under adversarial resource changes. We prototype our algorithms atop Apache Hive and Tez. Via extensive experiments, we show that our design can offer a median performance improvement of 1.47× compared to state-of-the-art alternatives.

Talks

Slides and a recording of the talk are available on the conference page.

Code

Please e-mail the first author for access to the code.