Accelerating query processing with materialized views in Apache Hive

Tuesday, June 19
2:50 PM - 3:30 PM
Executive Ballroom 210A/E

Over the last few years, the Apache Hive community has been working on advancements that enable a whole new range of use cases for the project, moving it from its batch processing roots towards an interactive SQL query platform. Traditionally, one of the most powerful techniques used to accelerate query processing in data warehouses is the pre-computation of relevant summaries, or materialized views.

This talk presents our work on introducing materialized views, and automatic query rewriting based on those materializations, in Apache Hive. In particular, materialized views can be stored natively in Hive or in other systems such as Druid using custom storage handlers, and they can seamlessly exploit new Hive features such as LLAP acceleration. The optimizer then relies on Apache Calcite to automatically produce full and partial rewritings for a large set of query expressions comprising projections, filters, joins, and aggregation operations. We shall describe the current coverage of the rewriting algorithm, explain how Hive controls important aspects of the life cycle of the materialized views, such as the freshness of their data, and outline interesting directions for future improvements.
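The idea behind rewriting a query over a pre-computed summary can be illustrated with a minimal sketch. It is not Hive or Calcite code: it uses Python's built-in `sqlite3` module, which has no materialized views, so the summary is an ordinary table and the rewriting is written by hand rather than produced by an optimizer. The table names (`sales`, `mv_sales_by_region_day`), the data, and the helper functions are all hypothetical.

```python
import sqlite3

# Hypothetical base-table rows: (region, day, amount).
ROWS = [
    ("east", "2018-06-01", 10),
    ("east", "2018-06-02", 20),
    ("east", "2018-06-02", 7),
    ("west", "2018-06-01", 5),
    ("west", "2018-06-02", 15),
]

# Query as a user would write it against the base table.
ORIGINAL = """
    SELECT region, SUM(amount)
    FROM sales
    WHERE day >= '2018-06-02'
    GROUP BY region
    ORDER BY region
"""

# Hand-written equivalent over the summary: the filter column is kept in
# the summary, and SUM rolls up from the finer (region, day) grouping.
REWRITTEN = """
    SELECT region, SUM(total)
    FROM mv_sales_by_region_day
    WHERE day >= '2018-06-02'
    GROUP BY region
    ORDER BY region
"""


def build_demo():
    """Create an in-memory database with a base table and a summary table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, day TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", ROWS)
    # Stand-in for a materialized view: a pre-aggregated copy of the data.
    conn.execute(
        """
        CREATE TABLE mv_sales_by_region_day AS
        SELECT region, day, SUM(amount) AS total
        FROM sales
        GROUP BY region, day
        """
    )
    return conn


def run(conn, sql):
    """Execute a query and return all result rows as a list of tuples."""
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = build_demo()
    print(run(conn, ORIGINAL))
    print(run(conn, REWRITTEN))
```

Both queries return the same rows, but the second reads only the smaller summary table. The sketch also hints at the freshness issue: a row inserted into `sales` after the summary is built would not be reflected in the rewritten query until the summary is rebuilt.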


Jesús Camacho Rodríguez
Member of Technical Staff
Jesús Camacho Rodríguez is a Member of Technical Staff at Hortonworks, and a PMC member of Apache Hive and Apache Calcite. His current work focuses on extending and improving query processing and optimization, ensuring that the increasingly complex workloads supported by Hive are executed quickly and efficiently. Prior to that, Jesús obtained his PhD in Computer Science from Paris-Sud University and Inria, working on large-scale Web data management. Jesús received his Computer Science and Engineering degree from the University of Almería, Spain.