Transactional operations in Apache Hive: present and future

Wednesday, June 20
11:50 AM - 12:30 PM
Executive Ballroom 210A/E

Apache Hive is an enterprise data warehouse built on top of Hadoop. Hive supports insert, update, delete, and merge SQL operations with transactional semantics, and read operations that run at snapshot isolation. The well-defined semantics of these operations in the face of failure and concurrency are critical to building robust applications on top of Apache Hive. In the past, enabling these features came with many preconditions, which meant giving up other functionality. The need to make these tradeoffs is rapidly being eliminated.

This talk will describe the intended use cases, the architecture of the implementation, and the recent improvements and new features built for Hive 3.0. For example, bucketing transactional tables, while supported, is no longer required. The performance overhead of using transactional tables is nearly eliminated relative to identical non-transactional tables. We’ll also cover the Streaming Ingest API, which allows writing batches of events into a Hive table without using SQL.

Presentation Video


Eugene Koifman
Principal Software Engineer
Hortonworks Inc.
I'm a technical lead at Hortonworks, where I concentrate on adding support for operations with transactional semantics to Apache Hive. Prior to that, I was a lead engineer on a federated SQL engine at Composite Software. Before that, I held various engineering roles at BEA, Oracle, and others.