Apache Hive is an enterprise data warehouse build on top of Hadoop. Hive supports insert, update, delete, and merge SQL operations with transactional semantics and read operations that run at snapshot isolation. The well defined semantics of these operations in the face of failure and concurrency are critical to building robust application on top of Apache Hive. In the past there were many preconditions to enabling these features which meant giving up other functionality. The need to make these tradeoffs is rapidly being eliminated.
This talk will describe the intended use cases, architecture of the implementation, recent improvements and new features build for Hive 3.0. For example, bucketing transactional tables, while supported, is no longer required. Performance overhead of using transactional tables is nearly eliminated relative to identical non-transactional tables. We’ll also cover Streaming Ingest API, which allows writing batches of events into a Hive table without using SQL.