Fast SQL on Hadoop, really?

Thursday, October 11
2:00 PM - 2:40 PM
Hullet

How is it that one system can query terabytes of data, yet still provide interactive query support? This talk will discuss two of the underlying technologies that allow Apache Hive to support fast query response, both on-premises in HDFS and in cloud object stores such as S3 and WASB.

LLAP was introduced in Hive 2.6. It provides standing processes that securely cache Hive’s columnar data and can do query processing without ever needing to start tasks in Hadoop. We will cover LLAP’s architecture, intended use cases, and performance numbers both on-premises and in the cloud.

The second technology is the integration of Hive with Apache Druid. Druid excels at low-latency, interactive queries over streaming data. Its method of storing data makes it very well suited for OLAP-style queries. We will cover how Hive can be integrated with Druid to support real-time streaming of data from Kafka and OLAP queries.

SPEAKERS

Alan Gates
Co-founder
Hortonworks
Alan is a founder of Hortonworks and an original member of the engineering team that took Pig from a Yahoo! Labs research project to a successful Apache open source project. Alan is a PMC member on Apache Hive, Pig, and many other Apache projects. As part of the Apache Incubator PMC, he has mentored many new Apache communities. Alan has a BS in Mathematics from Oregon State University and an MA in Theology from Fuller Theological Seminary. He is also the author of Programming Pig, a book from O’Reilly Press. Follow Alan on Twitter: @alanfgates.