Druid and Hive together: interactive realtime analytics at scale

Tuesday, June 19
4:50 PM - 5:30 PM
Grand Ballroom 220B

Two popular open source technologies, Druid and Apache Hive, are often mentioned as viable solutions for large-scale analytics. Hive works well for storing large volumes of data, although it is not optimized for ingesting streaming data and making it available for queries in real time. Druid excels at low-latency, interactive queries over streaming data and makes that data available for queries in real time. Although the high-level messaging presented by both projects may lead you to believe they are competitors in the same space, the technologies are, in fact, highly complementary.

By combining the rich query capabilities of Hive with the powerful real-time streaming and indexing capabilities of Druid, we can build a more powerful, flexible, and extremely low-latency real-time analytics solution. In this talk, we will discuss the motivation for combining Hive and Druid, along with the benefits and benchmark numbers.

Proposed agenda for the talk:
• Motivation behind combining Druid and Hive
• Apache Hive—introduction
• Druid—introduction
• Druid and Hive together—benefits
• Architecture for handling streaming data at scale
• Demo
• Benchmark numbers


Nishant Bangarwa
Software engineer
Nishant is a Druid PMC member and a software engineer at Hortonworks, where he is part of the Business Intelligence team. Prior to that, he was part of the Metamarkets backend team and was responsible for analytics infrastructure, including real-time analytics in Druid. He holds a B.Tech in Computer Science from the National Institute of Technology, Kurukshetra, India.