Two popular open source technologies, Druid and Apache Hive, are often mentioned as viable solutions for large-scale analytics. Hive works well for storing and querying large volumes of data, although it is not optimized for ingesting streaming data and making it available for queries in real time. Druid excels at low-latency, interactive queries over streaming data and makes newly ingested data queryable in real time. Although the high-level messaging of both projects may lead you to believe they are competitors in the same space, the two technologies are, in fact, highly complementary.
By combining the rich query capabilities of Hive with the powerful real-time streaming and indexing capabilities of Druid, we can build a more powerful, flexible, and low-latency real-time analytics solution. In this talk, we will discuss the motivation for combining Hive and Druid, along with the benefits and benchmark results.
Proposed agenda of the talk:
• Motivation behind combining Druid and Hive
• Introduction to Apache Hive
• Benefits of using Druid and Hive together
• Architecture for handling streaming data at scale
• Benchmark numbers