For analytics dashboards to deliver a smooth, interactive user experience, two requirements are key: fast query response times and fresh data. When building fast, interactive BI dashboards over streaming data, organizations often struggle to choose the right serving layer.
Cluster computing frameworks such as Hadoop or Spark work well for storing and processing large volumes of data, but they are not optimized for making that data available for queries in real time. Their long query latencies also make them poor choices for powering interactive dashboards and other BI use cases.
This talk presents an open-source, real-time data analytics stack built on Apache Kafka, Druid, and Superset. The stack combines Kafka's low-latency streaming and processing capabilities with Druid, which makes ingested data streams available for exploration immediately and serves fast queries over them. Superset provides visualization and dashboarding and integrates closely with Druid. In this talk, we will explain why this architecture is well suited to interactive applications over streaming data, present an end-to-end demo of the complete stack, walk through its key features, and share performance characteristics from real-world use cases.