Querying Druid in SQL with Superset

Wednesday, June 20
11:00 AM - 11:40 AM
Executive Ballroom 210D/H

Druid is a high-performance, column-oriented distributed data store that is widely used at Oath for big data analysis. Druid's query language is a JSON schema, which makes it difficult for new users who are unfamiliar with the schema to start querying Druid quickly. The JSON schema is designed to work with Druid's data ingestion methods, so it can provide high-performance features such as data aggregations expressed in JSON, but many users are unable to take advantage of these features because they are not familiar with the specifics of how to optimize Druid queries. However, most new Druid users at Yahoo are already very familiar with SQL, and the queries they want to write for Druid can be converted to concise SQL.
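To make the contrast concrete, here is a minimal sketch that builds a native Druid JSON query and a roughly equivalent Druid SQL request side by side. The datasource name `page_events` and the metric `clicks` are hypothetical and do not come from the talk. The sketch only constructs the request bodies; it assumes they would be POSTed to a Druid Broker (`/druid/v2` for native queries, `/druid/v2/sql` for SQL, with the SQL layer enabled).

```python
import json

# Hypothetical datasource and metric names, used only for illustration.
DATASOURCE = "page_events"
METRIC = "clicks"

# Native Druid query: a JSON document describing a daily sum over one week.
native_query = {
    "queryType": "timeseries",
    "dataSource": DATASOURCE,
    "granularity": "day",
    "intervals": ["2018-06-01/2018-06-08"],
    "aggregations": [
        {"type": "longSum", "name": METRIC, "fieldName": METRIC},
    ],
}

# Roughly the same question in Druid SQL. __time is Druid's built-in
# timestamp column; FLOOR(... TO DAY) buckets rows by day.
sql = (
    f'SELECT FLOOR(__time TO DAY) AS "day", SUM({METRIC}) AS {METRIC} '
    f"FROM {DATASOURCE} "
    "WHERE __time >= TIMESTAMP '2018-06-01 00:00:00' "
    "AND __time < TIMESTAMP '2018-06-08 00:00:00' "
    "GROUP BY 1"
)

# Druid's SQL endpoint accepts a JSON body with the statement under "query".
sql_request = {"query": sql}

print(json.dumps(native_query, indent=2))
print(json.dumps(sql_request, indent=2))
```

The two forms are not guaranteed to return identical rows (for example, they may differ in how empty time buckets are reported), but the SQL version is likely to be easier to read for an analyst who already knows SQL.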

We found that our data analysts wanted an easy way to issue ad-hoc Druid queries and view the results in a BI tool in a way that's presentable to nontechnical stakeholders. In order to achieve this, we had to bridge the gap between Druid, SQL, and our BI tools such as Apache Superset. In this talk, we will explore different ways to query a Druid datasource in SQL and discuss which methods were most appropriate for our use cases. We will also discuss our open source contributions so others can utilize our work.

SPEAKERS

Guruganesh Kotta
Software Dev Eng
Oath
I have worked as a software developer at Oath for over 4 years, and I'm currently a member of the audience data team. Our team builds data pipelines that process all user activity data across Oath. I have attended DataWorks Summit for the last several years and have presented talks at other conferences such as XLDB and Tech Pulse (Yahoo's internal conference).
Junxian Wu
Software Engineer
Oath Inc.
I started working with Druid and its supporting tools in 2016. In 2017, I had the chance to build a SQL interface for Druid so that more consumers could access it. I am still working closely with Druid and developing streaming data systems with it. I look forward to making more progress!