The convergence of reporting and interactive BI on Hadoop

Tuesday, June 19
2:50 PM - 3:30 PM
Executive Ballroom 210C/G

Since the early days of Hive, SQL on Hadoop has evolved from being a SQL wrapper on top of MapReduce to a viable replacement for the traditional EDW. In the meantime, while SQL-on-Hadoop vendors were busy adding enterprise capabilities and comparing their TPC-DS prowess against Hive, a niche industry emerged on the side for OLAP (a.k.a. “Interactive BI”) on Hadoop data. Unlike general-purpose SQL-on-Hadoop engines, which deal with the multiple aspects of warehousing, including reporting, OLAP-on-Hadoop engines focus almost exclusively on answering OLAP queries fast by using implementation techniques that had not been part of the SQL-on-Hadoop toolbox so far.

But SQL-on-Hadoop engines are not standing still. Having made huge progress in catching up to traditional EDWs for reporting workloads, they are now setting their sights on Interactive BI. This is great news for enterprises. As the line between reporting and OLAP blurs, enterprises can now consider using a single engine for both reporting and Interactive BI on their Hadoop data, as opposed to having to host, manage, and license two separate products.

Can a single engine satisfy both your reporting and Interactive BI needs? This may be a hard question to answer. Vendors use inconsistent terminology to describe their products and make ambitious and sometimes conflicting claims. This makes it very hard for enterprises to compare products, let alone decide which product best matches their needs.

In this presentation, we’ll provide an overview of the different approaches to OLAP on Hadoop, and explain the key technologies behind each of them. We’ll use consistent terminology to describe what you get from multiple proprietary and open source products and outline advantages and disadvantages. You’ll come out equipped with the knowledge you need to read past marketing and sales pitches. You’ll be able to compare products and make an informed decision on whether a single engine for both reporting and Interactive BI on Hadoop is right for you.


Gustavo Arocena
Big Data Architect
Gustavo Arocena is a Big Data Architect at the IBM Toronto Lab, with over 15 years of experience in database technology. Recently, Gustavo led the design and implementation of several components of the Big SQL engine, including the Hive-compatible IO layer, the INSERT statement, the integration with Apache Spark, and the high-performance ORC ingestion layer. Gustavo has several publications and has presented at multiple conferences. He holds a Master's degree in Computer Science from the University of Toronto in the area of database language processing.