Presto query optimizer: pursuit of performance

Thursday, June 21
12:20 PM - 1:00 PM
Meeting Room 211A/B/C/D

Presto, an open source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources. Proven at scale in a variety of use cases at Facebook, Airbnb, Netflix, Uber, Twitter, Bloomberg, and FINRA, Presto has experienced unprecedented growth in popularity in both on-premises and cloud deployments over the last few years.

Inspired by the increasingly complex SQL queries run by the Presto user community, engineers at Facebook and Starburst have recently focused on cost-based query optimization. In this talk, we will present the initial design and implementation of the cost-based optimizer (CBO), its support for connector-provided statistics, and its approach to estimating selectivity and choosing efficient query plans. Then, our detailed experimental evaluation will illustrate the performance gains the optimizer achieves for several classes of queries. Finally, we will discuss our future work on enhancing the initial CBO and present the general Presto roadmap for 2018 and beyond.


Kamil Bajda-Pawlikowski
CTO & co-founder
Kamil is a technology leader in the large-scale data warehousing and analytics space. He is CTO of Starburst, the enterprise Presto company. Prior to co-founding Starburst, Kamil was the Chief Architect at the Teradata Center for Hadoop in Boston, focusing on the open source SQL engine Presto. Previously, he was the co-founder and chief software architect of Hadapt, the first SQL-on-Hadoop company, acquired by Teradata in 2014. Kamil began his journey with Hadoop and modern MPP SQL architectures about 10 years ago during a doctoral program at Yale University, where he co-invented HadoopDB, the original foundation of Hadapt’s technology. Kamil holds an M.S. in Computer Science from Wroclaw University of Technology as well as an M.S. and an M.Phil. in Computer Science from Yale University.
Martin Traverso