Procella: A fast, versatile SQL query engine powering data at YouTube

Tuesday, June 19
11:00 AM - 11:40 AM
Grand Ballroom 220B

Procella is a distributed SQL query engine built for flexible workloads within YouTube. It is highly scalable and is designed primarily to serve high volumes of queries at low latency while ingesting real-time data. It serves video and channel statistics to users watching videos, as well as OLAP-style queries for video analytics (youtube.com/analytics) and public dashboards (artists.youtube.com). Procella also supports complex SQL operations over structured data and is used by YouTube analysts for ad hoc analysis.

Procella runs on the Google distributed computing stack and operates directly on data stored in accessible columnar formats on Colossus, the Google distributed file system. The underlying data can therefore be produced and consumed directly by other tools such as MapReduce and Dremel. The compute runs directly on shared machines in Borg clusters and does not need dedicated virtual (or physical) machines. These features allow Procella to fit nicely in the Google ecosystem, scale compute and storage independently, and gracefully handle evictions and machine failures without compromising availability or performance.

Procella has been in production for over two years and is currently serving billions of SQL queries per day across various workloads at YouTube and several other Google product areas.

SPEAKERS

Aniket Mokashi
Senior Software Engineer
Google
Aniket Mokashi is a tech lead on the engineering team that prototyped and built the Procella project at YouTube. Throughout his career, he has contributed to the development of large-scale data processing frameworks and platforms. Prior to Google, he worked on data platform teams at Twitter and Netflix. He is also a committer and PMC member on the Apache Parquet and Apache Pig projects. Aniket holds a Master's degree in Information Networking from Carnegie Mellon University.