Procella is a distributed SQL query engine built for flexible workloads within YouTube. Procella is highly scalable and is designed primarily to serve high volumes of queries at low latencies while ingesting realtime data. It is used to serve video/channel statistics for users watching videos as well as OLAP style queries for video analytics (youtube.com/analytics) and public dashboards (artists.youtube.com). Procella also supports complex SQL operations over structured data and is used by YouTube analysts for ad-hoc analysis.
Procella works on the Google distributed computing stack working directly on data residing in accessible columnar formats on the Google distributed file system Colossus. The underlying data is thus producible and directly consumable by other tools such as MapReduce and Dremel. The compute runs directly on shared machines on Borg clusters, and does not need dedicated virtual (or physical) machines. These features allows Procella to fit nicely in the Google ecosystem, scale compute and storage independently, and to gracefully handle evictions and machine failures without compromising availability or performance.
Procella has been in production for over two years and is currently serving billions of SQL queries per day across various workloads at YouTube and several other Google product areas.