SQL is the most widely used language for data processing. It allows users to concisely and easily declare their business logic. Data analysts usually do not have complex software programing backgrounds, but they can program SQL and use it on a regular basis to analyze data and power the business decisions. Apache Flink is one of streaming engines that supports SQL. Besides Flink, some other stream processing frameworks, like Kafka and Spark structured streaming, have SQL-like DSL, but they do not have the same semantics as Flink. Flink’s SQL implementation follows ANSI SQL standard while others do not.
In this talk, we will present why following ANSI SQL standard is essential characteristic of Flink SQL and how we achieved this. The core business of Alibaba is now fully driven by the data processing engine: Blink, a project based on Flink with Alibaba’s improvements. About 90% of blink jobs are written by Flink SQL. We will show the use cases and the experience of running large scale Flink SQL jobs at Alibaba in the talk.