What’s New in Apache Spark 2.3 and 2.4

Wednesday, February 6
11:50 AM - 12:30 PM
Room 111/112

Apache Spark 2.0 laid the architectural foundations for structure in Spark: unified high-level APIs, Structured Streaming, and the underlying performant components such as the Catalyst optimizer and the Tungsten engine. Since then, the Spark community has continued to build new features and fix numerous issues in the 2.1 and 2.2 releases.

Apache Spark 2.3 and 2.4 have made similar strides. In this talk, we highlight some of the new features and enhancements, such as:

• Apache Spark and Kubernetes
• Native Vectorized ORC and SQL Cache Readers
• Pandas UDFs for PySpark
• Continuous Stream Processing
• Barrier Execution
• Avro/Image Data Source
• Higher-order Functions


Robert Hryniewicz
AI Evangelist
Robert is an AI evangelist at Cloudera with over 12 years of experience working on projects related to artificial intelligence, robotics, IoT, and enterprise and embedded software. His primary focus at Cloudera is building communities around IoT, big data, and data science, and enabling enterprises to accelerate adoption of cutting-edge open-source technologies (from Edge to AI).