Powering TensorFlow with big data (Apache BEAM, Flink, and Spark)
Wednesday, April 18
4:00 PM - 4:40 PM
Room III

TensorFlow is all kinds of fancy, from helping startups raise their Series A in Silicon Valley to detecting whether something is a cat. However, when things start to get “real,” you may find yourself no longer dealing with mnist.csv, and instead needing to do large-scale data prep as well as training.

This talk will explore how TensorFlow can be used in conjunction with Apache BEAM, Flink, and Spark to create a full machine learning pipeline, including those annoying “feature engineering” and “data prep” components that we like to pretend don’t exist. We’ll also talk about how these feature prep stages need to be integrated into the serving layer. In addition to Apache BEAM, this talk also examines changing industry trends, like Apache Arrow, and how they impact cross-language development for things like deep learning. Even if you’re not trying to raise a round of funding in Silicon Valley, this talk will give you the tools to tackle interesting machine learning problems at scale.

Presentation Video


Holden Karau
Developer Advocate
Holden is a transgender Canadian open source developer advocate @ Google with a focus on Apache Spark, BEAM, and related "big data" tools. She is the co-author of Learning Spark, High Performance Spark, and another Spark book that's a bit more out of date. She is a committer and PMC member on Apache Spark, and a committer on the SystemML & Mahout projects. She was tricked into the world of big data while trying to improve search and recommendation systems and has long since forgotten her original goal.