An Early Evaluation of Running Spark on Kubernetes

Thursday, March 21
2:00 PM - 2:40 PM
Room 118-119

Kubernetes is an open source system to deploy, scale, and manage containerized applications anywhere. It builds on 15 years of running Google's containerized workloads and on valuable contributions from the open source community. To shepherd Kubernetes' evolution with that community, Google helped form the Cloud Native Computing Foundation (CNCF) and donated Kubernetes as its founding project. Starting in Spark 2.3.0, Spark has an experimental option to run on clusters managed by Kubernetes. This feature makes use of the native Kubernetes scheduler that has been added to Spark. In this talk, we will provide a baseline understanding of what Kubernetes is, why it is relevant to the Spark community, and how it compares to YARN. We will then look under the hood of Spark managed by Kubernetes to better understand how it works. Finally, we will provide an early evaluation of this feature, as well as our thoughts on the future of running Spark on Kubernetes.

Christopher Crosbie
Product Manager, Dataproc and Open Data Analytics (ODA)
Christopher Crosbie has over fifteen years of experience developing and deploying data technology in enterprise environments. He is currently on the Cloud Partner Engineering team at Google, where he serves as a trusted advisor to software vendors that build data, analytics, and ML solutions on the Google Cloud Platform. Before joining Google, Chris was a development manager at Amazon. Before that, he headed the data science team at Memorial Sloan Kettering Cancer Center, where he implemented the enterprise Hortonworks architecture and strategy. Chris started his career as a biostatistics application engineer at the NSABP, a not-for-profit clinical trials cooperative group supported by the National Cancer Institute. He holds an MPH in Biostatistics and an MS in Information Science.