Lessons learned from running Spark on Docker

Lessons learned from running Spark on Docker

Thursday, April 19
4:50 PM - 5:30 PM
Convention Hall I - C

Today, most any application can be “Dockerized.” However, there are special challenges when deploying a distributed application such as Spark on containers. This session will describe how to overcome these challenges in deploying Spark on Docker containers, with many practical tips and techniques for running Spark in a container environment.

Containers are typically used to run stateless applications on a single host. There are significant real-world enterprise requirements that need to be addressed when running a stateful, distributed application in a secure multi-host container environment.

There are decisions that need to be made concerning which tools and infrastructure to use. There are many choices with respect to container managers, orchestration frameworks, and resource schedulers that are readily available today and some that may be available tomorrow including:]
• Mesos
• Kubernetes
• Docker Swarm

Each has its own strengths and weaknesses; each has unique characteristics that may make it suitable, or unsuitable, for Spark. Understanding these differences is critical to the successful deployment of Spark on Docker containers.

This session will describe the work done by the BlueData engineering team to run Spark inside containers, on a distributed platform, including the evaluation of various orchestration frameworks and lessons learned. You will learn how to apply practical networking and storage techniques to achieve high performance and agility in a distributed, container environment.

Presentation Video


Thomas Phelan
Chief Architect
BlueData Inc
Thomas Phelan is cofounder and chief architect of BlueData. Previously, Tom was an early employee at VMware; as senior staff engineer, he was a key member of the ESX storage architecture team. During his 10-year stint at VMware, he designed and developed the ESX storage I/O load-balancing subsystem and modular “pluggable storage architecture.” He went on to lead teams working on many key storage initiatives, such as the cloud storage gateway and vFlash. Earlier, Tom was a member of the original team at Silicon Graphics that designed and implemented XFS, the first commercially available 64-bit file system.