Storage Requirements and Options for Running Spark on Kubernetes

Storage Requirements and Options for Running Spark on Kubernetes

Thursday, March 21
11:00 AM - 11:40 AM
Room 122-123

In a world of serverless computing users tend to be frugal when it comes to expenditure on compute, storage and other resources. Paying for the same when they aren’t in use becomes a significant factor. Offering Spark as service on cloud presents very unique challenges. Running Spark on Kubernetes presents a lot of challenges especially around storage and persistence. Spark workloads have very unique requirements of Storage for intermediate data, long time persistence, Share file system and requirements become very tight when it same need to be offered as a service for enterprise to mange GDPR and other compliance like ISO 27001 and HIPAA certifications.

This talk covers challenges involved in providing Serverless Spark Clusters share the specific issues one can encounter when running large Kubernetes clusters in production especially covering the scenarios related to persistence.

This talk will help people using Kubernetes or docker runtime in production and help them understand various storage options available and which is more suitable for running Spark workloads on Kubernetes and what more can be done

Presentation Video


Rachit Arora
Rachit Arora is a Senior Developer at IBM,India Software Labs. He is key designer of the IBM's offerings on Cloud for Hadoop ecosystem . He has extensive experience in architecture, design and agile developmemt. Rachit is an expert in application development in Cloud architecture and development using hadoop and it's ecosystem. He has been active speaker for BigData technologies in various conference like Information Management Technical Conference-2015 , ContainerCon NA-2016, Container Camp Sydeny 2017 etc.