Containers and Big Data

Containers and Big Data

Wednesday, June 20
2:00 PM - 2:40 PM
Grand Ballroom 220A

As containerization continues to gain momentum and become a de facto standard for application deployment, challenges around containerization of big data workloads are coming to light. Great strides have been made within the open source communities towards running big data workloads in containers, but much is left to be done.

Apache Hadoop YARN is the modern distributed operating system for big data applications. It has morphed the Hadoop compute layer into a common resource-management platform that can host a wide variety of applications. At its core, YARN has a very powerful scheduler which enforces global cluster level invariants and helps sites manage user and operator expectations of elastic sharing, resource usage limits, SLAs, and more. YARN recently increased its support for Docker containerization and added a YARN service framework supporting long-running services.

In this session we will explore the emerging patterns and challenges related to containers and big data workloads, including running applications such as Apache Spark, Apache HBase, and Kubernetes in containers on YARN.

Presentation Video


Billie Rinaldi
Principal Software Engineer I
Billie Rinaldi is a Principal Software Engineer I at Hortonworks, currently prototyping new features related to long-running services and containers in Apache Hadoop YARN. Prior to August 2012, Billie engaged in big data science and research at the National Security Agency, where she provided early leadership for Apache Accumulo. Billie is a member of the Apache Software Foundation and a committer for Apache Hadoop and a number of other Apache projects in the Hadoop ecosystem. She holds a Ph.D. in applied mathematics from Rensselaer Polytechnic Institute.
Shane Kumpf
Software Engineer
Shane Kumpf is a Staff Software Engineer on the Apache Hadoop YARN R&D team at Hortonworks focusing on containerization.