Apache Hadoop: State of the Union

Tuesday, October 16
1:00 PM - 1:45 PM
Room 401

Apache Hadoop YARN is the modern distributed operating system for big data applications, and HDFS is its storage layer. YARN transformed the Hadoop compute layer into a common resource management platform that can host a wide variety of applications.

In this talk, we'll start with the current status of Apache Hadoop YARN: how it is used today in deployments large and small. We'll then move on to the present and future of YARN: features that further strengthen YARN as the first-class resource management platform for data centers running enterprise Hadoop.
We'll discuss the current status and future promise of features and initiatives such as powerful container placement, global scheduling, support for machine learning and deep learning workloads through GPU and FPGA support, extreme scale with YARN federation, containerized apps on YARN, native support for long-running services (alongside applications) without any changes, seamless application upgrades, powerful scheduling features, operational enhancements, and better queue management.

The second part of this talk will focus on the latest enhancements to HDFS. HDFS has several strengths: it scales its I/O bandwidth horizontally and scales its storage to petabytes. Further, it provides very low-latency metadata operations and scales to over 60K concurrent clients. Hadoop 3.0 recently added erasure coding. One of HDFS's limitations is scaling the number of files and blocks in the system. We'll describe a radical change to Hadoop's storage infrastructure with the upcoming Ozone technology. It allows Hadoop to scale to tens of billions of files and blocks and, in the future, to ever larger numbers of smaller objects.
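Erasure coding is the main storage-efficiency change that arrived with Hadoop 3.0. As a rough, back-of-the-envelope illustration (not HDFS code), the sketch below compares the raw-storage overhead and failure tolerance of classic 3-way replication with a Reed-Solomon layout of 6 data cells and 3 parity cells, the layout used by the RS-6-3-1024k policy that ships with Hadoop 3. The function names are hypothetical and exist only for this example.

```python
from fractions import Fraction


def replication_cost(replicas: int) -> tuple[Fraction, int]:
    """Return (extra raw storage per byte of user data, failures tolerated)
    for n-way block replication."""
    # Every byte is written `replicas` times; all copies beyond the first are overhead.
    overhead = Fraction(replicas - 1)
    # Data survives as long as at least one replica remains.
    return overhead, replicas - 1


def reed_solomon_cost(data_cells: int, parity_cells: int) -> tuple[Fraction, int]:
    """Return (extra raw storage per byte of user data, failures tolerated)
    for a Reed-Solomon stripe of `data_cells` data and `parity_cells` parity cells."""
    # Only the parity cells are overhead: m parity cells protect k data cells.
    overhead = Fraction(parity_cells, data_cells)
    # Any `data_cells` cells of the stripe suffice to rebuild it,
    # so up to `parity_cells` cells may be lost.
    return overhead, parity_cells


if __name__ == "__main__":
    for label, (overhead, tolerated) in [
        ("3x replication", replication_cost(3)),
        ("RS(6,3) erasure coding", reed_solomon_cost(6, 3)),
    ]:
        print(f"{label}: {float(overhead):.0%} extra storage, "
              f"tolerates {tolerated} simultaneous failures")
```

Under these assumptions, RS(6,3) cuts the extra storage from 200% to 50% while tolerating one more failure than 3x replication. The trade-off, not modeled here, is the extra CPU and network cost of encoding and of reconstructing lost cells.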


Sanjay Radia
Chief Architect, Founder
Sanjay is a founder and chief architect at Hortonworks, an Apache Hadoop committer, and a member of the Apache Hadoop PMC. Prior to co-founding Hortonworks, Sanjay was the chief architect of core Hadoop at Yahoo and part of the team that created Hadoop. In Hadoop he has contributed to several areas, including HDFS, MapReduce schedulers, YARN's design, high availability, and compatibility. He has also held senior engineering positions at Sun Microsystems and INRIA, where he developed software for distributed systems and grid/utility computing infrastructures. Sanjay has a PhD in Computer Science from the University of Waterloo in Canada.