Hadoop {Submarine} Project: Running Deep Learning Workloads on YARN

Wednesday, March 20
11:00 AM - 11:40 AM
Room 124-125

Deep learning is useful for enterprise tasks in fields such as speech recognition, image classification, AI chatbots, and machine translation, to name a few.

To train deep learning/machine learning models, applications such as TensorFlow, MXNet, Caffe, and XGBoost can be leveraged, and sometimes several of these applications are used together to solve different problems.

To make distributed deep learning/machine learning applications easy to launch, manage, and monitor, the Hadoop community has introduced the Submarine project along with other improvements such as first-class GPU support, container DNS support, and scheduling enhancements. These improvements make running distributed deep learning/machine learning applications on YARN as simple as running them locally, letting machine-learning engineers focus on algorithms instead of worrying about the underlying infrastructure. With these improvements, YARN can also better manage a shared cluster that runs deep learning/machine learning workloads alongside other services and ETL jobs.
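As a rough illustration of what "as simple as running it locally" means, a distributed TensorFlow job can be submitted to YARN through the Submarine CLI. This is a hedged sketch only: the jar path, Docker image name, HDFS paths, and training script below are placeholders, and the exact flags may vary across Hadoop versions, so consult your distribution's Submarine documentation.

```shell
# Hypothetical Submarine job submission (Hadoop 3.2-era CLI sketch).
# All names, paths, and the image are illustrative placeholders.
yarn jar /opt/hadoop/share/hadoop/yarn/hadoop-yarn-submarine.jar job run \
  --name tf-job-001 \
  --docker_image my-registry/tf-gpu:latest \
  --input_path hdfs:///user/alice/cifar10 \
  --num_workers 2 \
  --worker_resources memory=8G,vcores=4,gpu=1 \
  --worker_launch_cmd "python train.py" \
  --num_ps 1 \
  --ps_resources memory=4G,vcores=2 \
  --ps_launch_cmd "python train.py --job-name=ps"
```

Here YARN handles container placement, GPU allocation (via the first-class GPU support mentioned above), and worker/parameter-server discovery (via container DNS), so the training script itself needs no cluster-specific changes.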

In this session, we will take a closer look at the Submarine project as well as these other improvements, and show with demos how to run deep learning workloads on YARN. Attendees can start running these workloads on YARN after this talk.

SPEAKERS

Sunil Govindan
Staff Engineer
Cloudera
Sunil Govindan has been contributing to the Apache Hadoop project since 2013 in various roles: Hadoop contributor, Hadoop committer, and member of the Project Management Committee (PMC). He works as a Staff Software Engineer on the YARN team at Hortonworks. His major contributions are in YARN scheduling improvements such as intra-queue resource preemption, support for multiple resource types in YARN with resource profiles, and absolute resource configuration for queues. He also drove community efforts to improve the YARN UI for a better user experience. Before Hortonworks, he worked at Juniper on a custom resource scheduler. Prior to that, he was at Huawei, working on platform and middleware distributed systems, including the Hadoop platform. He loves reading books, is an ardent music lover, and is passionate about go-green efforts.
Zhankun Tang
Staff Engineer
Hortonworks
Zhankun Tang is a self-described code monkey interested in big data, cloud, and operating systems. He is currently working on the custom resource plugin and GPU topology support. Prior to Hortonworks, he worked at Intel for 7 years after receiving his master's degree. In recent years, he led a small group focused on enabling Intel's cutting-edge technology in Hadoop and on performance optimization in Apache Spark. He has also done customer engagement and path-finding work in the open-source community, and is a participant in the Apache Mesos, Kubernetes, and TensorFlow communities.