Securing Data in Hadoop at Uber

Wednesday, June 20
4:00 PM - 4:40 PM
Grand Ballroom 220B

When it comes to data security, Uber’s business has unique needs related to scale, use cases, and technical stacks. This talk will discuss how our data platform team addressed specific challenges in deploying Uber's security requirements for Apache Hadoop, including how we leveraged open source building blocks. We'll share insights on how we augmented our Kerberized Hadoop integration with additional authentication mechanisms, as well as our approach to supporting custom authentication in Apache Knox. In particular, we will elaborate on Uber’s contributions to Apache Knox, specifically a novel pluggable platform for custom validation of any user request. This talk will also cover how we address table-, column-, and partition-level access control while improving developer productivity. We will explain how we translate RBAC policy into HDFS ACLs to control data access, describe our internal audit platform built to detect and analyze common security infringements, and share real-world examples from our experiences in production.
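As a rough illustration of the RBAC-to-HDFS-ACL idea mentioned in the abstract, the sketch below maps a role-based grant onto an HDFS ACL entry and the matching `hdfs dfs -setfacl -m` command. It is not Uber's implementation: the `Grant` type, the role-to-group mapping, and the example path are all hypothetical, and it assumes each role corresponds to one HDFS group.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """Hypothetical RBAC grant: a role's access to one HDFS directory."""
    role: str
    path: str        # directory backing a table or partition (assumed layout)
    can_read: bool
    can_write: bool


def to_acl_spec(grant, role_to_group):
    """Build an HDFS ACL entry of the form group:<name>:<perms>."""
    group = role_to_group[grant.role]
    # Directories need the execute bit for traversal, so grant it
    # whenever any access is granted at all.
    perms = (
        ("r" if grant.can_read else "-")
        + ("w" if grant.can_write else "-")
        + ("x" if grant.can_read or grant.can_write else "-")
    )
    return f"group:{group}:{perms}"


def setfacl_command(grant, role_to_group):
    """Return the argv for `hdfs dfs -setfacl -m <acl_spec> <path>`."""
    return ["hdfs", "dfs", "-setfacl", "-m",
            to_acl_spec(grant, role_to_group), grant.path]


if __name__ == "__main__":
    mapping = {"analyst": "analysts"}  # hypothetical role-to-group mapping
    grant = Grant("analyst", "/warehouse/trips", can_read=True, can_write=False)
    print(" ".join(setfacl_command(grant, mapping)))
```

The sketch only builds the command and does not run it, so a policy change can be reviewed or logged before anything is applied to the cluster.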

Mohammad Islam
Staff Software Engineer
Uber Inc
Mohammad Kamrul Islam is currently working at Uber on its Data Infrastructure team as a Staff Software Engineer. Previously, he worked at LinkedIn for more than two years as a Staff Software Engineer on the Hadoop Development team. Before that, he worked at Yahoo! for nearly five years as an Oozie architect and technical lead. He has been closely involved with the Apache Hadoop ecosystem since 2009. Mohammad has a Ph.D. in computer science with a specialization in parallel job scheduling from Ohio State University. He is a Project Management Committee (PMC) member of both Apache Oozie and Apache Tez and frequently contributes to Apache HDFS/YARN/MapReduce and Apache Hive.
Wei Han
Wei Han currently manages the Hadoop security team at Uber. Before that, he worked on Cherami, Uber's message queue system. Before Uber, he worked at Microsoft Bing for seven years, mainly focusing on Bing's indexing generation system and large-scale storage systems.