Running Apache Hadoop on the Google Cloud Platform

Wednesday, June 20
4:40 PM - 5:30 PM
Grand Ballroom 220A

This talk will cover various aspects of running Apache Hadoop and its ecosystem projects on cloud platforms, with a focus on the Google Cloud Platform (GCP). We will compare HDFS with cloud-based object storage services for storing unstructured data. We will also look under the hood of the Google Cloud Storage (GCS) Connector to better understand how cloud connectors share a common file system interface, which allows them to connect easily with Apache Hive, Apache Spark, and various other Hadoop ecosystem components.

These cloud storage connectors are key to freeing Apache Hadoop deployments from data locality restrictions, and they can enable scale-out and freedom from monolithic clusters. However, cloud object stores are not file systems, and this can cause challenges for organizations as they migrate to the cloud. This talk will discuss some alternative deployment architectures for running Apache Hadoop and its ecosystem projects in the cloud that work better with cloud storage and cloud security and that take advantage of the agility that moving to the cloud brings.

Presentation Video


Siddharth Seth
Principal Software Engineer
Siddharth Seth works as a Software Engineer at Hortonworks and has been involved with various Hadoop ecosystem projects for the past 7 years. He currently works on the Hortonworks cloud effort, and in the past has worked on Apache Hive-LLAP, Apache Tez, and Apache Hadoop, with a focus on YARN and MapReduce. He is a Hive committer and a member of both the Apache Tez PMC and the Apache Hadoop PMC. Prior to this, he spent several years working on search at Yahoo.
Christopher Crosbie
Product Manager, Dataproc and Open Data Analytics (ODA)
Christopher Crosbie has over fifteen years of experience developing and deploying data technology in enterprise environments. He is currently on the Cloud Partner Engineering team at Google, where he serves as a trusted advisor to software vendors that build data, analytics, and ML solutions on the Google Cloud Platform. Before joining Google, Chris was a development manager at Amazon, and before that he headed up the data science team at Memorial Sloan Kettering Cancer Center, where he implemented the enterprise Hortonworks architecture and strategy. Chris started his career as a biostatistics application engineer at the NSABP, a not-for-profit clinical trials cooperative group supported by the National Cancer Institute. He holds an MPH in Biostatistics and an MS in Information Science.