This talk will cover various aspects of running Apache Hadoop and its ecosystem projects on cloud platforms, with a focus on the Google Cloud Platform (GCP). We will compare HDFS with cloud-based object storage services for storing unstructured data. We will then look under the hood of the Google Cloud Storage (GCS) Connector to see how cloud connectors implement the common Hadoop file system interface, which is what lets them plug into Apache Hive, Apache Spark, and other Hadoop ecosystem components.
These cloud storage connectors are key to freeing Apache Hadoop deployments from data locality restrictions, which in turn enables scale-out and a move away from monolithic clusters. However, cloud object stores are not file systems, and this can cause challenges for organizations as they migrate to the cloud. This talk will discuss some alternative deployment architectures for running Apache Hadoop and its ecosystem projects in the cloud that work better with cloud storage and cloud security and that take advantage of the agility that moving to the cloud brings.