Herding Elephants: Seamless Data Access in Multi-Cluster Clouds

Wednesday, May 22
11:00 AM - 11:40 AM
Marquis Salon 8

Expedia Group is migrating its Hadoop infrastructure from a single organization-wide on-premises cluster to a large number of smaller in-cloud clusters. We've also moved from a centralized operating model, where one team was responsible for our Hadoop platform, to a distributed approach where infrastructure is owned and operated by our different brands: Hotels.com, Expedia.com, HomeAway.com, etc. This segmentation of our data platforms has given us greater agility and resource elasticity at reduced cost. However, it has also fragmented our architecture, creating cloud-based data silos that impede our ability to explore, discover, and share data across the organization. We describe these technical challenges and the solutions we've developed to provide our users with a virtual, unified view of our many data lakes. We'll present Apiary, an open source project we developed that provides a standardized pattern for deploying and operating data lakes that support:

- federated data set sharing across accounts, regions, and clouds
- a "Bring Your Own Tool" culture, supporting a broad range of data processing platforms in the Hadoop ecosystem
- replication of data sets for disaster recovery
- data access security

Pradeep Bhadani
Senior Big Data Engineer
Pradeep is a Senior Big Data Engineer at Hotels.com in London, where he builds and manages cloud infrastructure and core services like Apiary. Pradeep has worked in the big data space for the last 7 years, building large-scale platforms.

Elliot West
Principal Engineer
Elliot is a Principal Engineer at Hotels.com in London, where he designs tooling and platforms in the big data space. Prior to this, Elliot worked in Last.fm’s data team, developing services for managing large volumes of music metadata.