HDFS router-based federation

Wednesday, June 20
2:50 PM - 3:30 PM
Meeting Room 211A/B/C/D

HDFS deployments are growing in size, but their scalability is limited by the NameNodes (metadata overhead, file blocks, the number of DataNode heartbeats, and the increasing HDFS RPC workload). A common solution is to split the filesystem into multiple smaller subclusters. The challenge with this approach is maintaining the split across subclusters (e.g., the namespace partition), sparing users from connecting to multiple subclusters, and managing the allocation of the directories and files themselves.

To address this challenge, we have developed HDFS router-based federation, which scales HDFS out horizontally by building a federation layer over the subclusters. It provides a federated view of multiple HDFS namespaces and offers the same RPC and WebHDFS endpoints as NameNodes.

This approach is similar to the existing ViewFS and HDFS federation functionality, except that the mount table is managed on the service side by the routing layer rather than on the client side. This simplifies access to a federated cluster for existing HDFS clients and also clears the way for innovations inside HDFS subclusters (e.g., moving data among subclusters for tiering purposes). The design follows that of YARN federation, which provides near-linear scale-out by simply adding more subclusters.
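
As a rough illustration of what a mount table held by the routing layer does, the sketch below resolves a path in the federated namespace to a subcluster and a path inside it by longest-prefix match. It is a toy model, not the actual router code: the `MountTable` class, its methods, and the subcluster names `ns0`, `ns1`, and `ns2` are all assumptions made for the example.

```python
from typing import Dict, Optional, Tuple


class MountTable:
    """Toy mount table: maps global path prefixes to (subcluster, local path)."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[str, str]] = {}

    @staticmethod
    def _normalize(path: str) -> str:
        # Drop leading/trailing slashes, then re-add one leading slash ("/" stays "/").
        return "/" + path.strip("/")

    def add(self, mount_point: str, subcluster: str, target: str) -> None:
        self._entries[self._normalize(mount_point)] = (subcluster, self._normalize(target))

    def resolve(self, path: str) -> Tuple[str, str]:
        """Return (subcluster, local path) for the longest matching mount point."""
        path = self._normalize(path)
        best: Optional[str] = None
        for mount in self._entries:
            # Match whole path components only, so "/data" does not cover "/database".
            prefix = mount if mount == "/" else mount + "/"
            if path == mount or path.startswith(prefix):
                if best is None or len(mount) > len(best):
                    best = mount
        if best is None:
            raise KeyError(f"no mount point covers {path}")
        subcluster, target = self._entries[best]
        remainder = path[len(best):].lstrip("/")
        local = target if not remainder else target.rstrip("/") + "/" + remainder
        return subcluster, local


# Hypothetical layout: three subclusters behind one federated namespace.
table = MountTable()
table.add("/", "ns0", "/")
table.add("/data", "ns1", "/data")
table.add("/data/logs", "ns2", "/logs")

print(table.resolve("/data/logs/2018/part-0"))
print(table.resolve("/tmp/scratch"))
```

Because the table lives in the routing layer in this model, a client only ever names the federated path; changing which subcluster backs `/data/logs` is a change to the table, not to any client configuration.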

SPEAKERS

Chao Sun
Software Engineer
Uber
Chao Sun is a Software Engineer at Uber, working on Hadoop infrastructure, including stacks such as Hive and HDFS. Before that, he was a Software Engineer at Cloudera, where he worked on various projects including Hive on Spark and RecordService.
Inigo Goiri
Research Software Development Engineer
Microsoft
Inigo is a research software developer at Microsoft Research in the System Research Group, currently focusing on HDFS, specifically scaling it to 100K+ nodes and enabling it to harvest idle resources. He has been working on the Hadoop ecosystem since 2011 and is a committer. Prior to Microsoft, he worked at Rutgers University as a postdoctoral researcher.