Scaling Hadoop at LinkedIn

Tuesday, June 19
2:00 PM - 2:40 PM
Grand Ballroom 220A

LinkedIn relies on the Apache Hadoop ecosystem for its big data analytics. Steady growth of LinkedIn's member base, along with members' social activity, results in exponential growth of the analytics infrastructure. Innovations in analytics tooling lead to heavier workloads on the clusters, which generate more data, which in turn encourages further innovations in tooling and still more workloads. Thus, the infrastructure remains under constant growth pressure. A heterogeneous environment, with a variety of hardware and diverse workloads, makes the task even more challenging.

This talk will tell the story of how we doubled our Hadoop infrastructure twice in the past two years.
• We will outline our main use cases and historical rates of cluster growth in multiple dimensions.
• We will focus on optimizations, configuration improvements, performance monitoring and architectural decisions we undertook to allow the infrastructure to keep pace with business needs.
• Topics include improvements in HDFS NameNode performance and fine-tuning of block report processing, the block balancer, and the namespace checkpointer.
• We will reveal a study on the optimal storage device for HDFS persistent journals (SATA vs. SAS vs. SSD vs. RAID).
• We will also describe the Satellite Cluster project, which allowed us to double the number of objects stored on one logical cluster by splitting an HDFS cluster into two partitions without the use of federation and with practically no code changes.
• Finally, we will take a peek at our future goals, requirements, and growth prospects.


Konstantin Shvachko
Sr Staff Software Engineer
Konstantin V. Shvachko is an expert in Big Data technologies, file systems, and storage solutions. He specializes in efficient data structures and algorithms for large-scale distributed storage systems. Konstantin is known as an open source software developer, author, inventor, and entrepreneur. He is a senior staff software engineer at LinkedIn.
Erik Krogen
Senior Software Engineer
Erik is a software engineer with a passion for all things distributed systems. He currently focuses on Big Data storage and analytics at LinkedIn. His work centers on the scalability of HDFS, both internally and via contributions to open source. Erik is particularly excited by research into new and interesting storage technologies, and is passionate about promoting female involvement and empowerment in the technology space.