Extending Twitter’s Data Platform to Google Cloud

Wednesday, May 22
2:50 PM - 3:30 PM
Marquis Salon 7

Twitter's Data Platform is built from multiple complex open source and in-house projects to support data analytics on hundreds of petabytes of data. The platform supports storage, compute, data ingestion, discovery, and management, along with various tools and libraries that help users with both batch and real-time analytics. It operates on multiple clusters across different data centers to help thousands of users discover valuable insights. As we scaled the Data Platform to multiple clusters, we also evaluated various cloud vendors to support use cases outside of our data centers. In this talk we share our architecture and how we extended our data platform to use the cloud as another data center. We walk through our evaluation process and the challenges we faced supporting data analytics at Twitter scale in the cloud, and present our current solution. Extending Twitter's Data Platform to the cloud was a complex task, which we explore in depth in this presentation.

SPEAKERS

Lohit VijayaRenu
Software Engineer
Twitter Inc
Lohit is part of the Hadoop and Log Management team at Twitter. He has been concentrating on scaling the Hadoop FileSystem, the Hadoop Resource Manager, and log ingestion and processing pipelines at Twitter. Previously he worked at a few startups building scalable file systems and was part of the Hadoop team at Yahoo! when Hadoop was open sourced. He has a master's degree in Computer Science from Stony Brook University.
Vrushali Channapattan
Hadoop Software Engineer
Twitter Inc
Vrushali Channapattan is an active Apache Hadoop committer and PMC member. She currently works on the Hadoop team at Twitter, focusing on ensuring that Hadoop keeps meeting Twitter's rapidly expanding storage and computation needs. In past roles, she worked at Intuit, Yahoo!, Oracle, Persistent Systems, and the Tata Institute of Fundamental Research in India.