Free Servers to Build Big Data System on: Bing’s Approach

Wednesday, May 22
2:00 PM - 2:40 PM
Marquis Salon 12

A few thousand servers for a big data system is a big investment. Microsoft Bing has figured out a way to meet our needs without writing a huge check. We have technology to harvest spare cycles on underutilized servers, and we tune the Hadoop and Spark configurations to fit this flexible capacity base. This has saved hundreds of millions of dollars per year.

Bing is adopting open source big data technologies for our offline data processing system. This requires a massive amount of capacity, which implies a significant bill. In collaboration with the Windows, Azure, and Research teams, we can harvest most of the needed capacity from our existing server fleet. We use the capacity on reserve servers while keeping them instantly available for emergency use, and we allocate compute and storage on servers when they are not fully occupied. We updated Hadoop node decommissioning, HDFS block placement, YARN node label mapping, and a few other policies so that they can adapt to capacity that is even less reliable than commodity servers. We brought open source capacity to Bing products at less than 1 percent of what it would have cost through the normal approach. We have also extended the YARN and Spark frameworks to better fit the needs of deep learning training and inference workloads in our system. This extension equips Bing with direct question-and-answer style interactive query features.

Big does not mean expensive. Attendees will learn from Bing’s approach how to make better use of their existing servers to run additional big data systems.


Kai Liu
Senior Program Manager
Kai Liu is a Senior Program Manager in the AI and Research group at Microsoft. He has 8 years of experience in data-driven engineering, big data platforms, and AI infrastructure for the Office and Bing product families. He led his team to create a service health portal for SharePoint Online, introduce a distributed log collection and storage system for Exchange Online, publish curated data sets and key business metrics, and enable sub-hour experimentation in Office 365. Currently he is working on the next generation of the Big Data and Deep Learning platform for Bing, based on open source technologies.
Jack Zhang