Highly configurable and extensible data processing framework at PubMatic

Thursday, June 21
9:30 AM - 10:10 AM
Grand Ballroom 220C

PubMatic is a leading advertising technology company that processes 500 billion transactions (50 terabytes of data) per day through real-time and batch processing pipelines on a 900-node cluster. These pipelines power highly efficient machine learning algorithms, provide real-time feedback to the ad server for optimization, and provide in-depth insights into customer inventory and audience.

At PubMatic, scaling with ever-growing data volume has always been the biggest challenge, and we have continually optimized our technology stack for performance and cost. Another challenge is supporting the demand from customers and internal stakeholders for a wide variety of reports and analytics. Writing a custom job for each analytics request leads to repetitive effort and duplicates business logic across many different jobs.

To solve these problems, we built a platform for creating configuration-driven data processing pipelines with highly reusable business functions. It is also extensible, so it can take advantage of cutting-edge technologies in the ever-changing big data ecosystem. The platform enables our development teams to build robust batch data processing pipelines that power analytics dashboards. It also lets novice users supply a configuration of facts and dimensions to generate ad-hoc reports in a single data processing job. The framework identifies and reuses existing business functions based on user inputs. It also provides an abstraction layer that keeps core business logic unaffected by technology changes. The framework is currently powered by Spark, but it can be configured to run on other technologies.
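As a rough illustration of the idea, here is a minimal Python sketch of a configuration-driven report built from reusable business functions. It is not PubMatic's implementation: the registry `BUSINESS_FUNCTIONS`, the `InMemoryEngine` class, the `run_report` function, the config keys, and the sample field names are all hypothetical, and a plain in-memory backend stands in for Spark so the example runs on its own.

```python
from collections import defaultdict
from typing import Callable, Dict

# Hypothetical registry of reusable business functions (facts), keyed by name.
BUSINESS_FUNCTIONS: Dict[str, Callable[[dict], float]] = {}


def business_function(name):
    """Register a function once so that any report config can reuse it by name."""
    def register(fn):
        BUSINESS_FUNCTIONS[name] = fn
        return fn
    return register


@business_function("impressions")
def impressions(row):
    return float(row["impressions"])


@business_function("revenue")
def revenue(row):
    # Assumed business rule for illustration: CPM is the price per 1000 impressions.
    return row["impressions"] * row["cpm"] / 1000.0


class InMemoryEngine:
    """Stand-in backend. A Spark-backed engine could expose the same method,
    which keeps the business functions independent of the execution technology."""

    def aggregate(self, rows, dimensions, facts):
        totals = defaultdict(lambda: defaultdict(float))
        for row in rows:
            key = tuple(row[d] for d in dimensions)  # group by the configured dimensions
            for name, fn in facts.items():
                totals[key][name] += fn(row)  # sum each configured fact per group
        return {key: dict(values) for key, values in totals.items()}


def run_report(config, rows, engine):
    # Resolve fact names from the config to the registered business functions.
    facts = {name: BUSINESS_FUNCTIONS[name] for name in config["facts"]}
    return engine.aggregate(rows, config["dimensions"], facts)


# A user-supplied configuration: which dimensions to group by, which facts to compute.
config = {"dimensions": ["publisher", "country"], "facts": ["impressions", "revenue"]}

rows = [
    {"publisher": "p1", "country": "US", "impressions": 1000, "cpm": 2.0},
    {"publisher": "p1", "country": "US", "impressions": 500, "cpm": 4.0},
    {"publisher": "p2", "country": "IN", "impressions": 2000, "cpm": 1.0},
]

report = run_report(config, rows, InMemoryEngine())
```

In this sketch, a new report only needs a new `config`, and swapping the backend means providing another object with the same `aggregate` method while the registered functions stay unchanged.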

The framework reduced the time to develop data processing jobs from weeks to a few days. It also simplified unit testing and QA automation, and it gave customers and internal stakeholders simpler interfaces for generating custom reports.


Kunal Umrigar
Sr. Director, Engineering, Big Data & Analytics
Kunal Umrigar, Sr. Director, Engineering at PubMatic, leads the big data & analytics team, which designs and develops a big data platform to ingest and process terabytes of data. In addition to his big data experience, he has been a lead architect of the PubMatic Platform APIs and designed and developed the microservice and API infrastructure from the ground up. Prior to PubMatic, Kunal held engineering roles at Unica (now part of IBM), where he worked on designing and developing enterprise applications.