HBase coprocessors, Uses, Abuses, Solutions

Wednesday, June 20
2:00 PM - 2:40 PM
Meeting Room 230C

One key feature that differentiates HBase from other distributed databases is its support for coprocessors. Bloomberg develops and manages some very low-latency systems that service real-time requests. To achieve real-time speeds, we had to use coprocessors, which are similar to traditional stored procedures. As a result, we were able to match the average latency of an HBase cluster with that of a traditional database. We did this by using coprocessors to parallelize much of the data computation and to reduce the number of round-trips to the cluster by a factor of 5, which lowered the amount of data sent over the wire by a factor of 5 as well. However, managing coprocessors in a production environment also poses significant challenges. In this talk, I will review the use case for HBase coprocessors and offer some practical tips on how to properly develop and deploy them. Some of the key topics covered in this talk are:
Types of coprocessors
Development challenges
Deployment challenges

SPEAKERS

Amit Anand
Senior Software Developer
Bloomberg LP
Amit Anand is a senior software developer at Bloomberg on the Hadoop Services/Infrastructure team, where he designs and develops tools for users of the Hadoop platform. He also works on the Hadoop Infrastructure team, where he is responsible for the deployment and management of Hadoop clusters. He focuses on HDFS, YARN, HBase and Spark. He holds a Bachelor's in Commerce and a Master's in Computer Science.
Esther Kundin
Senior Software Engineer
Bloomberg LP
Esther Kundin is a senior software developer at Bloomberg on the Machine Learning Text Analysis team, where she is the lead architect and engineer of the data archival project for the Engineering News department. She focuses on HDFS ingestion pipelines and PySpark integration for news data and analytics. Previously, Esther worked on the Hadoop Infrastructure team as well as the Portfolio Analytics team at Bloomberg. She holds a BA in Computer Science and Mathematics from New York University and a Master's in Computer Science from Columbia University.