Scalable HiveServer2 as a Service

Thursday, May 23
2:00 PM - 2:40 PM
Marquis Salon 10

HiveServer2 provides a multi-tenant service endpoint for executing Hive queries concurrently. It supports authentication and authorization, serves as a JDBC endpoint for users to connect and run queries via various tools, maintains sessions and warm containers for faster query processing, provides caching at multiple levels, and much more. In other words, it is an integral component of any Hive deployment. However, HiveServer2 deployments often face performance and reliability issues, at times leading to catastrophic failures. At Qubole, we have augmented HiveServer2 to use the capabilities of the cloud to offer an enterprise-ready, scalable, and stable HiveServer2 (or HS2) service.

At Qubole, where the cloud is our primary deployment platform, the HS2 service scales automatically with the customer’s workload: our solution adds and gracefully removes HS2 instances as demand changes, making the service both self-sufficient at scale and fault-tolerant. We have implemented load balancing for queries based on resource utilization on the HS2 instances to provide a reliable, efficient, and cost-effective solution. A health monitoring service, built on top of this scalable HS2 service and informed by past learnings and insights from running HS2 in customer deployments, acts as the foundation for a battle-tested, enterprise-ready solution. In this talk, we will share the details of this implementation and the challenges we faced in providing an auto-scalable, highly performant, and reliable HS2 experience in the cloud.

Topics include:

* Workload-aware autoscaling for HS2 clusters.

* Agent-based adaptive load balancing of Hive queries on multi-tenant HS2 clusters.

* Durability monitoring using failure semantics and automated measures to provide reliability.

* Enterprise-level security for HS2 on the cloud.

* Metrics, monitoring and alerting around the HS2 service.

SPEAKERS

Nitin Khandelwal
Staff Engineer
Qubole Inc
Nitin Khandelwal is a Staff Engineer at Qubole. He has worked on a range of projects, including adding encrypted communication for ephemeral cluster nodes running in the cloud, providing Hive as a multi-tenant service, and autoscaling. He has contributed significantly to optimizing the Tez engine for ETL workloads by adding features such as workload-aware autoscaling, fault tolerance, and effective use of spot nodes. Previously, Nitin worked at Microsoft on the VPN site-to-site gateway service, which forms the backbone of Microsoft Azure Stack's network. Nitin completed his Master's in Computer Science at IIIT-Hyderabad, where his main areas of focus were distributed computing, databases, and networks.
Shreya Bhatia
MTS
Qubole Inc
Shreya Bhatia is a Member of Technical Staff at Qubole. She works on the Hive stack and has been part of projects such as providing Hive as a service on a cloud-agnostic platform, building a metrics and alerting solution for HiveServer2 and stabilizing it under highly concurrent load, and analyzing the performance of MapReduce on YARN in the Qubole stack. She completed her Master's in Computer Science at Stony Brook University, New York, in 2016. Previously, she worked in India at InfoEdge (Naukri.com) as part of the Search Team, building extraction systems such as a resume/email parser and a job crawler.