Mutant Tests Too: The SQL

Thursday, March 21
4:50 PM - 5:30 PM
Room 118-119

The big data platforms of many organisations are underpinned by a technology that is soon to celebrate its 45th birthday: SQL. This industry stalwart is applied at many critical points in business data flows, and the results that these processes generate may significantly influence business and financial decision making. However, the SQL ecosystem has largely been bypassed by more recent software engineering best practices such as fine-grained automated testing and code quality metrics. This exposes organisations to poor application maintainability, high bug rates, and ultimately corporate risk.

We present the work we’ve been doing to address these issues by bringing advanced software engineering practices and open source tools to the realm of Apache Hive SQL. We first define the relevance of such approaches and demonstrate how automated testing can be applied to Hive SQL using HiveRunner, a JUnit-based testing framework. We next consider how best to structure Hive queries to yield meaningful test scenarios that are maintainable and performant. Finally, we demonstrate how test coverage reports can highlight areas of risk in SQL codebases and weaknesses in the testing process. We do this using Mutant Swarm, an open source mutation testing tool for SQL languages that can deliver insights similar to those produced by Java-focused tools such as JaCoCo and PIT.
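
To make the idea of mutation testing for SQL concrete, here is a minimal sketch in Python. It is not Mutant Swarm or HiveRunner: it assumes SQLite (from the standard library) as a stand-in for Hive, and the query, the `orders` table, the mutation operators, and the helper names are all hypothetical. Each mutant is the query with one token swapped; a mutant that still passes every test case "survives" and points at behaviour the tests never exercise.

```python
import sqlite3

# Hypothetical query under test: total spend per customer for orders over 10.
QUERY = (
    "SELECT customer, SUM(amount) FROM orders "
    "WHERE amount > 10 GROUP BY customer ORDER BY customer"
)

# Illustrative mutation operators (assumed for this sketch): each replaces
# one token with a plausible but wrong alternative.
MUTATIONS = [(" > ", " >= "), (" > ", " < "), ("SUM(", "MAX(")]


def run(query, rows):
    """Execute the query against a fresh in-memory table holding `rows`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


def passes(query, cases):
    """A suite passes when every (input rows, expected result) pair matches."""
    return all(run(query, rows) == expected for rows, expected in cases)


def surviving_mutants(query, cases):
    """Return the mutated queries that the test cases fail to detect."""
    survivors = []
    for old, new in MUTATIONS:
        if old not in query:
            continue
        mutant = query.replace(old, new, 1)
        if passes(mutant, cases):
            survivors.append(mutant)
    return survivors


# A weak suite: no boundary value and only one row per customer.
WEAK_SUITE = [([("ann", 20), ("bob", 5)], [("ann", 20)])]

# A stronger suite adds a boundary row (amount = 10) and a multi-row customer.
STRONG_SUITE = WEAK_SUITE + [
    ([("ann", 20), ("ann", 30), ("bob", 10)], [("ann", 50)]),
]

if __name__ == "__main__":
    for name, suite in [("weak", WEAK_SUITE), ("strong", STRONG_SUITE)]:
        print(name, "suite survivors:", surviving_mutants(QUERY, suite))
```

In this sketch the weak suite lets the `>=` and `MAX(` mutants through, because it never feeds the query a boundary value or more than one row per customer. Adding a single case that covers both kills every mutant, which is the kind of feedback a mutation coverage report is meant to provide.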


Elliot West
Principal Engineer
Elliot is a principal engineer based in London, where he designs tooling and platforms in the big data space. Prior to this, Elliot worked in a data team, developing services for managing large volumes of music metadata.
Jay Green-Stevens
Associate Software Development Engineer
Jay is a final-year student at King’s College London studying Computer Science. She joined the Big Data Platform team for her industrial placement year, where she spent time working with Apache Hive, modularization techniques for SQL, and mutation testing tools.