Building Audi’s enterprise big data platform

Wednesday, April 18
2:00 PM - 2:40 PM

This talk is about building Audi's big data platform, from a first Hadoop PoC to a multi-tenant enterprise platform. Why a big data platform at all? We describe the requirements that drove the development of this platform and walk through the decisions we had to make along the way.

During the process of setting up our big data infrastructure, we often had to strike the right balance between enterprise integration and speed: for instance, whether to use the existing Active Directory for both LDAP and the KDC, or to set up our own KDC. Using a shared enterprise service like Active Directory means following certain naming conventions and living with restricted access, whereas running our own KDC offers much more flexibility but adds yet another component to our platform to maintain. We show the advantages and disadvantages of each option and explain why we chose the approach we did.
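To make the trade-off concrete, here is a minimal, hypothetical krb5.conf sketch for the AD-backed option, where existing Active Directory domain controllers serve as KDCs (realm and host names are invented for illustration):

```
[libdefaults]
    default_realm = CORP.EXAMPLE.COM

[realms]
    CORP.EXAMPLE.COM = {
        # The AD domain controllers double as KDCs; service principals
        # must then follow the corporate naming conventions.
        kdc = dc1.corp.example.com
        kdc = dc2.corp.example.com
        admin_server = dc1.corp.example.com
    }
```

With a self-managed MIT KDC, the `[realms]` entry would instead point at hosts the platform team controls, giving full freedom over principals and keytabs at the cost of one more service to operate.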

For ingesting both batch and streaming data, we use Apache Kafka. We explain why we run our Kafka cluster separately from our Hadoop platform. We also discuss the pros and cons of the Kafka binary protocol versus an HTTP REST interface, not only from a technical perspective but also from an organizational one, since the source systems are required to push data into Kafka.
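A sketch of what the two ingestion paths ask of a source system (this is an illustration, not Audi's actual setup; topic names, hosts, and event fields are invented). The binary path needs a Kafka client library and direct broker access; the HTTP path only needs the ability to POST JSON, here shown in the request format used by e.g. Confluent's Kafka REST Proxy:

```python
import json

def build_binary_producer_config(bootstrap_servers):
    """Client configuration a source system would need for the Kafka
    binary protocol: direct broker connectivity plus a Kafka library."""
    return {
        "bootstrap.servers": ",".join(bootstrap_servers),
        "acks": "all",                    # wait for full replication
        "security.protocol": "SASL_SSL",  # Kerberos-backed authentication
        "sasl.kerberos.service.name": "kafka",
    }

def build_rest_proxy_request(base_url, topic, events):
    """HTTP alternative: build a POST request for a Kafka REST proxy.
    Any system that speaks HTTPS can push data this way, at the cost of
    an extra hop and lower throughput than the binary protocol."""
    url = f"{base_url}/topics/{topic}"
    headers = {"Content-Type": "application/vnd.kafka.json.v2+json"}
    body = json.dumps({"records": [{"value": e} for e in events]})
    return url, headers, body

# Example: a click-stream event pushed via the REST path.
url, headers, body = build_rest_proxy_request(
    "https://kafka-rest.example.internal", "clickstream",
    [{"page": "/configurator", "ts": 1524052800}],
)
```

Organizationally, the HTTP path lowers the barrier for source-system teams (no Kafka client to adopt), while the binary path keeps throughput and delivery guarantees in the producers' hands.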

We give an overview of our current architecture, including how selected use cases are implemented on it. Some of them run exclusively on our new big data stack, while others use it in conjunction with our data warehouse. The use cases cover many different kinds of data, from sensor data of robots in our plants to click streams from web applications.

Building an enterprise platform consists not only of technical tasks but also of organizational ones: data ownership, authorization to access certain data sets, and more commercial matters such as internal pricing and SLAs.

Although we have already achieved quite a lot, our journey has not yet ended. There are still open topics to address, such as providing a unified logging solution for applications spanning multiple platforms, finally offering a notebook environment like Apache Zeppelin to our analysts, and addressing legal requirements such as GDPR.

We will conclude our talk with a short glimpse into our ongoing extension of our on-premises platform into a hybrid cloud platform.

Carsten Herbe
Big Data Architect
Audi Business Innovation GmbH
Carsten works as a Big Data Architect at Audi Business Innovation GmbH, a subsidiary of Audi focused on developing new mobility services as well as innovative IT solutions for Audi. Carsten has more than 10 years' experience in delivering data warehouse and BI solutions to his customers. He started working with Hadoop in 2013 and has since focused on both big data infrastructure and solutions. Currently, Carsten is helping Audi extend its big data platform, based on Hadoop and Kafka, to the cloud. Furthermore, as a solution architect he is responsible for developing and running analytical applications on that platform.
Matthias Graunitz
Big Data Architect
Matthias Graunitz is a big data architect at Audi, where he works at the company's Competence Center for Big Data and Business Intelligence. He is responsible for the architectural framework of the Hadoop ecosystem and a separate Kafka cluster, as well as for the data science toolkits the Competence Center provides to all business departments at Audi. Matthias has more than 10 years' experience in the field of business intelligence and big data.