How to Ingest 16 Billion Records Per Day into your Hadoop Environment

Thursday, March 21
11:00 AM - 11:40 AM
Room 124-125

In modern society, mobile networks have become one of the most important infrastructure components. The availability of a mobile network has even become essential in areas like health care and machine-to-machine communication.

In 2016, Telefónica Germany began the Customer Experience Management (CEM) project to derive KPIs from the mobile network that describe participants’ experience while using Telefónica’s mobile network. These KPIs help to plan and build a better mobile network by indicating where improvements are needed.

Telefónica uses the Hortonworks HDF solution to ingest the 16 billion records per day that CEM generates. To get the most out of HDF’s capabilities, some customizations have been made:

1.) Custom processors have been written to comply with data privacy rules.
2.) NiFi runs in Docker containers within a Kubernetes cluster to increase the reliability of the ingestion system.

Finally, the data is made available in Hive tables and Kafka topics for further processing. In this talk, we will present the CEM use case and its technical implementation as described in (1) and (2). The most interesting part for the audience should be the experience we have gained running HDF in a Docker/Kubernetes environment, since this setup is not yet officially supported.
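
To put the headline figure in perspective, the short Python sketch below converts 16 billion records per day into an average per-second rate. It assumes the load is spread evenly over 24 hours, which is a simplification; real mobile network traffic likely peaks well above this average. The function name is illustrative only.

```python
# Back-of-the-envelope conversion of the daily record volume
# into an average sustained ingest rate.
# Assumption: records arrive evenly over the day (no peak hours).

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_records_per_second(records_per_day: int) -> float:
    """Return the mean ingest rate for a given daily record count."""
    return records_per_day / SECONDS_PER_DAY


rate = average_records_per_second(16_000_000_000)
print(f"{rate:,.0f} records per second on average")
# prints "185,185 records per second on average"
```

In other words, the ingestion system has to sustain roughly 185,000 records per second on average, before accounting for any daily peaks.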


Uwe Weber
Senior Big Data Engineer
Telefónica Germany
Uwe Weber has been working in IT for almost 20 years and became a Big Data Engineer at Telefónica in 2014. He initially set up Telefónica’s Hadoop environment and infrastructure and supports business departments in making use of the “new world”.