Docker data science pipeline

Tuesday, June 19
4:50 PM - 5:30 PM
Executive Ballroom 210D/H

At ING we needed a way to move data science models from exploration into production. I will draw on my experience as a senior Ops engineer on the exploration and production Hadoop environments. To bridge the two, we use OpenShift to run Docker containers that connect to the big data Hadoop environment.

During this talk I will explain why we need this and how it is done at ING. I will show how to set up a Docker container running a data science model using Hive, Python, and Spark; how to use Dockerfiles to build Docker images with all the needed components inside; and how to run different versions of software in different containers.
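As a rough illustration of the kind of image the talk describes, a minimal Dockerfile might look like the sketch below. The base image, pinned versions, and the model script name (model.py) are illustrative assumptions, not the exact setup used at ING.

```dockerfile
# Sketch of a Python/Spark model image (base image, versions,
# and file names are illustrative assumptions)
FROM python:3.9-slim

# Spark needs a Java runtime inside the container
RUN apt-get update && apt-get install -y --no-install-recommends \
        openjdk-11-jre-headless && \
    rm -rf /var/lib/apt/lists/*

# PySpark plus a Hive client library; pinning versions is what lets
# different containers run different versions of the same software
RUN pip install --no-cache-dir pyspark==3.3.2 'pyhive[hive]==0.6.5'

# Copy the model code into the image and run it by default
WORKDIR /app
COPY model.py .
CMD ["python", "model.py"]
```

Tagging each build (for example `docker build -t model:spark3 .` next to an older `model:spark2` image) is one simple way to keep multiple software versions running side by side in separate containers.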

Finally, I will give a demo of how the pipeline runs and is automated: a Git webhook triggers Jenkins, which starts the Docker service that connects to the big data Hadoop environment.
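The automation chain in the demo (Git webhook, Jenkins, then Docker) could be sketched as a declarative Jenkins pipeline along these lines. The trigger, image name, registry, and container name are assumptions for illustration, not the actual ING configuration.

```groovy
// Sketch of a Jenkins pipeline fired by a Git webhook.
// Registry, image, and container names are illustrative.
pipeline {
    agent any
    triggers {
        // Runs when the Git server's webhook notifies Jenkins of a push
        githubPush()
    }
    stages {
        stage('Build image') {
            steps {
                sh 'docker build -t registry.example.com/ds/model:${GIT_COMMIT} .'
            }
        }
        stage('Run model container') {
            steps {
                // Start the container, which then connects to the Hadoop cluster
                sh 'docker run -d --name ds-model registry.example.com/ds/model:${GIT_COMMIT}'
            }
        }
    }
}
```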

This is going to be a great technical talk for engineers and data scientists.


Lennard Cornelis
Ops Engineer
I have a great passion for technology and am always eager to learn new skills; I love how fast things change in this industry. I am a technical person and like to connect with other passionate people in the technology sector. Knowledge sharing is very important to me, and I love mentoring colleagues. My latest challenge is to learn everything about Docker. I am a hands-on person and love solving difficult and challenging problems. There is no greater joy than getting things done and having a good working system. I always go the extra mile to deliver a finished and properly implemented project.