DataWorks Summit in Berlin, Germany

April 16–19, 2018

The Industry’s Premier Big Data Community Event

Leading enterprises are using advanced analytics, data science, and artificial intelligence to transform the way they deliver customer and product experiences—at scale. Discover how they’re doing it. Learn about the latest developments. And network with peers and pioneers to learn how to apply open source technology to accelerate your digital transformation.

Tracks

DataWorks Summit Berlin 2018 features two days of content dedicated to enabling next-generation data solutions. You’ll hear industry experts, architects, data scientists, and open source Apache developers and committers share success stories, best practices, and cautionary tales that provide practical guidance to novices as well as experienced practitioners.

Technical sessions explore technologies, applications, and use cases to help you understand what’s available, how to apply it, and what others are achieving. These sessions range in technical depth from introductory to advanced.

Business sessions connect technology to business needs and outcomes. Sessions include case studies, executive briefings, and tutorials that detail best practices for becoming a data-driven organization. Speakers will explain how their businesses are transforming, and they’ll discuss the roadblocks and organizational challenges they faced.

In addition to sit-and-listen sessions, Crash Courses provide a hands-on introduction to key Apache projects. Each starts with a technical introduction; then you’ll explore on your own machine under the guidance of an expert instructor. You’ll walk away with a working environment to continue your journey. See the available Crash Courses.

Birds-of-a-Feather sessions (BoFs)—hosted by Apache committers, architects, and engineers—provide a forum for you to connect and share. There’s no agenda; the group goes where issues and interests take them. Come share your experiences and challenges on Apache projects in these BoFs.

Tracks are divided into eight key topic areas:

Data Warehousing and Operational Data Stores

Apache Hadoop YARN has transformed Hadoop into a multi-tenant data platform that enables the interaction of legacy data stores and big data. It is the foundation for multiple processing engines that let applications interact with data in the most appropriate way from batch to interactive SQL to low latency access with NoSQL.

Sessions will cover the vast ecosystem of SQL engines and tools that enable richer enterprise data warehousing (EDW) on Hadoop. You’ll learn how NoSQL stores like Apache HBase are adding transactional capability that brings traditional operational data store (ODS) workloads to Hadoop and why data preparation is a key workload. You’ll meet Apache community rock stars and learn how these innovators are building the applications of the future.

Sample technologies:
Apache Hive, Apache Tez, Apache ORC, Druid, Apache Parquet, Apache HBase, Apache Phoenix, Apache Accumulo, Apache Drill, Presto, Apache Pig, JanusGraph, Apache Impala

Artificial Intelligence and Data Science

Artificial Intelligence (AI) is transforming every industry. Data science and machine learning are opening new doors in process automation, predictive analytics, and decision optimization. This track offers sessions spanning the entire data science lifecycle: development, test, and production.

You’ll see examples of innovative analytics applications and systems for data visualization, statistics, machine learning, cognitive systems, and deep learning. We’ll show you how to use modern open source workbenches to develop, test, and evaluate advanced AI models before deploying them. You’ll hear from leading researchers, data scientists, analysts, and practitioners who are driving innovation in AI and data science.

Sample technologies:
Apache Spark, R, Apache Livy, Apache Zeppelin, Jupyter, scikit-learn, Keras, TensorFlow, DeepLearning4J, Chainer, Lasagne/Blocks/Theano, CaffeOnSpark, Apache MXNet, and PyTorch/Torch

Big Compute and Storage

Apache Hadoop continues to drive data management innovation at a rapid pace. Hadoop 3.0 adds container management to YARN, an object store to HDFS, and more. This track presents these advances and describes projects in incubation and the industry initiatives driving innovation in and around the Hadoop platform.

You’ll learn about key projects like HDFS, YARN, and related technologies. You’ll interact with technical leads, committers, and experts who are driving the roadmaps, key features, and advanced technology research shaping what comes next in the extended open source big compute and storage ecosystem.

Sample technologies:
Apache Hadoop (YARN, HDFS, Ozone), Apache Kudu, Kubernetes, Apache BookKeeper

Cloud and Operations

For a system to be “open for business,” system administrators must be able to efficiently manage and operate it. That requires a comprehensive dataflow and operations strategy. This track provides best practices for deploying and operating data lakes, streaming systems, and the extended Apache data ecosystem on premises and in the cloud. Sessions cover the full deployment lifecycle including installation, configuration, initial production deployment, upgrading, patching, loading, moving, backup, and recovery.

You’ll discover how to get started and how to operate your cluster. Speakers will show how to set up and manage high-availability configurations and how DevOps practices can help speed solutions into production. They’ll explain how to manage data across the edge, the data center, and the cloud. And they’ll offer cutting-edge best practices for large-scale deployments.

Sample technologies:
Apache Ambari, Cloudbreak, HDInsight, HDCloud, Data Plane Service, AWS, Azure, and Apache Oozie

Governance and Security

Your data lake contains a growing volume of diverse enterprise data, so a breach could be catastrophic. Privacy violations and regulatory infractions can damage your corporate image and long-term shareholder value. Government and industry regulations demand you properly secure and govern your data to assure compliance and mitigate risks. But as Hadoop and streaming applications emerge as a critical foundation of a modern data architecture, enterprises face new requirements for protection and governance.

In this track, you’ll learn about the key enterprise requirements for governance and security of the extended data plane. You’ll hear best practices, tips, tricks, and war stories on how to secure and govern your big data infrastructure.

Sample technologies:
Apache Ranger, Apache Sentry, Apache Atlas, and Apache Knox

Cyber Security

The speed and scale of recent ransomware attacks and cyber security breaches have taught us that threat detection and mitigation are the key to security operations in data-driven businesses. Creating cyber security machine learning models and deploying these models in streaming systems is becoming critical to defending and managing these growing threats.

In this track, you’ll learn how to leverage big data and stream processing to improve your cyber security. Experts will explain how to scale with analytics on more data and react in real time.

Sample technologies:
Apache Metron, Apache Spot

IoT and Streaming

The rapid proliferation of sensors and connected devices is fueling an explosion in data. Streaming data allows algorithms to dynamically adapt to new patterns in data, which is critical in applications like fraud detection and stock price prediction. Deploying real-time machine learning models in data streams enables insights and interactions not previously possible.

In this track you’ll learn how to apply machine learning to capture perishable insights from streaming data sources and how to manage devices at the “jagged edge.” Sessions present new strategies and best practices for data ingestion and analysis. Presenters will show how to use these technologies to develop IoT solutions and how to combine historical with streaming data to build dynamic, evolving, real-time predictive systems for actionable insights.

Sample technologies:
Apache NiFi, Apache Storm, Streaming Analytics Manager, Apache Flink, Apache Spark Streaming, Apache Beam, Apache Pulsar, and Apache Kafka

Enterprise Adoption

Enterprise business leaders and innovators are using data to transform their businesses. These modern data applications are augmenting traditional architectures and extending the reach for insights from the edge to the data center. Sessions in this track will discuss business justification and ROI for modern data architectures.

You’ll hear from ISVs and architects who have created applications, frameworks, and solutions that leverage data as an asset to solve real business problems. Speakers from companies and organizations across industries and geographies will describe their data architectures, the business benefits they’ve experienced, their challenges, secrets to their successes, use cases, and the hard-fought lessons learned in their journeys.

Agenda

Agenda At A Glance

Monday, April 16
8:30 AM - 5:00 PM   Pre-Event Training

Tuesday, April 17
8:30 AM - 5:00 PM   Pre-Event Training
6:00 PM - 8:00 PM   Meetups

Wednesday, April 18
9:00 AM - 10:30 AM  Opening Keynote
10:30 AM - 2:00 PM  Community Showcase
11:10 AM - 5:00 PM  Track Sessions and Crash Courses
5:30 PM - 6:30 PM   Birds of a Feather
6:00 PM - 8:00 PM   Sponsor Reception

Thursday, April 19
9:00 AM - 10:30 AM  Opening Keynote
10:30 AM - 2:00 PM  Community Showcase
11:10 AM - 5:50 PM  Track Sessions and Crash Courses

Monday, April 16 (Pre-Event Training)
7:30 am - 8:30 am   Breakfast
7:30 am - 5:00 pm   Registration
8:30 am - 5:00 pm   Training
10:00 am - 10:30 am Morning Break
12:00 pm - 1:00 pm  Lunch
2:30 pm - 3:00 pm   Afternoon Break

Tuesday, April 17 (Pre-Event Training)
7:30 am - 8:30 am   Breakfast
7:30 am - 5:00 pm   Registration
10:00 am - 10:30 am Morning Break
12:00 pm - 1:00 pm  Lunch
2:30 pm - 3:00 pm   Afternoon Break
6:00 pm - 8:00 pm   Meetups

Wednesday, April 18

7:30 am - 6:00 pm   Registration
8:00 am - 9:00 am   Breakfast
9:00 am - 10:30 am  Opening Keynote
10:30 am - 2:00 pm  Community Showcase
10:30 am - 11:00 am Morning Break

11:00 am - 11:40 am  Breakout Sessions
- Technical | Apache Hadoop YARN - State of the Union | Vinod Kumar Vavilapalli, Hortonworks Inc | Europe | Big Compute and Storage
- Business | Open Source is just about the source code - isn't it? | Isabel Drost-Fromm, Europace AG | Room I | Big Compute and Storage
- Technical | Apache Spark 2.3 boosts advanced analytics and deep learning with Python | Yanbo Liang, Hortonworks | Convention Hall I - C | Artificial Intelligence and Data Science
- Business | Analyst Panel Unravels the Data Industry | Scott Gnau, Hortonworks | Room II | Enterprise Adoption
- Technical | Accelerating query processing with materialized views in Apache Hive | Jesus Camacho Rodriguez, Hortonworks | Room IV | Data Processing and Warehousing
- Business | Achieving a 360 Degree View of Manufacturing via Open Source Industrial Data Management | Michael Ger, Hortonworks | Room V | Enterprise Adoption

11:15 am - 1:45 pm   Crash Course
- Technical | Apache NiFi Crash Course | Rafael Coss, Hortonworks | Convention Hall I - D

11:50 am - 12:30 pm  Breakout Sessions
- Technical | Ozone and HDFS's Evolution | Sanjay Radia & Anu Engineer, Hortonworks | Room I | Big Compute and Storage
- Technical | Apache MXNet Distributed Training Big Models Explained In Depth | Viacheslav Kovalevskyi, Amazon Web Services | Convention Hall I - C | Artificial Intelligence and Data Science
- Technical | An Introduction to Druid | Fangjin Yang, Imply | Room III | IoT and Streaming
- Technical | Airline Reservations and Routing: A Graph Use Case | Jason Plurad, IBM | Room IV | Data Processing and Warehousing
- Technical | Operating a secure big data platform in a multi-cloud environment | Sandeep Chandra, San Diego Supercomputer Center | Room V | Cloud and Operations

12:30 pm - 2:00 pm   Lunch
12:30 pm - 2:00 pm   Women in Big Data Lunch and Panel

2:00 pm - 2:40 pm    Breakout Sessions
- Business | Building Audi's enterprise big data platform | Carsten Herbe, Audi Business Innovation GmbH | Europe | Enterprise Adoption
- Technical | Deep learning on YARN - Running distributed TensorFlow / MXNet / Caffe / XGBoost on Hadoop clusters | Wangda Tan, Hortonworks | Room II | Big Compute and Storage
- Business | Machine Learning Trading Bots | Diego Baez, Hortonworks | Room III | Artificial Intelligence and Data Science
- Technical | Enabling Real Interactive BI on Hadoop | Boaz Raufman, JethroData | Room V | Data Processing and Warehousing

2:50 pm - 3:30 pm    Breakout Sessions
- Technical | Reaching scale limits on a Hadoop platform: issues and errors created by speed and agility | Antonio Alvarez, Santander Technology | Europe | Big Compute and Storage
- Technical | Accelerating XGBoost applications with GPU and Spark | Yanbo Liang, Hortonworks | Room I | Artificial Intelligence and Data Science
- Technical | Inside open metadata - the deep dive | Mandy Chessell, IBM | Room II | Governance and Security
- Technical | Interactive Realtime Dashboards on Data Streams using Kafka, Druid and Superset | Nishant Bangarwa, Hortonworks | Room IV | IoT and Streaming
- Technical | Tools and approaches for migrating Big Datasets to the cloud | Adrian Woodhead, Hotels.com | Room V | Data Processing and Warehousing

3:00 pm - 5:30 pm    Crash Course
- Technical | Apache Spark | Rafael Coss, Hortonworks | Convention Hall I - D

3:30 pm - 4:00 pm    Afternoon Break

4:00 pm - 4:40 pm    Breakout Sessions
- Technical | Next Gen Tooling for Building Streaming Analytics Apps: Code-Less Development, Unit and Integration Testing, Continuous Integration & Delivery | George Vetticaden, Hortonworks | Europe | IoT and Streaming
- Technical | ORC Improvement in Apache Spark 2.3 | Dongjoon Hyun, Hortonworks | Convention Hall I - C | Data Processing and Warehousing
- Business | Building a future proof cyber security platform with Apache Metron | Bas van de Lustgraaf, QSight IT | Room II | Cybersecurity
- Technical | Powering TensorFlow with Big Data (Apache Beam, Flink & Spark) | Holden Karau, Google | Room III | Artificial Intelligence and Data Science
- Technical | Docker Data Science Pipeline | Lennard Cornelis, ING | Room V | Cloud and Operations

4:50 pm - 5:30 pm    Breakout Sessions
- Technical | Not Just a Necessary Evil, It's Good for Business: Implementing PCI DSS controls for Hadoop Ecosystem at UK's Largest Card Issuer | David Walker, Worldpay | Europe | Governance and Security
- Technical | Present and future of unified, portable and efficient data processing with Apache Beam | Davor Bonaci, Apache Software Foundation; Simbly | Room II | IoT and Streaming
- Technical | HDFS Tiered Storage: mounting object stores in HDFS | Thomas Demoor, Western Digital | Room III | Big Compute and Storage
- Business | From an experiment to a real production environment | Jeroen Wolffensperger, Rabobank | Room V | Enterprise Adoption

5:40 pm - 6:55 pm    Birds of a Feather (hosted by Rafael Coss, Hortonworks)
- Apache Spark, Apache Zeppelin & Data Science | Europe
- IoT, Streaming & Data Flow | Room I
- Apache Hive, Apache HBase & Apache Phoenix | Convention Hall I - C
- Security and Governance | Room II
- Apache Hadoop - YARN, HDFS | Room III
- Cybersecurity and Apache Metron | Room V

7:00 pm - 9:00 pm    DataWorks Summit Sponsor Reception

Thursday, April 19

7:30 am - 8:45 am   Industry Roundtables
7:30 am - 6:00 pm   Registration
8:00 am - 9:00 am   Breakfast
9:00 am - 10:30 am  Keynote
10:30 am - 2:00 pm  Community Showcase
10:30 am - 11:00 am Morning Break

11:00 am - 11:40 am  Breakout Sessions
- Technical | Hadoop and Spark Services at CERN | Evangelos Motesnitsalis, CERN | Europe | Big Compute and Storage
- Business | Building the Future of ERP with Open Source | Andrew Psaltis, Hortonworks | Convention Hall I - C | Enterprise Adoption
- Technical | Intelligently Collecting Data at the Edge — Intro to Apache MiNiFi | Andy LoPresto, Hortonworks, Inc | Room II | IoT and Streaming
- Technical | Why Kubernetes as a container orchestrator is a right choice for running Spark clusters on cloud | Rachit Arora, IBM | Room III | Cloud and Operations
- Technical | GDPR Focused Partner Community Showcase for Apache Ranger and Apache Atlas | Ali Bajwa, Hortonworks | Room IV | Governance and Security
- Technical | Zero ETL analytics with LLAP in Azure HDInsight | Ashish Thapliyal, Microsoft Corp | Room V | Data Processing and Warehousing

11:15 am - 1:45 pm   Crash Course
- Technical | Data Science Crash Course | Rafael Coss, Hortonworks | Convention Hall I - D

11:50 am - 12:30 pm  Breakout Sessions
- Technical | Sharing Metadata Across the Data Lake and Streams | Alan Gates, Hortonworks | Europe | Data Processing and Warehousing
- Technical | Apache Deep Learning 101 | Timothy Spann, Hortonworks | Room I | Artificial Intelligence and Data Science
- Technical | An Elastic Batch- and Stream Processing Stack with Pravega and Apache Flink | Stephan Ewen, data Artisans | Room III | IoT and Streaming
- Technical | Securing and Governing a Multi-Tenant Data Lake within the Financial Industry | Ian Pillay, Standard Bank | Room IV | Governance and Security
- Technical | Evaluation of TPC-H on Spark & Spark SQL in ALOJA | Raphael Radowitz, Goethe University | Room V | Data Processing and Warehousing

12:30 pm - 2:00 pm   Lunch

2:00 pm - 2:40 pm    Breakout Sessions
- Technical | LLAP: Great On-Premises and Great in the Cloud | Chris Nauroth, The Walt Disney Company | Europe | Data Processing and Warehousing
- Technical | Bringing Complex Event Processing to Spark Streaming | Prabhu Thukkaram, Oracle | Room I | IoT and Streaming
- Technical | The Power of Intelligent Flows: Realtime IoT Botnet Classification with NiFi | Andy LoPresto, Hortonworks, Inc | Convention Hall I - C | Cybersecurity
- Business | O2's Financial Data Hub - Going beyond IFRS compliance to support digital transformation | Jonathan Ratcliff, Telefónica UK | Room III | Enterprise Adoption
- Technical | Productionizing Spark ML Pipelines with the Portable Format for Analytics | Nick Pentreath, IBM | Room V | Artificial Intelligence and Data Science

2:50 pm - 3:30 pm    Breakout Sessions
- Technical | Lessons learned running a container cloud on YARN | Billie Rinaldi, Hortonworks | Europe | Cloud and Operations
- Business/Technical | Best practices and lessons learnt from running Apache NiFi at Renault | Adel Gacem, Renault | Room I | IoT and Streaming
- Technical | Accelerating TensorFlow with RDMA for High-Performance Deep Learning | Dhabaleswar K (DK) Panda, The Ohio State University | Room IV | Artificial Intelligence and Data Science
- Business | IIoT + Predictive Analytics: Solving for Disruption in Oil & Gas and Energy & Utilities | Kenneth Smith, Hortonworks | Room V | Enterprise Adoption

3:00 pm - 6:00 pm    Crash Course
- Technical | Streaming Analytics Crash Course | Rafael Coss, Hortonworks | Convention Hall I - D

3:30 pm - 4:00 pm    Afternoon Break

4:00 pm - 4:40 pm    Breakout Sessions
- Business | GDPR - The IBM Journey to Compliance | Richard Hogg, IBM | Europe | Governance and Security
- Technical | Building Streaming Pipelines for Neural Machine Translation | Kellen Sunderland, Amazon | Room I | Artificial Intelligence and Data Science
- Technical | Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager | Artem Ervits, Hortonworks | Convention Hall I - C | Data Processing and Warehousing
- Technical | Omid - Scalable and Highly Available Transaction Processing for Apache Phoenix | Ohad Shacham, Yahoo | Room II | Big Compute and Storage
- Business | Risk Listening & Monitoring for Profitable Growth | Cindy Maike, Hortonworks, Inc. | Room IV | IoT and Streaming

4:50 pm - 5:30 pm    Breakout Sessions
- Business | Teams, tools, and practices for scalable and resilient data value at Klarna Bank | Erik Zeitler, Klarna Bank AB | Europe | Enterprise Adoption
- Technical | Lessons Learned from Running Spark on Docker | Thomas Phelan, BlueData Inc | Convention Hall I - C | Big Compute and Storage
- Technical | Building A Data Driven Authorization Framework | Amer Issa, Hortonworks Inc | Room II | Governance and Security
- Technical | Recognition of Document Layout and Table Structure with Faster R-CNN with ResNet using TensorFlow | Alex Yang, IBM China Development Laboratory | Room III | Artificial Intelligence and Data Science
- Business | How an Italian company rules the world of car insurance | Beniamino Del Pizzo, Data Reply IT | Room IV | IoT and Streaming

Pre-Event Training

    • Apache Hadoop Ecosystem Full Stack Architecture
    • Deep Learning with TensorFlow and Keras
    • Apache Spark 2 for Data Engineers
    • Stream Applications with Apache NiFi, Kafka, Storm & SAM

Apache Hadoop Ecosystem Full Stack Architecture

This 2-day training course is designed for developers who need to create applications to analyze Big Data stored in Apache Hadoop using Apache Pig and Apache Hive. Topics include an essential understanding of HDP and its capabilities; Hadoop, YARN, HDFS, and MapReduce/Tez; data ingestion; and using Pig and Hive to perform data analytics on Big Data.

PREREQUISITES
Students should be familiar with programming principles and have experience in software development. SQL and light scripting knowledge is also helpful. No prior Hadoop knowledge is required.

TARGET AUDIENCE
Developers and data engineers who need to understand and develop Hive applications on HDP.

View Brochure

Deep Learning with TensorFlow and Keras

This class is designed to cover key theory and background elements of deep learning, along with hands-on activities using both TensorFlow and Keras – two of the most popular frameworks for working with neural networks. To build an intuitive understanding of deep learning approaches together with practice in building and training neural nets, the class alternates theory modules and hands-on labs.

PREREQUISITES
The class communicates the mathematical aspects of deep learning in a clear, straightforward way, and does not require a background in vector calculus, although some background in calculus, linear algebra, and statistics is helpful. All code examples and labs are done with Python, so previous experience with Python is recommended.

TARGET AUDIENCE
This class is ideal for engineers or data scientists who want to gain an understanding of neural net models and modern techniques, and start to apply them to real-world problems.

View Brochure

Apache Spark 2 for Data Engineers

This course introduces the Apache Spark distributed computing engine and is suitable for developers, data analysts, architects, technical managers, and anyone who needs to use Spark in a hands-on manner. It is based on the Spark 2.x release. The course provides a solid technical introduction to the Spark architecture and how Spark works.

PREREQUISITES
Students should be familiar with programming principles and have previous experience in software development using Scala. Previous experience with data streaming, SQL, and HDP is also helpful, but not required.

TARGET AUDIENCE
Software engineers who are looking to develop in-memory applications for time-sensitive and highly iterative workloads in an enterprise HDP environment.

View Brochure

Stream Applications with Apache NiFi, Kafka, Storm & SAM

This course is designed for developers who need to create real-time applications to ingest and process streaming data sources using Hortonworks Data Flow (HDF) environments. Specific technologies covered include Apache NiFi, Apache Kafka, Apache Storm, Hortonworks Schema Registry, and Streaming Analytics Manager.

PREREQUISITES
Students should be familiar with programming principles and have experience in software development. Java programming experience is required. SQL and light scripting knowledge is also helpful. No prior Hadoop knowledge is required.

TARGET AUDIENCE
Developers and data engineers who need to understand and develop real-time / streaming applications on Hortonworks Data Flow (HDF).

View Brochure

2018 Speakers

Artem Ervits is a Solutions Engineer at Hortonworks. Hortonworks is a leading big data software company based in Santa Clara, California. The company develops and supports Apache Hadoop, for the distributed processing of large data sets across computer clusters. Artem is an organizer of the NYC Future of Data Meetup and contributor to Apache Oozie. He works with Workflow Manager and Oozie product management and engineering teams to shape the future direction for Workflow Manager and Oozie. You may reach him with questions on Oozie, HBase, Phoenix, Pig and Hive.

Sponsors

Packages & Passes

All-Access Pass
Early Bird (through Jan. 31, 2018): €750
Standard (Feb. 1 - Apr. 15): €900
Onsite (Apr. 16 - 19): €975
Includes entry into the opening keynote, the Community Showcase, the Sponsor Reception, all breakout sessions, and all crash courses, plus all programmed meals.

Pre-Event Training
2-day Pre-Event Training class of your choice: €2000

Venue & Travel

Estrel Hotel
Sonnenallee, Berlin, Germany
+49 30 6831 0

Visit Event Center Website
View on Google Maps

Get Social, Stay Connected!