DataWorks Summit in Berlin, Germany

April 16–19, 2018

Overview

DataWorks Summit: Ideas. Insights. Innovation.

Leading enterprises are using advanced analytics, data science, and artificial intelligence to transform the way they deliver customer and product experiences at scale. Discover how they’re doing it at the world’s premier big data community event for everything data—DataWorks Summit.

Come learn about the latest developments and network with industry peers and pioneers to discover how to apply open source technology to make data work and accelerate your digital transformation.

Tracks

    • Apache Hadoop
    • Applications
    • Artificial Intelligence and Data Science
    • Big Compute and Storage
    • Cloud and Operations
    • Cybersecurity
    • Data Warehousing and Operational Data Stores
    • Enterprise Adoption
    • Governance and Security
    • IoT and Streaming
    • Operations, Governance and Security

Apache Hadoop

Apache Hadoop continues to drive innovation at a rapid pace, and the next generation of Hadoop is being built today. This track showcases new developments in core Hadoop and closely related technologies.

Attendees will hear about key projects, such as HDFS and YARN, projects in incubation, and the industry initiatives driving innovation in and around the Hadoop platform.

Attendees will interact with technical leads, committers, and expert users who are actively driving the roadmaps, key features, and advanced technology research around what is coming next for Apache Hadoop.

Applications

In this track you will hear from ISVs and architects who have created applications, frameworks, and solutions that solve real business problems by leveraging data as an asset. These modern data applications are augmenting traditional architectures and extending the reach for insights from the edge to the data center.

Sessions in this track serve both technical and business audiences, with topics ranging from business justification and ROI to technical architecture.

Artificial Intelligence and Data Science

Artificial Intelligence (AI) is transforming every industry. Data science and machine learning are opening new doors in process automation, predictive analytics, and decision optimization. This track offers sessions spanning the entire data science lifecycle: development, test, and production. You’ll see examples of innovative analytics applications and systems for data visualization, statistics, machine learning, cognitive systems, and deep learning. We’ll show you how to use modern open source workbenches to develop, test, and evaluate advanced AI models before deploying them. You’ll hear from leading researchers, data scientists, analysts, and practitioners who are driving innovation in AI and data science.

Sample technologies: Apache Spark, R, Apache Livy, Apache Zeppelin, Jupyter, scikit-learn, Keras, TensorFlow, DeepLearning4J, Chainer, Lasagne/Blocks/Theano, CaffeOnSpark, Apache MXNet, and PyTorch/Torch

Big Compute and Storage

Apache Hadoop continues to drive data management innovation at a rapid pace. Hadoop 3.0 adds container management to YARN, an object store to HDFS, and more. This track presents these advances and describes projects in incubation and the industry initiatives driving innovation in and around the Hadoop platform.

You’ll learn about key projects like HDFS, YARN, and related technologies. You’ll interact with technical leads, committers, and experts who are driving the roadmaps, key features, and advanced technology research around what is coming next for Hadoop and the extended open source big compute and storage ecosystem.

Sample technologies: Apache Hadoop (YARN, HDFS, Ozone), Apache Kudu, Kubernetes, Apache BookKeeper

Cloud and Operations

For a system to be “open for business,” system administrators must be able to efficiently manage and operate it. That requires a comprehensive dataflow and operations strategy. This track provides best practices for deploying and operating data lakes, streaming systems, and the extended Apache data ecosystem on premises and in the cloud. Sessions cover the full deployment lifecycle including installation, configuration, initial production deployment, upgrading, patching, loading, moving, backup, and recovery. You’ll discover how to get started and how to operate your cluster. Speakers will show how to set up and manage high-availability configurations and how DevOps practices can help speed solutions into production. They’ll explain how to manage data across the edge, the data center, and the cloud. And they’ll offer cutting-edge best practices for large-scale deployments.

Sample technologies: Apache Ambari, Cloudbreak, HDInsight, HDCloud, Data Plane Service, AWS, Azure, and Apache Oozie

Cybersecurity

The speed and scale of recent ransomware attacks and cybersecurity breaches have taught us that threat detection and mitigation are the key to security operations in data-driven businesses. Creating cybersecurity machine learning models and deploying these models in streaming systems is becoming critical to defending against and managing these growing threats. In this track, you’ll learn how to leverage big data and stream processing to improve your cybersecurity. Experts will explain how to scale with analytics on more data and react in real time.

Sample technologies: Apache Metron, Apache Spot

Data Warehousing and Operational Data Stores

Apache Hadoop YARN has transformed Hadoop into a multi-tenant data platform that enables the interaction of legacy data stores and big data. It is the foundation for multiple processing engines that let applications interact with data in the most appropriate way, from batch to interactive SQL to low-latency access with NoSQL. Sessions will cover the vast ecosystem of SQL engines and tools that enable richer enterprise data warehousing (EDW) on Hadoop. You’ll learn how NoSQL stores like Apache HBase are adding transactional capability that brings traditional operational data store (ODS) workloads to Hadoop and why data preparation is a key workload. You’ll meet Apache community rock stars and learn how these innovators are building the applications of the future.

Sample technologies: Apache Hive, Apache Tez, Apache ORC, Druid, Apache Parquet, Apache HBase, Apache Phoenix, Apache Accumulo, Apache Drill, Presto, Apache Pig, JanusGraph, Apache Impala

Enterprise Adoption

Enterprise business leaders and innovators are using data to transform their businesses. These modern data applications are augmenting traditional architectures and extending the reach for insights from the edge to the data center. Sessions in this track will discuss business justification and ROI for modern data architectures. You’ll hear from ISVs and architects who have created applications, frameworks, and solutions that leverage data as an asset to solve real business problems. Speakers from companies and organizations across industries and geographies will describe their data architectures, the business benefits they’ve experienced, their challenges, secrets to their successes, use cases, and the hard-fought lessons learned in their journeys.

Governance and Security

Your data lake contains a growing volume of diverse enterprise data, so a breach could be catastrophic. Privacy violations and regulatory infractions can damage your corporate image and long-term shareholder value. Government and industry regulations demand you properly secure and govern your data to assure compliance and mitigate risks. But as Hadoop and streaming applications emerge as a critical foundation of a modern data architecture, enterprises face new requirements for protection and governance.

In this track, you’ll learn about the key enterprise requirements for governance and security of the extended data plane. You’ll hear best practices, tips, tricks, and war stories on how to secure and govern your big data infrastructure.

Sample technologies: Apache Ranger, Apache Sentry, Apache Atlas, and Apache Knox

IoT and Streaming

The rapid proliferation of sensors and connected devices is fueling an explosion in data. Streaming data allows algorithms to dynamically adapt to new patterns in data, which is critical in applications like fraud detection and stock price prediction. Deploying real-time machine learning models in data streams enables insights and interactions not previously possible. In this track you’ll learn how to apply machine learning to capture perishable insights from streaming data sources and how to manage devices at the “jagged edge.” Sessions present new strategies and best practices for data ingestion and analysis. Presenters will show how to use these technologies to develop IoT solutions and how to combine historical with streaming data to build dynamic, evolving, real-time predictive systems for actionable insights.

Sample technologies: Apache NiFi, Apache Storm, Streaming Analytics Manager, Apache Flink, Apache Spark Streaming, Apache Beam, Apache Pulsar, and Apache Kafka

Operations, Governance and Security

With growing volumes of diverse data being stored in the data lake, any breach of this enterprise-wide data can be catastrophic, with consequences ranging from privacy violations and regulatory infractions to damaged corporate image and long-term shareholder value.

This track focuses on the key enterprise requirements for governance and security for the extended data plane. As Hadoop and streaming applications emerge as a critical foundation of a modern data application, the enterprise has placed stringent requirements on them in these key areas. Speakers will present best practices with an emphasis on tips, tricks, and war stories on how to secure your big data infrastructure. Sessions will also cover the full deployment lifecycle for on-premises and cloud deployments, including installation, configuration, initial production deployment, recovery, security, and data governance for Hadoop.

This track covers the core practices and patterns for planning, deploying, loading, moving, backup and recovery, high availability (HA), and managing data across edge, on-premises, and cloud environments. The track focuses on deploying and operating Hadoop and the extended Apache data ecosystem on premises and in the cloud.

Agenda

Agenda at a Glance

MONDAY, APRIL 16
8:30 AM – 5:00 PM   Pre-event Training

TUESDAY, APRIL 17
8:30 AM – 5:00 PM   Pre-event Training

WEDNESDAY, APRIL 18
7:30 AM – 6:00 PM   Registration
9:00 AM – 10:30 AM  Opening Keynote
10:30 AM – 4:00 PM  Community Showcase
11:00 AM – 5:40 PM  Track Sessions and Crash Courses
5:40 PM – 7:00 PM   Birds of a Feather
7:00 PM – 9:00 PM   Sponsor Reception

THURSDAY, APRIL 19
7:30 AM – 6:00 PM   Registration
9:00 AM – 10:30 AM  Opening Keynote
10:30 AM – 2:00 PM  Community Showcase
11:10 AM – 5:50 PM  Track Sessions and Crash Courses

DataWorks Expo Theater

An opportunity to participate and to be inspired
Visit the DataWorks Expo Theater for free 20-minute educational presentations from industry experts in advanced analytics, data science, and artificial intelligence.

The Expo Theater is located at the rear of the Expo hall and is free to all attendees. It brings together thought leaders and innovators for inspiring talks, demo sessions, and hands-on networking opportunities. Don't miss your chance to participate in these educational sessions and to be inspired by what's driving the world of big data.

2018 Speakers

Matthias Graunitz is a big data architect at Audi, where he works at the company’s Competence Center for Big Data and Business Intelligence. He is responsible for the architectural framework of the Hadoop ecosystem, a separate Kafka cluster, and the data science tool kits that the center provides to all business departments at Audi. Matthias has more than 10 years’ experience in the field of business intelligence and big data.

Sponsors

Packages & Passes

Conference Pass     EARLY BIRD         Standard          OnSite
                    Thru Jan. 31 2018  Feb. 1 - Apr. 15  Apr. 16 - 19
Full Conference     €750               €900              €975
Day Pass            N/A                €475              €475
Pre-Event Training  N/A                €2000             €2000

Full Conference: Access to DataWorks Summit keynotes, breakouts, meals and events, including crash courses, community showcase, and the sponsor reception
Day Pass: Single day access to keynotes, breakouts, lunch and other DataWorks Summit events
Pre-Event Training: Prices are for a single class and conference attendees may only enroll in one class.
 

Venue & Travel

Estrel Hotel

Estrel Hotel, Sonnenallee, Berlin, Germany

+49 30 6831 0
