DataWorks Summit in Munich, Germany

April 5–6, 2017

Tracks

The tracks are divided into eight key topic areas:

Apache Hadoop

Apache Hadoop continues to drive innovation at a rapid pace, and the next generation of Hadoop is being built today. This track showcases new developments in core Hadoop and closely related technologies. Attendees will hear about key projects, such as HDFS and YARN, projects in incubation, and the industry initiatives driving innovation in and around the Hadoop platform. Attendees will interact with technical leads, committers, and expert users who are actively driving the roadmaps, key features, and advanced technology research around what is coming next for Apache Hadoop.

Cloud and Operations

For a system to be “open for business,” system administrators must be able to efficiently manage and operate it. That requires a comprehensive dataflow and operations strategy. This track provides best practices for deploying and operating data lakes, streaming systems, and the extended Apache data ecosystem on premises and in the cloud. Sessions cover the full deployment lifecycle including installation, configuration, initial production deployment, upgrading, patching, loading, moving, backup, and recovery.

You’ll discover how to get started and how to operate your cluster. Speakers will show how to set up and manage high-availability configurations and how DevOps practices can help speed solutions into production. They’ll explain how to manage data across the edge, the data center, and the cloud. And they’ll offer cutting-edge best practices for large-scale deployments.

Sample technologies:
Apache Ambari, Cloudbreak, HDInsight, HDCloud, Data Plane Service, AWS, Azure, and Apache Oozie

Governance and Security

Your data lake contains a growing volume of diverse enterprise data, so a breach could be catastrophic. Privacy violations and regulatory infractions can damage your corporate image and long-term shareholder value. Government and industry regulations require that you properly secure and govern your data to assure compliance and mitigate risks. But as Hadoop and streaming applications emerge as a critical foundation of a modern data architecture, enterprises face new requirements for protection and governance.

In this track, you’ll learn about the key enterprise requirements for governance and security of the extended data plane. You’ll hear best practices, tips, tricks, and war stories on how to secure and govern your big data infrastructure.

Sample technologies:
Apache Ranger, Apache Sentry, Apache Atlas, and Apache Knox

Apache Spark and Data Science

Artificial intelligence is transforming every vertical. Several popular tools and projects have contributed to this accelerated transformation: Apache Spark for large-scale machine learning, TensorFlow for deep learning, and Apache Zeppelin and Jupyter notebooks, which enable data scientists to quickly prototype, test, and deploy advanced ML models. This track covers introductory to advanced sessions on algorithms, tools, applications, and emerging research topics that extend the Hadoop ecosystem for data science. Sessions will include examples of innovative analytics applications and systems, data visualization, statistics, machine learning, deep learning, and artificial intelligence. You will hear from leading data scientists, analysts, and practitioners who are driving innovation by extracting valuable insights from data at rest as well as data in motion.

Data Processing and Warehousing

Apache Hadoop YARN has transformed Hadoop into a multi-tenant data platform. It is the foundation for a wide range of processing engines that empower businesses to interact with the same data in multiple ways simultaneously. This means applications can access the data in whatever way is most appropriate, from batch processing to interactive SQL to low-latency access with NoSQL, including combining legacy data stores with big data. A vast ecosystem of SQL engines and tools is enabling richer data warehousing on Hadoop, with capabilities for ACID transactions, interactive queries, OLAP, and data transformation. You will have the opportunity to hear from the rock stars of the Apache community and learn how these innovators are building applications.

IoT and Streaming

The rapid proliferation of sensors and connected devices is fueling an explosion in data. Streaming data allows algorithms to dynamically adapt to new patterns in data, which is critical in applications like fraud detection and stock price prediction. Deploying real-time machine learning models in data streams enables insights and interactions not previously possible.

In this track, you’ll learn how to apply machine learning to capture perishable insights from streaming data sources and how to manage devices at the “jagged edge.” Sessions present new strategies and best practices for data ingestion and analysis. Presenters will show how to use these technologies to develop IoT solutions and how to combine historical and streaming data to build dynamic, evolving, real-time predictive systems for actionable insights.

Sample technologies:
Apache NiFi, Apache Storm, Streaming Analytics Manager, Apache Flink, Apache Spark Streaming, Apache Beam, Apache Pulsar, and Apache Kafka

Applications

In this track, you will hear from ISVs and architects who have built applications, frameworks, and solutions that leverage data as an asset to solve real business problems. These modern data applications are augmenting traditional architectures and extending the reach for insights from the edge to the data center. Sessions in this track address both technical and business audiences, covering topics from business justification and ROI to technical architecture.

Enterprise Adoption

Enterprise business leaders and innovators are using data to transform their businesses. Modern data applications are augmenting traditional architectures and extending the reach for insights from the edge to the data center. Sessions in this track will discuss business justification and ROI for modern data architectures.

You’ll hear from ISVs and architects who have created applications, frameworks, and solutions that leverage data as an asset to solve real business problems. Speakers from companies and organizations across industries and geographies will describe their data architectures, the business benefits they’ve experienced, their challenges, secrets to their successes, use cases, and the hard-fought lessons learned in their journeys.

Pre-Event Training

This one-day course details the business value of, and provides a technical overview of, Apache Hadoop. It includes high-level information about the concepts, architecture, operation, and uses of the Hortonworks Data Platform (HDP) and the Hadoop ecosystem. The course serves as an optional primer for those who plan to attend a hands-on, instructor-led course.

AUDIENCE

Data architects, data integration architects, managers, C-level executives, decision makers, technical infrastructure teams, and Hadoop administrators or developers who want to understand the fundamentals of Big Data and the Hadoop ecosystem.

PREREQUISITES

No previous Hadoop or programming knowledge is required. Students who want to reproduce the demonstrations on their own machines are encouraged to bring a Wi-Fi-enabled laptop preloaded with the Hortonworks Sandbox.

Learn data science techniques and best practices using the Hadoop ecosystem and its tools in this two-day course.

AUDIENCE

Architects, software developers, analysts, and data scientists who need to apply data science and machine learning on Apache Hadoop.

PREREQUISITES

Students must have experience with at least one programming or scripting language, such as Python; knowledge of statistics and/or mathematics; and a basic understanding of big data and Hadoop principles.

This two-day course is designed for administrators who will be managing the Hortonworks Data Platform (HDP) 2.5 with Ambari. It covers installation, configuration, and other typical cluster management tasks.

AUDIENCE

IT administrators and operators responsible for installing, configuring, and supporting an HDP 2.5 deployment in a Linux environment using Ambari.

PREREQUISITES

No previous Hadoop knowledge is required, though it will be useful. Attendees should be familiar with data center operations and Linux system administration. Students will need to bring a Wi-Fi-enabled laptop with the Chrome or Firefox browser installed to complete the hands-on labs.

This two-day course is designed for developers who need to create applications that analyze Big Data stored in Apache Hadoop using Spark. The course focuses on using the Spark API from Python or Scala.

AUDIENCE

Developers, architects, and administrators who want to learn more about developing data applications in Spark, how Spark will affect their environment, and ways to optimize applications.

PREREQUISITES

No previous Hadoop knowledge is required, though it will be useful. Basic knowledge of Python or Scala is required. Previous exposure to SQL is helpful but not required. Students will need to bring a Wi-Fi-enabled laptop with the Chrome or Firefox browser installed to complete the hands-on labs.

This two-day course is designed for ‘Data Stewards’ or ‘Data Flow Managers’ who want to automate the flow of data between systems.

AUDIENCE

Data engineers, integration engineers, and architects who want to automate the flow of data between systems.

PREREQUISITES

Some experience with Linux and a basic understanding of dataflow tools are helpful. Students will need to bring a Wi-Fi-enabled laptop with the Chrome or Firefox browser installed to complete the hands-on labs.

Venue & Travel

INTERNATIONAL CONGRESS CENTRE MUNICH

International Congress Centre Munich, Munich, Germany

+49 89 94923023

Hotel
THE CHARLES HOTEL

Rocco Forte The Charles Hotel, Sophienstraße, Munich, Germany

+49 89 5445551430