DataWorks Summit in San Jose, California

June 17–21, 2018

DataWorks Summit: Ideas. Insights. Innovation.

Leading enterprises are using advanced analytics, data science, and artificial intelligence to transform the way they deliver customer and product experiences at scale. Discover how they’re doing it at the world’s premier big data community event for everything data—DataWorks Summit.

Come learn about the latest developments, while networking with industry peers and pioneers to learn how to apply open source technology to make data work and accelerate your digital transformation.

Tracks

DataWorks Summit San Jose 2018 will feature three days of content across eight tracks dedicated to enabling next-generation data platforms. You’ll hear industry experts, architects, data scientists, and open source Apache developers and committers share success stories, best practices, cautionary tales, and technology insights that provide practical guidance to novices as well as experienced practitioners of modern data infrastructure.

Technical sessions explore specific technologies, applications, and use cases to help you understand the technology available, how to apply it, and what others are achieving. These sessions will range in technical depth from introductory through intermediate to advanced.

Business sessions connect technology to business needs and outcomes. Sessions include case studies, executive briefings, and tutorials that detail best practices to becoming a data-driven organization. Speakers will explain how their businesses are making the move from legacy data stores to modern data architectures, and they’ll discuss the roadblocks and cultural and organizational challenges they faced.

Crash Courses provide a hands-on introduction to key Apache projects. They start with a technical introduction; then, under the guidance of an expert instructor, you’ll explore on your own machine. You’ll walk away with a working environment to continue your journey. See the list of available Crash Courses.

Birds-of-a-Feather sessions (BoFs)—hosted by Apache committers, architects, and engineers—provide a forum for you to connect and share. There’s no agenda; the group goes where issues and interests take them. Come share your experiences and challenges on Apache projects in these BoFs.

Tracks are divided into eight key topic areas that provide the information you need to apply powerful new technologies and understand the value they bring to modern data-driven organizations.

Data Warehousing and Operational Data Stores

Apache Hadoop YARN has transformed Hadoop into a multi-tenant data platform that enables the interaction of legacy data stores and big data. It is the foundation for multiple processing engines that let applications interact with data in the most appropriate way, from batch to interactive SQL to low-latency access with NoSQL.

Sessions will cover the vast ecosystem of SQL engines and tools that enable richer enterprise data warehousing (EDW) on Hadoop. You’ll learn how NoSQL stores like Apache HBase are adding transactional capability that brings traditional operational data store (ODS) workloads to Hadoop and why data preparation is a key workload. You’ll meet Apache community rock stars and learn how these innovators are building the applications of the future.
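
As a taste of the SQL-on-Hadoop workloads these sessions cover, here is a minimal sketch of querying Hive from Python using the third-party PyHive client. The host, table, and columns are hypothetical, made up for illustration:

    from pyhive import hive  # third-party HiveServer2 client for Python

    # Connect to a HiveServer2 instance (hypothetical host and credentials).
    conn = hive.connect(host="hive.example.com", port=10000, username="analyst")
    cursor = conn.cursor()

    # A typical EDW-style aggregation, pushed down to the Hive engine.
    cursor.execute(
        "SELECT product, SUM(amount) AS total "
        "FROM sales GROUP BY product ORDER BY total DESC LIMIT 10"
    )
    for product, total in cursor.fetchall():
        print(product, total)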

Sample technologies:
Apache Hive, Apache Tez, Apache ORC, Druid, Apache Parquet, Apache HBase, Apache Phoenix, Apache Accumulo, Apache Drill, Presto, Apache Pig, JanusGraph, Apache Impala

Artificial Intelligence and Data Science

Artificial Intelligence (AI) is transforming every industry. Data science and machine learning are opening new doors in process automation, predictive analytics, and decision optimization. This track offers sessions spanning the entire data science lifecycle: development, test, and production.

You’ll see examples of innovative analytics applications and systems for data visualization, statistics, machine learning, cognitive systems, and deep learning. We’ll show you how to use modern open source workbenches to develop, test, and evaluate advanced AI models before deploying them. You’ll hear from leading researchers, data scientists, analysts, and practitioners who are driving innovation in AI and data science.
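
For a flavor of the develop-test-evaluate loop these sessions walk through, here is a minimal scikit-learn sketch. The bundled iris toy dataset stands in for a real project’s data:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    # Develop: split the data and fit a simple model.
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
    model = LogisticRegression(max_iter=200).fit(X_train, y_train)

    # Test and evaluate: score on held-out data before any deployment.
    print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
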
Sample technologies:
Apache Spark, R, Apache Livy, Apache Zeppelin, Jupyter, scikit-learn, Keras, TensorFlow, DeepLearning4J, Chainer, Lasagne/Blocks/Theano, CaffeOnSpark, Apache MXNet, and PyTorch/Torch

Big Compute and Storage

Apache Hadoop continues to drive data management innovation at a rapid pace. Hadoop 3.0 adds container management to YARN, an object store to HDFS, and more. This track presents these advances and describes projects in incubation and the industry initiatives driving innovation in and around the Hadoop platform.

You’ll learn about key projects like HDFS, YARN, and related technologies. You’ll interact with technical leads, committers, and experts who are driving the roadmaps, key features, and advanced technology research around what is coming next and the extended open source big compute and storage ecosystem.
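
As a small illustration of interacting with HDFS programmatically, here is a sketch using the third-party Python hdfs package over WebHDFS. The NameNode URL is hypothetical; port 9870 is the Hadoop 3.x default:

    from hdfs import InsecureClient  # third-party WebHDFS client

    # Connect to the NameNode's WebHDFS endpoint (hypothetical host).
    client = InsecureClient("http://namenode.example.com:9870", user="hdfs")

    # Write a small file, then list the directory to confirm it landed.
    client.write("/tmp/hello.txt", data=b"hello, hdfs", overwrite=True)
    print(client.list("/tmp"))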

Sample technologies:
Apache Hadoop (YARN, HDFS, Ozone), Apache Kudu, Kubernetes, Apache BookKeeper

Cloud and Operations

For a system to be “open for business,” system administrators must be able to efficiently manage and operate it. That requires a comprehensive dataflow and operations strategy. This track provides best practices for deploying and operating data lakes, streaming systems, and the extended Apache data ecosystem on premises and in the cloud. Sessions cover the full deployment lifecycle including installation, configuration, initial production deployment, upgrading, patching, loading, moving, backup, and recovery.

You’ll discover how to get started and how to operate your cluster. Speakers will show how to set up and manage high-availability configurations and how DevOps practices can help speed solutions into production. They’ll explain how to manage data across the edge, the data center, and the cloud. And they’ll offer cutting-edge best practices for large-scale deployments.
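
Much of this day-to-day operation is scriptable. As one hedged example, here is a sketch of checking service state through Apache Ambari’s REST API; the cluster name, host, and credentials are hypothetical:

    import requests

    AMBARI = "http://ambari.example.com:8080/api/v1"  # hypothetical Ambari host
    AUTH = ("admin", "admin")  # placeholder credentials

    # List the services in a cluster and print each one's state.
    resp = requests.get(f"{AMBARI}/clusters/mycluster/services?fields=ServiceInfo/state", auth=AUTH)
    resp.raise_for_status()
    for item in resp.json()["items"]:
        info = item["ServiceInfo"]
        print(info["service_name"], info["state"])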

Sample technologies:
Apache Ambari, Cloudbreak, HDInsight, HDCloud, Data Plane Service, AWS, Azure, and Apache Oozie

Governance and Security

Your data lake contains a growing volume of diverse enterprise data, so a breach could be catastrophic. Privacy violations and regulatory infractions can damage your corporate image and long-term shareholder value. Government and industry regulations demand you properly secure and govern your data to assure compliance and mitigate risks. But as Hadoop and streaming applications emerge as a critical foundation of a modern data architecture, enterprises face new requirements for protection and governance.

In this track, you’ll learn about the key enterprise requirements for governance and security of the extended data plane. You’ll hear best practices, tips, tricks, and war stories on how to secure and govern your big data infrastructure.
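
Policy management in this space is commonly automated. As a hedged example, here is a sketch of listing authorization policies through Apache Ranger’s public REST API; the host and credentials are hypothetical:

    import requests

    RANGER = "http://ranger.example.com:6080"  # hypothetical Ranger admin host
    AUTH = ("admin", "admin")  # placeholder credentials

    # Fetch all policies from Ranger's public v2 API and summarize them.
    resp = requests.get(f"{RANGER}/service/public/v2/api/policy", auth=AUTH)
    resp.raise_for_status()
    for policy in resp.json():
        state = "enabled" if policy.get("isEnabled") else "disabled"
        print(policy["service"], policy["name"], state)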

Sample technologies:
Apache Ranger, Apache Sentry, Apache Atlas, and Apache Knox

Cybersecurity

The speed and scale of recent ransomware attacks and cybersecurity breaches have taught us that threat detection and mitigation are key to security operations in data-driven businesses. Building cybersecurity machine learning models and deploying them in streaming systems is becoming critical to defending against these growing threats.

In this track, you’ll learn how to leverage big data and stream processing to improve your cybersecurity. Experts will explain how to scale with analytics on more data and react in real time.
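
The core pattern, scoring events as they stream by, can be sketched in a few lines of plain Python. This toy example flags hosts whose failed-login rate spikes within a sliding window; the threshold, window, and event format are made up for illustration:

    from collections import defaultdict, deque
    import time

    WINDOW_SECONDS = 60   # sliding window length (illustrative)
    THRESHOLD = 5         # failed logins per window before alerting (illustrative)
    recent = defaultdict(deque)  # host -> timestamps of recent failures

    def on_event(host, timestamp):
        """Score one failed-login event; alert if the host exceeds the threshold."""
        window = recent[host]
        window.append(timestamp)
        while window and timestamp - window[0] > WINDOW_SECONDS:
            window.popleft()  # drop events that fell out of the window
        if len(window) > THRESHOLD:
            print(f"ALERT: {host} had {len(window)} failed logins in {WINDOW_SECONDS}s")

    # Simulated event stream.
    for _ in range(7):
        on_event("10.0.0.42", time.time())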

Sample technologies:
Apache Metron, Apache Spot

IoT and Streaming

The rapid proliferation of sensors and connected devices is fueling an explosion in data. Streaming data allows algorithms to dynamically adapt to new patterns in data, which is critical in applications like fraud detection and stock price prediction. Deploying real-time machine learning models in data streams enables insights and interactions not previously possible.

In this track you’ll learn how to apply machine learning to capture perishable insights from streaming data sources and how to manage devices at the “jagged edge.” Sessions present new strategies and best practices for data ingestion and analysis. Presenters will show how to use these technologies to develop IoT solutions and how to combine historical with streaming data to build dynamic, evolving, real-time predictive systems for actionable insights.
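
To make the streaming side concrete, here is a minimal PySpark Structured Streaming sketch that counts sensor readings arriving on a Kafka topic. The broker address and topic name are hypothetical, and the Spark-Kafka connector package is assumed to be on the classpath:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("iot-stream-demo").getOrCreate()

    # Read a live stream of sensor events from Kafka (hypothetical broker/topic).
    events = (spark.readStream.format("kafka")
              .option("kafka.bootstrap.servers", "broker.example.com:9092")
              .option("subscribe", "sensor-readings")
              .load())

    # Count events per Kafka key (e.g., per device) as the stream evolves.
    counts = events.groupBy("key").count()

    query = counts.writeStream.outputMode("complete").format("console").start()
    query.awaitTermination()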

Sample technologies:
Apache NiFi, Apache Storm, Streaming Analytics Manager, Apache Flink, Apache Spark Streaming, Apache Beam, Apache Pulsar, and Apache Kafka

Enterprise Adoption

Enterprise business leaders and innovators are using data to transform their businesses. These modern data applications are augmenting traditional architectures and extending the reach for insights from the edge to the data center. Sessions in this track will discuss business justification and ROI for modern data architectures.

You’ll hear from ISVs and architects who have created applications, frameworks, and solutions that leverage data as an asset to solve real business problems. Speakers from companies and organizations across industries and geographies will describe their data architectures, the business benefits they’ve experienced, their challenges, secrets to their successes, use cases, and the hard-fought lessons learned in their journeys.

Agenda at a Glance

Sunday, June 17
8:30 AM – 5:00 PM   Pre-event Training

Monday, June 18
8:30 AM – 5:00 PM   Pre-event Training
12:00 PM – 7:00 PM  Registration
6:00 PM – 8:00 PM   Meetups

Tuesday, June 19
7:30 AM – 7:30 PM   Registration Open
9:00 AM – 10:30 AM  Opening Keynote
10:30 AM – 4:00 PM  Community Showcase and Expo Theatre
11:10 AM – 5:30 PM  Track Sessions and Crash Courses
5:40 PM – 7:00 PM   General Session
7:00 PM – 8:30 PM   Sponsor Reception

Wednesday, June 20
7:30 AM – 7:30 PM   Registration Open
9:00 AM – 10:30 AM  Opening Keynote
10:30 AM – 4:00 PM  Community Showcase and Expo Theatre
11:10 AM – 5:30 PM  Track Sessions and Crash Courses
5:40 PM – 7:00 PM   Birds of a Feather

Thursday, June 21
8:30 AM – 1:00 PM   Registration
8:30 AM – 11:00 AM  Community Showcase and Expo Theatre
9:30 AM – 1:00 PM   Track Sessions and Crash Courses

Community Events

Monday, June 18
9:00 AM – 6:00 PM   HBaseCon (get 25% off a DataWorks Summit pass)

Pre-Event Training

    • Apache Hadoop Ecosystem Full Stack Architecture
    • Deep Learning with TensorFlow and Keras
    • Apache Spark 2 for Data Engineers
    • Stream Applications with Apache NiFi, Kafka, Storm & SAM
    • Security with Apache Ranger
    • Data Flow Management with Apache NiFi
    • Apache Hadoop Administration with Ambari

This two-day training course is designed for developers who need to create applications to analyze Big Data stored in Apache Hadoop using Apache Pig and Apache Hive. Topics include an essential understanding of HDP and its capabilities; Hadoop, YARN, HDFS, and MapReduce/Tez; data ingestion; and using Pig and Hive to perform data analytics on Big Data.

View Brochure

This class is designed to cover key theory and background elements of deep learning, along with hands-on activities using both TensorFlow and Keras – two of the most popular frameworks for working with neural networks. In order to gain an intuitive understanding of deep learning approaches together with practice in building and training neural nets, this class alternates theory modules and hands-on labs.
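
As a preview of the style of lab exercise, here is a minimal Keras sketch of building and training a small neural network. The synthetic data stands in for a real dataset:

    import numpy as np
    from tensorflow import keras

    # Synthetic data standing in for a real dataset: 100 samples, 4 features.
    X = np.random.rand(100, 4).astype("float32")
    y = (X.sum(axis=1) > 2.0).astype("float32")  # a simple binary target

    # A tiny feed-forward network: one hidden layer, sigmoid output.
    model = keras.Sequential([
        keras.layers.Dense(8, activation="relu", input_shape=(4,)),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(X, y, epochs=5, batch_size=16, verbose=0)
    print(model.evaluate(X, y, verbose=0))  # [loss, accuracy]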

PREREQUISITES
The class communicates the mathematical aspects of deep learning in a clear, straightforward way, and does not require a background in vector calculus, although some background in calculus, linear algebra, and statistics is helpful. All code examples and labs are done with Python, so previous experience with Python is recommended.

TARGET AUDIENCE
This class is ideal for engineers or data scientists who want to gain an understanding of neural net models and modern techniques, and start to apply them to real-world problems.

View Brochure

This course covers the Apache Spark 2.x release and provides a solid technical introduction to the Spark architecture and how Spark works.
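
The course itself assumes Scala, but the same ideas carry over to PySpark. Here is a minimal, hypothetical sketch of the DataFrame API the architecture discussion builds toward:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("spark2-intro").getOrCreate()

    # Build a small in-memory DataFrame and run a typical aggregation.
    df = spark.createDataFrame(
        [("web", 120), ("mobile", 80), ("web", 45)],
        ["channel", "amount"],
    )
    df.groupBy("channel").agg(F.sum("amount").alias("total")).show()

    spark.stop()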

PREREQUISITES
Students should be familiar with programming principles and have previous experience in software development using Scala. Previous experience with data streaming, SQL, and HDP is also helpful, but not required.

TARGET AUDIENCE
Software engineers who are looking to develop in-memory applications for time-sensitive and highly iterative applications in an enterprise HDP environment.

View Brochure

This course is designed for developers who need to create real-time applications to ingest and process streaming data sources using Hortonworks DataFlow (HDF) environments. Specific technologies covered include Apache NiFi, Apache Kafka, Apache Storm, Hortonworks Schema Registry, and Streaming Analytics Manager.
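
As a small taste of the ingestion side, here is a sketch of publishing events to Kafka with the third-party kafka-python client. The broker address and topic are hypothetical; the course itself works at the NiFi/SAM level rather than with raw clients:

    import json
    import time
    from kafka import KafkaProducer  # third-party client: kafka-python

    # Connect to a Kafka broker (hypothetical address and topic).
    producer = KafkaProducer(
        bootstrap_servers="broker.example.com:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    # Publish a few simulated sensor events.
    for i in range(3):
        producer.send("sensor-readings", {"device": "d-1", "temp": 20 + i, "ts": time.time()})
    producer.flush()  # make sure everything is actually sent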

PREREQUISITES
Students should be familiar with programming principles and have experience in software development. Java programming experience is required. SQL and light scripting knowledge is also helpful. No prior Hadoop knowledge is required.

TARGET AUDIENCE
Developers and data engineers who need to understand and develop real-time/streaming applications on Hortonworks DataFlow (HDF).

View Brochure

This course is designed for experienced administrators who will be implementing secure Hadoop clusters using authentication, authorization, auditing and data protection strategies and tools.

PREREQUISITES
Students should be experienced in the management of Hadoop using Ambari and Linux environments.
Completion of the following course is required before taking HDP Hadoop Security:
• ADM-221 HDP Administration I Foundations

TARGET AUDIENCE
IT administrators and operators responsible for installing, configuring and supporting an Apache Hadoop deployment for a secure environment.

View Brochure

This course is designed for data stewards and dataflow managers who are looking to automate the flow of data between systems. Topics include an introduction to NiFi; installing and configuring NiFi; a detailed explanation of the NiFi user interface, its components, and the elements associated with each; how to build a dataflow; the NiFi Expression Language; NiFi clustering; data provenance; security around NiFi; monitoring tools; and HDF best practices.
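
Beyond the UI, NiFi exposes a REST API that the monitoring topics touch on. Here is a hedged sketch of polling overall flow status from Python; the host and port are hypothetical and assume an unsecured instance:

    import requests

    NIFI = "http://nifi.example.com:8080/nifi-api"  # hypothetical NiFi instance

    # Poll controller-level flow status: active threads and queued flowfiles.
    resp = requests.get(f"{NIFI}/flow/status")
    resp.raise_for_status()
    status = resp.json()["controllerStatus"]
    print("active threads:", status["activeThreadCount"])
    print("queued:", status["queued"])  # e.g. "12 / 4.5 MB"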

PREREQUISITES
Students should be familiar with programming principles and have previous experience in software development. Experience with Linux and a basic understanding of dataflow tools would be helpful. No prior Hadoop/NiFi experience is required, but it is very helpful.

TARGET AUDIENCE
Data engineers, integration engineers, and architects who are looking to automate dataflow between systems.

View Brochure

This course is intended for systems administrators who will be responsible for the design, installation, configuration, and management of the Hortonworks Data Platform (HDP). The course provides in-depth knowledge and experience in using Apache Ambari as the operational management platform for HDP. This course presumes no prior knowledge or experience with Hadoop.

PREREQUISITES
Students must have experience working in a Linux environment with standard Linux system commands. Students should be able to read and execute basic Linux shell scripts. Basic knowledge of SQL statements is recommended, but not a requirement. In addition, it is recommended for students to have some operational experience in data center practices, such as change management, release management, incident management, and problem management. It is also strongly recommended that you complete Hortonworks Hadoop Essentials before taking this course.

TARGET AUDIENCE
Linux administrators and system operators responsible for installing, configuring and managing an HDP cluster.

View Brochure

Speakers

Robert Hryniewicz has over 10 years of experience working on projects related to artificial intelligence, enterprise software, IoT, robotics, blockchain, and more. Currently, he’s a Data Scientist and Evangelist at Hortonworks. Previously, Robert was CTO at a Singularity Labs startup and a Sr. Architect at Cisco, NASA, and others. He’s a frequent speaker at DataWorks / Hadoop Summits.

Sponsors

Packages & Passes

Conference Pass       Early Bird (thru Mar 30, 2018)   Standard (Mar 31 – Jun 16)   On-Site
Full Conference       $1,250                           $1,750                       $2,000
Day Pass              N/A                              $500                         $500
Pre-Event Training    N/A                              $2,000                       $2,000

Full Conference: access to DataWorks Summit keynotes, breakouts, meals, and events, including crash courses, the community showcase, and the sponsor reception.
Day Pass: single-day access to keynotes, breakouts, lunch, and other DataWorks Summit events.
Pre-Event Training: prices are for a single class; conference attendees may enroll in only one class.

Venue & Travel

SAN JOSE MCENERY CONVENTION CENTER

San Jose McEnery Convention Center, West San Carlos Street, San Jose, CA, United States

1 (408) 295-9600

Visit Event Center Website

Hotel
San Jose Marriott

San Jose Marriott, 301 South Market Street, San Jose, CA 95113, USA

1 (408) 280-1300

Visit Hotel Website

Take advantage of our discounted group rate of $335++ per night at the San Jose Marriott, which is conveniently connected to the San Jose Convention Center. Note that a credit card is required to secure the reservation, and all nights are subject to availability.

Get Social, Stay Connected!