DataWorks Summit in San Jose, California

June 13–15, 2017

Tracks

Tracks are divided into eight key topic areas, which will cover:

Apache Hadoop

Apache Hadoop continues to drive innovation at a rapid pace, and the next generation of Hadoop is being built today. This track showcases new developments in core Hadoop and closely related technologies. Attendees will hear about key projects, such as HDFS and YARN, projects in incubation, and the industry initiatives driving innovation in and around the Hadoop platform. Attendees will interact with technical leads, committers, and expert users who are actively driving the roadmaps, key features, and advanced technology research around what is coming next for Apache Hadoop.

Cloud and Operations

For a system to be “open for business,” system administrators must be able to efficiently manage and operate it. That requires a comprehensive dataflow and operations strategy. This track provides best practices for deploying and operating data lakes, streaming systems, and the extended Apache data ecosystem on premises and in the cloud. Sessions cover the full deployment lifecycle, including installation, configuration, initial production deployment, upgrading, patching, loading, moving, backup, and recovery.

You’ll discover how to get started and how to operate your cluster. Speakers will show how to set up and manage high-availability configurations and how DevOps practices can help speed solutions into production. They’ll explain how to manage data across the edge, the data center, and the cloud. And they’ll offer cutting-edge best practices for large-scale deployments.

Sample technologies:
Apache Ambari, Cloudbreak, HDInsight, HDCloud, Data Plane Service, AWS, Azure, and Apache Oozie

Governance and Security

Your data lake contains a growing volume of diverse enterprise data, so a breach could be catastrophic. Privacy violations and regulatory infractions can damage your corporate image and long-term shareholder value. Government and industry regulations demand you properly secure and govern your data to assure compliance and mitigate risks. But as Hadoop and streaming applications emerge as a critical foundation of a modern data architecture, enterprises face new requirements for protection and governance.

In this track, you’ll learn about the key enterprise requirements for governance and security of the extended data plane. You’ll hear best practices, tips, tricks, and war stories on how to secure and govern your big data infrastructure.

Sample technologies:
Apache Ranger, Apache Sentry, Apache Atlas, and Apache Knox

Apache Spark and Data Science

Artificial intelligence is transforming every vertical. Several popular tools and projects have contributed to this accelerated transformation: Apache Spark for large-scale machine learning, TensorFlow for deep learning, and Apache Zeppelin and Jupyter notebooks, which enable data scientists to quickly prototype, test, and deploy advanced ML models. This track covers introductory to advanced sessions on algorithms, tools, applications, and emerging research topics that extend the Hadoop ecosystem for data science. Sessions will include examples of innovative analytics applications and systems, data visualization, statistics, machine learning, deep learning, and artificial intelligence. You will hear from leading data scientists, analysts, and practitioners who are driving innovation by extracting valuable insights from data at rest as well as data in motion.

Data Processing and Warehousing

Apache Hadoop YARN has transformed Hadoop into a multi-tenant data platform. It is the foundation for a wide range of processing engines that empower businesses to interact with the same data in multiple ways simultaneously. This means applications can interact with the data in the most appropriate way: from batch processing to interactive SQL or low-latency access with NoSQL, as well as interaction between legacy data stores and big data. A vast ecosystem of SQL engines and tools is enabling richer data warehousing on Hadoop, with capabilities for ACID transactions, interactive queries, OLAP, and data transformation. You will have the opportunity to hear from the rock stars of the Apache community and learn how these innovators are building applications.

IoT and Streaming

The rapid proliferation of sensors and connected devices is fueling an explosion in data. Streaming data allows algorithms to dynamically adapt to new patterns in data, which is critical in applications like fraud detection and stock price prediction. Deploying real-time machine learning models in data streams enables insights and interactions not previously possible.

In this track you’ll learn how to apply machine learning to capture perishable insights from streaming data sources and how to manage devices at the “jagged edge.” Sessions present new strategies and best practices for data ingestion and analysis. Presenters will show how to use these technologies to develop IoT solutions and how to combine historical with streaming data to build dynamic, evolving, real-time predictive systems for actionable insights.

Sample technologies:
Apache NiFi, Apache Storm, Streaming Analytics Manager, Apache Flink, Apache Spark Streaming, Apache Beam, Apache Pulsar, and Apache Kafka

Applications

In this track, you will hear from ISVs and architects who have built applications, frameworks, and solutions that leverage data as an asset to solve real business problems. These modern data applications are augmenting traditional architectures and extending the reach for insights from the edge to the data center. Sessions in this track address both technical and business audiences, with topics ranging from business justification and ROI to technical architecture.

Enterprise Adoption

Enterprise business leaders and innovators are using data to transform their businesses. These modern data applications are augmenting traditional architectures and extending the reach for insights from the edge to the data center. Sessions in this track will discuss business justification and ROI for modern data architectures.

You’ll hear from ISVs and architects who have created applications, frameworks, and solutions that leverage data as an asset to solve real business problems. Speakers from companies and organizations across industries and geographies will describe their data architectures, the business benefits they’ve experienced, their challenges, secrets to their successes, use cases, and the hard-fought lessons learned in their journeys.

Pre-Event Training

This 2-day course is designed for ‘Data Stewards’ or ‘Data Flow Managers’ who are looking to automate the flow of data between systems.

TARGET AUDIENCE

Data engineers, integration engineers, and architects who are looking to automate data flow between systems.

PREREQUISITES

It is recommended that participants have some experience with Linux and a basic understanding of data flow tools. Students will need to bring a Wi-Fi-enabled laptop with Chrome or Firefox installed in order to complete the hands-on labs.

In this 2-day course, learn data science techniques and best practices using the Hadoop ecosystem and its tools.

TARGET AUDIENCE

Architects, software developers, analysts and data scientists who need to apply data science and machine learning on Apache Hadoop.

PREREQUISITES

Students must have experience with at least one programming or scripting language, such as Python, knowledge of statistics and/or mathematics, and a basic understanding of big data and Hadoop principles.

This 2-day training course is designed for developers who need to create applications that analyze big data stored in Apache Hadoop using Apache Pig and Apache Hive, and to develop applications on Apache Spark. Topics include an essential understanding of HDP and its capabilities; Hadoop, YARN, HDFS, and MapReduce/Tez; data ingestion; using Pig and Hive to perform data analytics on big data; and an introduction to Spark Core, Spark SQL, Apache Zeppelin, and additional Spark features.

TARGET AUDIENCE

Developers and data engineers who need to understand and develop applications on HDP.

PREREQUISITES

Students should be familiar with programming principles and have experience in software development. Knowledge of SQL and light scripting is also helpful. No prior Hadoop knowledge is required.

This 2-day course is designed for developers who need to create applications that analyze big data stored in Apache Hadoop using Spark. The focus is on using the Spark API from Python.

TARGET AUDIENCE

Developers, architects, and admins who would like to learn more about developing data applications in Spark, how it will affect their environment, and ways to optimize applications.

PREREQUISITES

No previous Hadoop knowledge is required, though it will be useful. Basic knowledge of Python or Scala is required. Previous exposure to SQL is helpful but not required. Students will need to bring a Wi-Fi-enabled laptop with Chrome or Firefox installed in order to complete the hands-on labs.

This 1-day course details the business value of Apache Hadoop and provides a technical overview of it. It includes high-level information about the concepts, architecture, operation, and uses of the Hortonworks Data Platform (HDP) and the Hadoop ecosystem. The course serves as an optional primer for those who plan to attend a hands-on, instructor-led course.

TARGET AUDIENCE

Data architects, data integration architects, managers, C-level executives, decision makers, technical infrastructure teams, and Hadoop administrators or developers who want to understand the fundamentals of big data and the Hadoop ecosystem.

PREREQUISITES

No previous Hadoop or programming knowledge is required. Students are encouraged to bring a Wi-Fi-enabled laptop pre-loaded with the Hortonworks Sandbox if they want to reproduce the demonstrations on their own machine.

This 2-day course is designed for developers who need to create applications that analyze big data stored in Apache Hadoop using Spark. The focus is on using the Spark API from Scala.

TARGET AUDIENCE

Developers, architects, and admins who would like to learn more about developing data applications in Spark, how it will affect their environment, and ways to optimize applications.

PREREQUISITES

No previous Hadoop knowledge is required, though it will be useful. Basic knowledge of Python or Scala is required. Previous exposure to SQL is helpful but not required. Students will need to bring a Wi-Fi-enabled laptop with Chrome or Firefox installed in order to complete the hands-on labs.

This 2-day course is designed for administrators who will be managing the Hortonworks Data Platform (HDP) 2.5 with Ambari. It covers installation, configuration, and other typical cluster management tasks.

TARGET AUDIENCE

IT administrators and operators responsible for installing, configuring, and supporting an HDP 2.5 deployment in a Linux environment using Ambari.

PREREQUISITES

No previous Hadoop knowledge is required, though it will be useful. Attendees should be familiar with data center operations and Linux system administration. Students will need to bring a Wi-Fi-enabled laptop with Chrome or Firefox installed in order to complete the hands-on labs.

This 2-day course is designed for system administrators and operators who need to manage secure HDP clusters. They will learn how to implement Kerberos, Apache Ranger, Apache Ambari, Apache Knox, SPNEGO, and other security concepts and tools to secure HDP clusters.

TARGET AUDIENCE

Systems administrators, operators, and security engineers who need to understand how to implement HDP security.

PREREQUISITES

Students must be familiar with distributed systems, basic networking, and basic Linux commands. Prior Hadoop knowledge is helpful.

This 2-day course is designed for developers who need to create real-time applications that ingest and process streaming data sources using the Hortonworks Data Platform (HDP). Specific technologies covered include Apache Hadoop, Apache Kafka, Apache Storm and Trident, Apache Spark, and Apache HBase. The highlight of the course is the custom workshop-style labs, which allow participants to build complete streaming applications with Storm and Spark Streaming.

TARGET AUDIENCE

Developers and data engineers who need to understand and develop real-time and streaming applications on HDP.

PREREQUISITES

Students should be familiar with programming principles and have experience in software development. Java programming experience is required. Knowledge of SQL and light scripting is also helpful. No prior Hadoop knowledge is required.

Sponsors

Venue & Travel

SAN JOSE MCENERY CONVENTION CENTER

San Jose McEnery Convention Center, West San Carlos Street, San Jose, CA, United States

1 (408) 295-9600

Hotel
SAN JOSE MARRIOTT

San Jose Marriott, South Market Street, San Jose, CA, United States

1 (408) 280-1300