DataWorks Summit in Sydney, Australia

September 20–21, 2017

Tracks

Sessions are divided into seven key topic areas:

Apache Hadoop

Apache Hadoop continues to drive innovation at a rapid pace, and the next generation of Hadoop is being built today. This track showcases new developments in core Hadoop and closely related technologies. Attendees will hear about key projects such as HDFS and YARN, projects in incubation, and the industry initiatives driving innovation in and around the Hadoop platform, and will interact with the technical leads, committers, and expert users who are actively driving the roadmaps, key features, and advanced technology research around what is coming next for Apache Hadoop.

Data Processing and Warehousing

Apache Hadoop YARN has transformed Hadoop into a multi-tenant data platform. It is the foundation for a wide range of processing engines that empower businesses to interact with the same data in multiple ways simultaneously. This means applications can interact with the data in the most appropriate way: from batch to interactive SQL, low-latency access with NoSQL, and integration between legacy data stores and big data. A vast ecosystem of SQL engines and tools is enabling richer data warehousing on Hadoop, with capabilities for ACID, interactive queries, OLAP, and data transformation. You will have the opportunity to hear from the rock stars of the Apache community and learn how these innovators are building applications.

Operations, Governance and Security

With the growing volumes of diverse data being stored in the Data Lake, any breach of this enterprise-wide data can be catastrophic, leading to privacy violations, regulatory infractions, and damage to corporate image and long-term shareholder value. This track focuses on the key enterprise requirements for governance and security across the extended data plane. As Hadoop and streaming applications emerge as a critical foundation of modern data applications, enterprises have placed stringent requirements on them in these areas. Speakers will present best practices, with an emphasis on tips, tricks, and war stories about securing your big data infrastructure. Sessions will also cover the full deployment lifecycle for on-premises and cloud deployments, including installation, configuration, initial production deployment, recovery, security, and data governance for Hadoop, along with the core practices and patterns for planning, deploying, loading, moving, backing up, recovering, and managing data across the edge, the data center, and the cloud.

Apache Spark and Data Science

Artificial Intelligence is transforming every vertical. Several popular tools and projects have contributed to this accelerated transformation: Apache Spark for large-scale Machine Learning, TensorFlow for Deep Learning, and Apache Zeppelin and Jupyter notebooks enabling Data Scientists to quickly prototype, test, and deploy advanced ML models. This track covers introductory to advanced sessions on algorithms, tools, applications, and emerging research topics that extend the Hadoop ecosystem for data science. Sessions will include examples of innovative analytics applications and systems, data visualization, statistics, machine learning, deep learning, and artificial intelligence. You will hear from leading data scientists, analysts, and practitioners who are driving innovation by extracting valuable insights from data at rest as well as data in motion.

Enterprise Adoption

Enterprise business leaders and innovators are using data to transform their businesses. These modern data applications are augmenting traditional architectures and extending the reach for insights from the edge to the data center. Sessions in this track will discuss business justification and ROI for modern data architectures.

You’ll hear from ISVs and architects who have created applications, frameworks, and solutions that leverage data as an asset to solve real business problems. Speakers from companies and organizations across industries and geographies will describe their data architectures, the business benefits they’ve experienced, their challenges, secrets to their successes, use cases, and the hard-fought lessons learned in their journeys.

Cloud and Applications

In this track you will hear from ISVs and architects who have built applications, frameworks, and solutions that solve real business problems by leveraging data as an asset. These modern data applications are augmenting traditional architectures and extending the reach for insights from the edge to the data center. Sessions in this track span both technical and business audiences, from business justification and ROI to technical architecture. For a system to be “open for business”, it must be efficiently managed by system administrators, and a critical component of a successful connected data architecture is a comprehensive dataflow and operations strategy. The track focuses on developing and deploying modern data applications on the extended Apache data ecosystem, on-premises and in the cloud. Sessions will range from getting started and operating your cluster to cutting-edge best practices for large-scale deployments.

IoT and Streaming

The rapid proliferation of sensors and connected devices is fueling an explosion in data. Streaming data allows algorithms to dynamically adapt to new patterns in data, which is critical in applications like fraud detection and stock price prediction. Deploying real-time machine learning models in data streams enables insights and interactions not previously possible.

In this track you’ll learn how to apply machine learning to capture perishable insights from streaming data sources and how to manage devices at the “jagged edge.” Sessions present new strategies and best practices for data ingestion and analysis. Presenters will show how to use these technologies to develop IoT solutions and how to combine historical with streaming data to build dynamic, evolving, real-time predictive systems for actionable insights.

Sample technologies:
Apache NiFi, Apache Storm, Streaming Analytics Manager, Apache Flink, Apache Spark Streaming, Apache Beam, Apache Pulsar, and Apache Kafka
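As a loose, engine-agnostic illustration of the windowed computations these streaming frameworks perform at scale, here is a minimal pure-Python sketch of flagging anomalous sensor readings against a sliding window (the function name and thresholds are invented for this example):

```python
from collections import deque

def windowed_anomalies(readings, window=5, threshold=5.0):
    """Flag readings that deviate from the mean of a sliding window.

    A toy stand-in for the kind of windowed streaming computation
    engines like Spark Streaming or Flink run continuously at scale.
    """
    recent = deque(maxlen=window)  # keeps only the last `window` values
    anomalies = []
    for i, value in enumerate(readings):
        if len(recent) == window:  # only judge once the window is full
            mean = sum(recent) / window
            if abs(value - mean) > threshold:
                anomalies.append((i, value))
        recent.append(value)
    return anomalies

# A steady sensor feed with one spike: only the spike is flagged.
stream = [10.0, 10.2, 9.9, 10.1, 10.0, 25.0, 10.1]
print(windowed_anomalies(stream))  # -> [(5, 25.0)]
```

A production system would of course distribute this over partitioned, fault-tolerant streams; the point is only the window-then-compare pattern.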

Pre-Event Training

This course introduces and demonstrates the components that make up the Hortonworks Data Platform (HDP) ecosystem, then explores Apache Hive in greater detail, including hands-on demos.

Description

This is a technical overview of Apache Hadoop and Hive with hands-on exercises. It includes high-level information about the concepts, architecture, operation, and uses of HDP and the Hadoop ecosystem, with a deeper focus for developers who need to create applications that analyze Big Data stored in Apache Hadoop using Hive.

TARGET AUDIENCE

Software developers, business and reporting analysts, and technical managers who need to understand the capabilities of Hadoop and build applications for it.

PREREQUISITES

Students should be familiar with programming principles and have experience in software development. SQL knowledge is also helpful. No prior Hadoop knowledge is required.

This course will introduce the big data science workflow, specifically how to move from working with small datasets to working with big data using Spark, Hive, and Zeppelin.

Description

Big Data Science with HDP will cover all aspects of the data science workflow. Special focus will be given to transitioning from the single-machine Python scientific stack to the big data science stack of Hive, Spark, and Zeppelin.

Topics will include how to ingest, store, and munge data; data exploration and visualization; and feature engineering and machine learning, including supervised and unsupervised model building.
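As a miniature illustration of the supervised model building mentioned above, here is a toy ordinary-least-squares fit in pure Python (the course itself works at scale with Spark, Hive, and Zeppelin; this sketch only shows the idea of fitting a model to labeled data):

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b on a toy dataset.

    Supervised learning in miniature: given inputs xs and labels ys,
    recover the slope a and intercept b that best explain the data.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); intercept follows.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# Points sampled from y = 2x + 1 recover the slope and intercept exactly.
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # -> 2.0 1.0
```

With big data the same estimation is expressed as distributed aggregations (e.g. via Spark), but the statistical core is unchanged.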

TARGET AUDIENCE

Developers, Analysts, and Data Scientists who are interested in learning how to use big data tools to do data science at scale.

PREREQUISITES

Students should be comfortable with programming principles, have prior experience with or exposure to statistical and/or computational modeling concepts, and preferably have experience with SQL. No prior Hadoop knowledge is required.

Sponsors

Venue & Travel

ICC SYDNEY CONVENTION CENTRE

ICC Sydney Convention Centre, Darling Drive, Sydney, New South Wales, Australia

+61 2 9215 7100

Visit Event Center Website

Hotel
NOVOTEL SYDNEY ON DARLING HARBOUR

Novotel Sydney on Darling Harbour, Murray Street, Pyrmont, New South Wales, Australia

+61 2 9288 7180

Visit Hotel Website

