Pre Event Training
DataWorks Summit/Hadoop Summit will feature three days of content and eight tracks dedicated to enabling modern data applications. Attendees will develop an understanding of key technologies powering new modern data applications and the value they generate for businesses. Industry experts, business leaders, architects, data scientists, Apache Hadoop developers, and Apache Committers will share use cases, success stories, best practices, cautionary tales, and technology insights that will provide practical guidance to novices as well as experienced practitioners of modern data infrastructure.
CLOUD AND APPLICATIONS
In this track you will hear from ISVs and architects who have created applications, frameworks, and solutions built to solve real business problems by leveraging data as an asset. These Modern Data Applications are augmenting traditional architectures and extending the reach for insights from the edge to the data center. Sessions in this track span both technical and business audiences, covering topics from business justification and ROI to technical architecture.
For a system to be “open for business”, it must be efficiently managed by system administrators. A critical component of a successful connected data architecture is a comprehensive dataflow and operations strategy. The track is focused on developing and deploying Modern Data Applications on the extended Apache Data ecosystem, both on-premises and in the cloud. Sessions will range from getting started and operating your cluster to cutting-edge best practices for large-scale deployments.
Sample Technologies: Cloudbreak, HDInsight, HDCloud, AWS, Azure, Apache Hadoop, Apache Spark, Apache Storm and Apache NiFi among others.
In this track you will learn from enterprise business leaders and innovators about how they have used data to transform their business. Sessions cover architecture, business benefits, challenges, and secrets to success around these transformations. Speakers come from different companies across industries and geographies, but they have one thing in common: they are leveraging data and open source technology for amazing business outcomes. Sessions will cover ROI, business benefits, and success criteria, as well as hard-won lessons learned on their journey.
DATA PROCESSING & WAREHOUSING
Apache Hadoop YARN has transformed Hadoop into a multi-tenant data platform. It is the foundation for a wide range of processing engines that empower businesses to interact with the same data in multiple ways simultaneously. This means applications can interact with the data in the most appropriate way, from batch to interactive SQL or low-latency access with NoSQL, and can combine legacy data stores with big data. There is a vast ecosystem of SQL engines and tools that are enabling richer Data Warehousing on Hadoop with capabilities for ACID, interactive queries, OLAP and data transformation. You will have the opportunity to hear from the rock stars of the Apache community and learn how these innovators are building applications.
Sample Technologies: Apache Hive, Apache Tez, Apache ORC, Druid, Apache Parquet, Apache HBase, Apache Phoenix, Apache Accumulo, Apache Drill and Apache Impala among others.
Apache Hadoop continues to drive innovation at a rapid pace, and the next generation of Hadoop is being built today. This track showcases new developments in core Hadoop and closely related technologies. Attendees will hear about key projects, such as HDFS and YARN, projects in incubation, and the industry initiatives driving innovation in and around the Hadoop platform. Attendees will interact with technical leads, committers, and expert users who are actively driving the roadmaps, key features, and advanced technology research around what is coming next for Apache Hadoop.
Sample Technologies: Apache Hadoop (YARN, HDFS).
OPERATIONS, GOVERNANCE AND SECURITY
With the growing volumes of diverse data being stored in the Data Lake, any breach of this enterprise-wide data can be catastrophic, with consequences ranging from privacy violations and regulatory infractions to damage to corporate image and long-term shareholder value. This track focuses on the key enterprise requirements for governance and security for the extended data plane. As Hadoop and streaming applications emerge as a critical foundation of modern data applications, the enterprise has placed stringent requirements on them in these key areas. Speakers will present best practices with an emphasis on tips, tricks, and war stories on how to secure your big data infrastructure.
Sessions will also cover the full deployment lifecycle for on-premises and cloud deployments, including installation, configuration, initial production deployment, recovery, security, and data governance for Hadoop. This track covers the core practices and patterns for planning, deploying, loading, moving, backing up and recovering, ensuring high availability (HA) for, and managing data across edge, on-premises, and cloud environments. The track is focused on deploying and operating Hadoop and the extended Apache Data ecosystem, both on-premises and in the cloud.
Sample Technologies: Apache Ambari, Apache Ranger, Apache Atlas and Apache Knox among others.
IOT AND STREAMING
The increase in the number of sensors and connected devices is fueling data growth and the opportunity to leverage streaming data for new insights and interactions. The speed with which enterprises can make decisions based on data is critical to their competitive advantage. This track covers the state of the art in obtaining perishable insights from streaming data sources, including managing devices at the “jagged edge”, strategies and practices for data ingestion and analysis, and best practices for deriving real-time actionable insights as the data flows from connected devices into Hadoop infrastructure. Attendees will hear from the technical leads, committers, and expert users who are actively driving the roadmaps and key features of emerging IoT technologies. Attendees will also learn how to use these technologies to develop IoT solutions.
Sample Technologies: Apache NiFi, Apache Storm, Apache Flink, Apache Spark and Apache Kafka among others.
APACHE SPARK & DATA SCIENCE
Artificial Intelligence is transforming every vertical. Several popular tools and projects have contributed to this accelerated transformation: Apache Spark for large-scale Machine Learning, TensorFlow for Deep Learning, and Apache Zeppelin and Jupyter notebooks, which enable data scientists to quickly prototype, test, and deploy advanced ML models.
This track covers introductory to advanced sessions on algorithms, tools, applications, and emerging research topics that extend the Hadoop ecosystem for data science. Sessions will include examples of innovative analytics applications and systems, data visualization, statistics, machine learning, deep learning, and artificial intelligence. You will hear from leading data scientists, analysts and practitioners who are driving innovation by extracting valuable insights from data at rest as well as data in motion.
Sample Technologies: Apache Spark, R, Apache Zeppelin, Jupyter, TensorFlow, DeepLearning4J, Theano, CaffeOnSpark and Torch among others.
Hortonworks is sponsoring a quick, hands-on introduction to key Apache projects. Come and listen to a short technical introduction and then get hands-on with your personal machine, ask questions, and leave with a working environment to continue your journey.
Hortonworks will sponsor several Birds of a Feather (BoF) sessions, hosted by Apache Committers and Hortonworks’ architects, tech leads, and engineers. Come and share your experiences, challenges, future interests, and requirements on key Apache projects and discuss what’s on the roadmap and future design options.
Meetups are a great way to connect with like-minded individuals face to face. Hortonworks and local community groups will host several Meetups the night prior to Summit. Join us for a social and networking hour, a presentation, Q&A with the presenter, and more socializing and networking to follow.