Hortonworks is sponsoring a quick, hands-on introduction to key Apache projects. Come and listen to a short technical introduction and then get hands-on with your personal machine, ask questions, and leave with a working environment to continue your journey.

Data Science Crash Course

Introduction: This workshop will provide a hands-on introduction to Machine Learning (ML) with an overview of Deep Learning (DL).

Format: An introductory lecture on several supervised and unsupervised ML techniques, followed by a light introduction to DL and a short discussion of the current state of the art. Several Python code samples using the scikit-learn library will be introduced, which attendees will be able to run in the Cloudera Data Science Workbench (CDSW).

Objective: To provide a short hands-on introduction to ML with Python’s scikit-learn library. The environment in CDSW is interactive, and the step-by-step guide will walk you through setting up your environment, exploring datasets, and training and evaluating models on popular datasets. By the end of the crash course, attendees will have a high-level understanding of popular ML algorithms and the current state of DL, know what problems they can solve, and walk away with basic hands-on experience training and evaluating ML models.
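
For a sense of what "training and evaluating a model" looks like in scikit-learn, here is a minimal sketch. It is not taken from the workshop materials: the choice of the Iris dataset, logistic regression, the 75/25 split and the random seed are all assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load a small, well-known dataset (assumed here; the workshop may use others).
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows so the model is scored on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Train a simple supervised classifier on the training split.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluate: compare predicted labels with the true labels of the held-out rows.
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Test accuracy: {accuracy:.2f}")
```

Swapping `LogisticRegression` for another scikit-learn estimator leaves the rest of the script unchanged, since estimators share the same `fit` and `predict` interface.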

Prerequisites: For the hands-on portion, registrants must bring a laptop with a Chrome or Firefox web browser. The labs will be done in the cloud, so no installation is needed. Everyone will be able to register and start using CDSW after the introductory lecture concludes (about one hour in). Basic knowledge of Python is highly recommended.

Speakers: Robert Hryniewicz

Location: University of DC/Catholic University Room

Data Science Crash Course Video

Data Science Crash Course Slides

Kafka/SMM Crash Course

Introduction: This session will cover the fundamentals of Apache Kafka and SMM (Streams Messaging Manager).

Format: This session will start with the basic concepts and entities of Apache Kafka, such as Brokers, Topics, Producers and Consumers/Consumer Groups. It will then delve into advanced topics such as the idempotent producer, the transactional API for exactly-once processing, authentication, authorization, replication, log compaction, compression and performance. This will be followed by a demo of SMM, an open source Cloudera initiative that gives Kafka users better operational insight into their clusters through a GUI instead of complex manual scripts. The session will also include a demo of the Alerting/Notification framework, which can be used to trigger alerts and send notifications based on the conditions you want to monitor.

Objective: To learn about Apache Kafka and see how SMM can help answer questions that arise in production deployments, such as “Do I have any offline topic partitions?”, “Which consumer group is falling behind the most?”, “Which producers are generating the most data right now?” and “What does the data in my application topic look like?”. Attendees will also become familiar with the SMM GUI by exploring its views of Brokers, Topics, Producers and Consumer Groups, so that they can quickly find the information needed to monitor Kafka clusters or their applications. Finally, attendees will learn how to use the Alerting and Notification framework that comes with SMM to automate monitoring of Kafka clusters and the applications built around them.
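
To make the "falling behind" question concrete: consumer lag for a partition is the log end offset minus the group's committed offset. The sketch below works on hand-written dictionaries rather than a live cluster. The topic, the group names, the offsets and both helper functions are hypothetical, and it does not show how SMM itself computes or displays lag.

```python
from typing import Dict, Tuple

TopicPartition = Tuple[str, int]  # (topic name, partition number)


def total_lag(end_offsets: Dict[TopicPartition, int],
              committed: Dict[TopicPartition, int]) -> int:
    """Sum, over a group's partitions, of log end offset minus committed offset."""
    lag = 0
    for partition, offset in committed.items():
        # Clamp at zero in case the two snapshots were taken at different times.
        lag += max(end_offsets[partition] - offset, 0)
    return lag


def most_lagging_group(end_offsets: Dict[TopicPartition, int],
                       groups: Dict[str, Dict[TopicPartition, int]]) -> str:
    """Name of the consumer group with the largest total lag."""
    return max(groups, key=lambda name: total_lag(end_offsets, groups[name]))


# Made-up snapshot: latest offset written to each partition of one topic.
end_offsets = {("orders", 0): 1200, ("orders", 1): 1150}

# Made-up snapshot: last committed offset per consumer group and partition.
groups = {
    "billing": {("orders", 0): 1200, ("orders", 1): 1100},
    "analytics": {("orders", 0): 700, ("orders", 1): 900},
}

for name, committed in groups.items():
    print(name, total_lag(end_offsets, committed))
print("Falling behind most:", most_lagging_group(end_offsets, groups))
```

A monitoring tool answers the same question continuously and for every group and topic at once, which is the kind of view the SMM demo is meant to show.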


Speakers: Daniel Chaffelson

Location: University of DC/Catholic University Room

Kafka/SMM Crash Course Video

Kafka/SMM Crash Course Slides

Apache NiFi Crash Course

Introduction: This workshop will provide a hands-on introduction to simple event data processing and data flow processing using a Sandbox on students’ personal machines.

Format: A short introductory lecture on Apache NiFi and the computing environment used in the lab, followed by a demo, lab exercises and a Q&A session. The lab time after the lecture is for working through the exercises and asking questions.

Objective: This course will provide a short hands-on introduction to Apache NiFi. In the lab, you will use Apache NiFi to collect, conduct and curate data-in-motion and data-at-rest. You will learn how to connect to and consume streaming sensor data, filter and transform the data, and persist it to multiple data stores.

Prerequisites: Registrants must bring a laptop with the latest VirtualBox installed and an image of the Hortonworks DataFlow (HDF) Sandbox. Participants should have the Sandbox downloaded and working before they arrive.

Speakers: Nathan Gough

Location: University of DC/Catholic University Room

Apache NiFi Crash Course Video

GDPR/Security/Governance Crash Course

Introduction: This workshop will provide an overview of GDPR provisions along with relevant use cases.

Format: A short introductory lecture on GDPR. We will then focus on the topics of consent, profiling and the right to be forgotten (data erasure), and on how companies can establish processes for acquiring consent, automated data processing, and data discovery and classification using technologies such as Apache Atlas, Apache Ranger and Apache Hive.

Objective: To provide a quick, hands-on introduction to GDPR concepts. In the lab, you will practice the concepts using Apache Hadoop, Atlas, Ranger and Hive to process and classify data.

Prerequisites: To participate in the hands-on section, registrants must bring a laptop that can be used to access the lab environment on the AWS public cloud.

Speakers: Srikanth Venkat, Eyad Garelnabi

Location: University of DC/Catholic University Room

GDPR/Security/Governance Crash Course Video