Hortonworks is sponsoring a quick, hands-on introduction to key Apache projects. Come and listen to a short technical introduction and then get hands-on with your personal machine, ask questions, and leave with a working environment to continue your journey.

Data Science Crash Course

Introduction: This workshop will provide a hands-on introduction to Machine Learning and Deep Learning.

Format: An introductory lecture on several supervised and unsupervised Machine Learning techniques, followed by a light introduction to Deep Learning. Both Apache Spark and TensorFlow will be introduced with relevant code samples that users can run in the cloud and explore.

Objective: To provide a short hands-on introduction to Machine Learning with the Spark Machine Learning library (MLlib) and to Deep Learning with TensorFlow. In the lab, you will use Apache Zeppelin and Jupyter notebooks with the Apache Spark and TensorFlow processing engines (respectively). You will learn how to analyze and structure data, train Machine Learning models, and apply them to answer real-world questions. You will also learn how to select, train, and test Deep Learning models.

Prerequisites: Registrants must bring a laptop with a Chrome or Firefox web browser installed. Proxies must be disabled, i.e. the laptop must show the venue IP to access cloud resources. The labs will be done in the cloud: everyone will be assigned a cluster to try several workloads using Apache Spark and TensorFlow in hosted Zeppelin and Jupyter notebooks (respectively).

Speakers: Robert Hryniewicz

Location: Room 111

Data Science Crash Course Slides

Apache NiFi Crash Course

Introduction: This workshop will provide a hands-on introduction to simple event data processing and data flow processing using a Sandbox on students’ personal machines.

Format: A short introductory lecture on Apache NiFi and the computing environment used in the lab, followed by a demo, lab exercises and a Q&A session. The lab time after the lecture is for working through the exercises and asking questions.

Objective: To provide a short hands-on introduction to Apache NiFi. In the lab, you will install and use Apache NiFi to collect, conduct and curate data-in-motion and data-at-rest. You will learn how to connect to and consume streaming sensor data, filter and transform the data, and persist it to multiple data sources.

Prerequisites: Registrants must bring a laptop with the latest VirtualBox installed. An image for the Hortonworks DataFlow (HDF) Sandbox will be provided. Participants should have the Sandbox downloaded and working before they arrive.

Speakers: Nathan Gough

Location: Room 111

GDPR/Security & Governance Crash Course

Introduction: This workshop will provide an overview of GDPR provisions along with relevant use cases.

Format: A short introductory lecture on GDPR. We will then focus on consent, profiling and the right to be forgotten (data erasure), and on how companies can establish processes for acquiring consent, automated data processing, and data discovery and classification using technologies such as Apache Atlas, Apache Ranger and Apache Hive.

Objective: To provide a quick, hands-on introduction to GDPR concepts. In the lab, you will practice the concepts using Apache Hadoop, Atlas, Ranger and Hive to process and classify data.

Prerequisites: To participate in the hands-on section, registrants must bring a laptop that can be used to access the lab environment on the AWS public cloud.

Speakers: Ali Bajwa, Srikanth Venkat

Location: Room 111

Kafka/SMM Crash Course

Introduction: This session will cover the fundamentals of Apache Kafka and the related Streams Messaging Manager (SMM).

Format: This session will start with the basic concepts and entities of Apache Kafka, such as Brokers, Topics, Producers, and Consumers/Consumer Groups. It will then delve into advanced topics such as the idempotent producer, the transactional API in Kafka for exactly-once processing, authentication, authorization, replication, log compaction, compression and performance. This will be followed by a demo of SMM, an open source Hortonworks initiative that gives Kafka users better operational insight into their Kafka clusters through a GUI rather than complex manual scripts. The session will also include a demo of the Alerting/Notification framework, which can trigger alerts and send notifications based on the conditions you want to monitor.

Objective: The objective of this session is to learn about Apache Kafka and to illustrate how SMM can help answer questions that arise in production deployments. Example questions are “Do I have any offline topic partitions?”, “Which consumer group is falling behind the most?”, “Which producers are generating the most data right now?”, “What does the data in my application topic look like?” and so on. Participants will also become familiar with the SMM GUI by exploring its views of entities such as Brokers, Topics, Producers and Consumer Groups, so that they can quickly find the information needed to monitor their Kafka clusters or applications. Finally, participants will learn how to use the Alerting and Notification framework that comes with SMM to automate monitoring of Kafka clusters and the applications built around them.

Prerequisites: To take full advantage of the crash course, registrants must bring their own laptops for a hands-on experience.

Speakers: Purnima Kuchikulla, Daniel Chaffelson

Location: Room 111