HDP Essentials & Hive

This course introduces and demonstrates the components that make up the Hortonworks Data Platform (HDP) ecosystem. It then explores Apache Hive in more detail, with hands-on demos.

Description

This is a technical overview of Apache Hadoop and Hive with hands-on exercises. It covers, at a high level, the concepts, architecture, operation, and uses of HDP and the Hadoop ecosystem. It then goes deeper for developers who need to create applications that use Hive to analyze Big Data stored in Apache Hadoop.

Topics

  • Explain the components within HDP’s Data Management, Data Access, Governance & Integration, Security, and Operations categories
  • Understand how Hive tables are defined and implemented
  • Use Hive to explore and analyze data sets
  • Use the Hive windowing functions
  • Use Hive to join datasets

Target Audience

Software developers, business and reporting analysts, and technical managers who need to understand the capabilities of Hadoop and build applications for it.

Prerequisites

Students should be familiar with programming principles and have experience in software development. SQL knowledge is also helpful. No prior Hadoop knowledge is required.

Big Data Science with HDP

This course introduces the big data science workflow. Specifically, it discusses how to move from working with small datasets to working with big data using Spark, Hive, and Zeppelin.

Description

Big Data Science with HDP will cover all aspects of the data science workflow. Special focus will be given to transitioning from the single-machine Python scientific stack to the big data science stack of Hive, Spark, and Zeppelin.

Topics covered will include data ingestion, storage, and munging; data exploration and visualization; feature engineering; and machine learning, including supervised and unsupervised model building.

Topics

  • The data science workflow
  • Data exploration with Spark
  • Data visualization with Zeppelin
  • Data munging with Spark
  • Machine learning with Spark ML
  • Feature engineering
  • Performance tuning

About the Instructor

Alexander Combs is a full-time Senior Training Engineer and author specializing in data science at Hortonworks, based out of New York City. He previously worked as a data scientist at Bloomberg, L.P., and was the lead instructor of the data science immersive program at General Assembly in NYC. He holds an M.A. and a B.A. in economics with a focus in computational social science.

Target Audience

Developers, analysts, and data scientists who are interested in learning how to use big data tools to do data science at scale.

Prerequisites

Students should be comfortable with programming principles, have prior experience with or exposure to statistical and/or computational modeling concepts, and preferably have experience with SQL. No prior Hadoop knowledge is required.