Cloud Storage: PUT is the new rename()

Thursday, April 19
11:50 AM - 12:30 PM
Room I

Cloud object stores are a critical part of Hadoop in cloud deployments, and a common destination of backups from on-premises clusters. Object stores matter. Yet they aren't classic filesystems: they behave differently, and to use them safely, you need to know the details.

This talk covers those details, showing how to safely work with cloud storage from Hadoop, Hive and Spark, as well as the general workflow of preparing, managing and publishing data.

Topics covered will include:

What object stores can Hadoop work with, and how do they differ?
Enhancements in Azure storage.
Using S3Guard to give Hive a consistent view of S3.
Using object storage for backups.
Using Amazon S3 as a direct destination of Spark queries.
Where the metaphor breaks down: things to avoid.
Securing your data.

We'll show you what's ready for use, what's still to be treated with caution, and what we're up to next. With demos. What more could you want?

Presentation Video

SPEAKERS

Steve Loughran
Member of Technical Staff
Hortonworks
Steve Loughran works on Hadoop at Hortonworks, currently on cloud storage integration, including improving integration with Amazon's S3 in Hadoop, Hive and Spark. He's the author of Ant in Action, a member of the Apache Software Foundation, and a committer on the Hadoop core since 2009. Prior to joining Hortonworks in 2012, he was a Research Scientist at HP Laboratories. He lives and works in Bristol, England.