Cloud object stores are a critical part of Hadoop in cloud deployments, and a common destination of backups from on-premises clusters. Object stores matter. Yet they aren't classic filesystems: they behave differently, and to use them safely you need to know the details.
This talk covers those details, showing how to safely work with cloud storage from Hadoop, Hive and Spark, as well as the general workflow of preparing, managing and publishing data.
Topics covered will include:
Which object stores can Hadoop work with, and how do they differ?
Enhancements in Azure storage.
Using S3Guard to give Hive a consistent view of S3.
Using object storage for backups.
Using Amazon S3 as a direct destination of Spark queries.
Where the metaphor breaks down: things to avoid.
Securing your data.
We'll show you what's ready for use, what should still be treated with caution, and what we're up to next. With demos. What more could you want?