Data proliferation from more than 7 billion humans and 20 billion connected devices has been a defining trend of the last decade. With the growing velocity, variety, and volume of data, every data-driven organization's goal has shifted to protecting and monetizing data from a rapidly expanding network of IoT-embedded objects and sensors.
One of the tried-and-true business continuity methodologies for storing and retrieving vast amounts of data is replication of Hadoop systems across hybrid clouds and geographically distributed data centers. In this respect replication resembles a blockchain: autonomous, smart-contract-like policies instantiated on the metadata and data ensure that every replicated copy follows a single source of truth.
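The "single source of truth" idea can be illustrated with a minimal sketch. All names here are hypothetical and the metadata is simplified to a flat dictionary; a real Hadoop replication system would compare block checksums and richer metadata at far larger scale.

```python
import hashlib

def fingerprint(metadata: dict) -> str:
    """Hash a data set's metadata so replicas can be compared cheaply."""
    # Canonicalize key order so logically equal metadata hashes identically.
    canonical = "|".join(f"{k}={metadata[k]}" for k in sorted(metadata))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def out_of_sync(source: dict, replicas: list) -> list:
    """Return indices of replicas whose metadata diverges from the source of truth."""
    truth = fingerprint(source)
    return [i for i, replica in enumerate(replicas) if fingerprint(replica) != truth]

source = {"path": "/data/sales", "size": 1024, "owner": "etl"}
replicas = [
    {"path": "/data/sales", "size": 1024, "owner": "etl"},   # in sync
    {"path": "/data/sales", "size": 2048, "owner": "etl"},   # diverged
]
print(out_of_sync(source, replicas))
```

Only the divergent replica (index 1) is flagged, and the reconciliation policy would then re-replicate it from the source of truth.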
Replicas can be maintained across geographically distributed data centers, giving the business continuity plan greater risk tolerance for the data sets. With intelligent predictive analytics based on usage patterns, dynamic tiering policies can be triggered on data sets to provide true value-add. The temperature of the data is used to move it between hot, warm, cold, and archival storage according to configurable policies, leading to a significant reduction in total cost of ownership.
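A temperature-based tiering policy might look like the following sketch. The tier names match the abstract, but the age thresholds and the use of last-access time as the temperature signal are illustrative assumptions, not values from the talk.

```python
from datetime import datetime

# Hypothetical thresholds: maximum days since last access for each tier.
# Real deployments would make these configurable per data set.
TIER_THRESHOLDS = [
    ("hot", 7),      # accessed within the last week
    ("warm", 30),    # accessed within the last month
    ("cold", 365),   # accessed within the last year
]

def tier_for(last_access: datetime, now: datetime) -> str:
    """Map a data set's last-access time to a storage tier."""
    age_days = (now - last_access).days
    for tier, max_age_days in TIER_THRESHOLDS:
        if age_days <= max_age_days:
            return tier
    return "archival"  # anything older falls through to archival storage

now = datetime(2018, 6, 1)
print(tier_for(datetime(2018, 5, 30), now))  # recently accessed -> hot
print(tier_for(datetime(2017, 1, 1), now))   # stale -> archival
```

A scheduler could periodically re-evaluate `tier_for` over each data set's access statistics and trigger a move whenever the computed tier changes, which is how stale data drifts toward cheaper storage.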
Users in 2018 and beyond demand absolute availability of data as and when they desire it. Dynamic data access management is a fundamental concept in satisfying the business continuity plan. Seamless, enterprise-grade disaster recovery in support of business continuity faces significant challenges around replicating security and governance policies along with the data sets. In this talk we will discuss how these challenges can be addressed to support seamless replication and disaster recovery for Hadoop-scale data.