Data Offload for the Chief Data Officer – how to move data onto Hadoop without writing any Hadoop code

Wednesday, April 18
11:00 AM - 11:40 AM
Room III

“The CDO bears responsibility for the firm’s data and information strategy, governance, control, policy development, and exploitation of data assets to create business value.” — Gartner

In this session we will show how the CDO can manage and exploit all of a company’s data assets on Hadoop in a controlled manner: data quality is verified, security access is controlled, and all data activities are logged and recorded automatically in Atlas. We will demonstrate Bluemetrix Data Manager (BDM) and show how easy it is to ingest, transform and control data on Hadoop while automatically deploying governance on Atlas.

BDM has been developed to allow any non-technical person to ingest and transform data on a Hadoop cluster without any knowledge of the underlying Hadoop environment and modules. It automates a range of different tasks so that the necessary code and commands are created and deployed as required. It has been developed to run on top of the Control-M platform from our partner BMC, and has the following modules:

Data Ingest
 Simple template-based system for all data sources
 New data sources can be deployed in hours rather than weeks or months
 No extra code is developed, reducing the code release cycle time and complexity
 All standard ingest platforms are supported, e.g. Sqoop, NiFi, Kafka, etc.

Data Translation
 All source schemas are translated into Hadoop-compatible schemas
 Control characters removed or replaced as appropriate
 Data cleansed, formatted and factored
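BDM's translation step is proprietary, but a minimal sketch can illustrate the kind of work involved. The function names below are hypothetical, not BDM's API: sanitizing source column names into Hive-compatible identifiers and stripping control characters that would break delimited files.

```python
import re
import unicodedata

def to_hive_identifier(name: str) -> str:
    """Translate a source column name into a Hive-compatible identifier:
    lowercase, alphanumerics and underscores only, no leading digit."""
    ident = re.sub(r"[^0-9a-zA-Z_]", "_", name.strip()).lower()
    if ident and ident[0].isdigit():
        ident = "_" + ident
    return ident or "_unnamed"

def strip_control_chars(value: str, replacement: str = " ") -> str:
    """Remove or replace control characters (Unicode category C*)
    that would corrupt delimited or line-oriented files."""
    return "".join(
        replacement if unicodedata.category(ch).startswith("C") else ch
        for ch in value
    )

# Example: a source schema with names Hive cannot accept as-is
source_columns = ["Order ID", "Customer-Name", "2023_revenue"]
print([to_hive_identifier(c) for c in source_columns])
# → ['order_id', 'customer_name', '_2023_revenue']
```

In a real pipeline the same mapping would be recorded so that lineage tools can trace each Hadoop column back to its source name.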

Data Transformation
 Data transformations are coded and stored in a custom library deployed in Spark
 Data maps/flows can be created using a drag and drop interface
 Dramatic reduction in code developed and deployed
 Dramatic reduction in scripts developed
 No requirement for SQL skills or HIVE knowledge to transform the data
 No requirement for Spark expertise to create transformations
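A drag-and-drop mapper of this kind effectively emits a declarative flow specification that is compiled into Spark code behind the scenes. As a rough sketch only — the spec format and operations below are illustrative, not BDM's actual format — the idea can be shown with a tiny interpreter over dict rows standing in for the generated Spark job:

```python
# Hypothetical flow spec of the kind a drag-and-drop mapper might emit
flow = [
    {"op": "rename", "from": "cust_nm", "to": "customer_name"},
    {"op": "uppercase", "column": "country"},
    {"op": "filter", "column": "amount", "min": 0},
]

def apply_flow(rows, flow):
    """Interpret a declarative flow spec over an iterable of dict rows,
    standing in for the Spark job generated from the same spec."""
    for step in flow:
        if step["op"] == "rename":
            rows = [{(step["to"] if k == step["from"] else k): v
                     for k, v in r.items()} for r in rows]
        elif step["op"] == "uppercase":
            rows = [{**r, step["column"]: r[step["column"]].upper()} for r in rows]
        elif step["op"] == "filter":
            rows = [r for r in rows if r[step["column"]] >= step["min"]]
    return rows

data = [{"cust_nm": "Acme", "country": "ie", "amount": 120},
        {"cust_nm": "Globex", "country": "jp", "amount": -5}]
print(apply_flow(data, flow))
# → [{'customer_name': 'Acme', 'country': 'IE', 'amount': 120}]
```

Because the spec is data rather than code, the same flow can be versioned, audited, and regenerated against new Spark releases without the end user touching SQL, Hive or Spark.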

Data Governance & Lineage
 All data governance capabilities – Audit, Change Tracking, Data Snapshots, etc. – are built into the product
 Governance functionality can be easily customized to add new data and features, e.g. GDPR compliance, addition of metadata, etc.
 Process is completely transparent to the end user – there is no requirement on behalf of the end user to possess any knowledge of Atlas or Navigator

Data Quality & Validation & Masking
 Apply checksums and other controls on the data as it moves through the cluster
 Validate the consistency and the integrity of the data
 Mask the data as it is ingested to protect PII data
 Access all data quality metrics through a dashboard that gives the CDO a snapshot of the health of the data on their cluster
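The checksum and masking controls above can be sketched in a few lines. This is a generic illustration under assumed conventions, not BDM's implementation: a file checksum verifies integrity as data moves between cluster stages, and a salted, deterministic hash masks PII at ingest so that joins on the masked column still work while the original value is hidden.

```python
import hashlib

def file_checksum(path: str) -> str:
    """Checksum used to verify a file is unchanged between pipeline stages."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def mask_pii(value: str, salt: str = "per-cluster-secret") -> str:
    """Deterministically mask a PII value at ingest: the same input always
    maps to the same token, preserving joinability but hiding the original."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:16]

record = {"name": "Jane Doe", "email": "jane@example.com", "amount": 99}
masked = {**record,
          "name": mask_pii(record["name"]),
          "email": mask_pii(record["email"])}
```

Note that a simple hash is only one masking strategy; format-preserving encryption or tokenization vaults are common alternatives when the masked value must remain reversible for authorized users.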

This session will demonstrate two key outcomes:
• How Hadoop can solve a CDO’s data governance and control issues
• How any non-technical person can ingest and process data on Hadoop in a controlled manner, with full governance and GDPR compliance implemented


Liam English
Liam began his career as a software development engineer with Digital, Japan, in 1984 and worked with IDA/Forbairt/Enterprise Ireland in Japan from 1986 to 1996. From 1996 to 2000 he founded and ran Biasia Ltd., a successful technology trading company based in Tokyo, before founding Bluemetrix in 2001 to enter the emerging Web Analytics market. He holds a 1st Class Honours degree in Computer Science from UCC, Cork, and an Honours MBA in International Business from Jochi University, Tokyo. He writes about Hadoop automation and security at https://www.