Taming an Ungoverned Legacy Data Lake and Extending Tag-based Authorization in a Heterogeneous Data Discovery Environment

Wednesday, May 22
2:00 PM - 2:40 PM
Marquis Salon 9

Comcast’s Streaming Data platform comprises ingest, transformation, and storage services in the public cloud, together with on-prem RDBMSs, EDWs, and a large, ungoverned legacy data lake. We use Apache Atlas for data discovery and lineage, relying heavily on its extensibility, which is unique in the industry. First we tackled the public cloud, including Kafka topics, Avro schemas, and S3 datasets. Next we integrated metadata and lineage for the on-prem datasets. More recently we added data-based ML approaches to duplicate elimination and the discovery of semantic equivalences. These are aimed primarily at taming the chaos of the legacy data lake and finding connections between that data lake and the EDW. We use Atlas/Ranger for tag-based authorization not only in the Hadoop environment, but also in AWS S3, Presto, and other public cloud-based applications. We have built APIs that make it very easy for other groups within Comcast to push metadata and lineage to Atlas, so that our group is no longer the bottleneck. All of our extensions to the Atlas type definitions have been contributed to the Apache open source community.

SPEAKERS

Barbara Eckman
Principal Software Architect
Comcast
Barbara Eckman is a Principal Software Architect at Comcast. She leads data governance for an innovative, division-wide initiative comprising near-real-time ingesting, streaming, transforming, storing, and analyzing Big Data. Barbara is a recognized technical innovator in Big Data architecture and governance, as well as scientific data and model integration. Her experience includes technical leadership positions at a Human Genome Project Center, Merck, GlaxoSmithKline, and IBM. She served on the IBM Academy of Technology, an internal peer-elected organization akin to the National Academy of Sciences.