GDPR compliance application architecture and implementation using Hadoop and Streaming

Tuesday, June 19
4:50 PM - 5:30 PM
Meeting Room 230C

The General Data Protection Regulation (GDPR) is legislation designed to protect the personal data of European Union citizens and residents. The main requirement is to log personal data accesses and changes in customer-specific applications. The owning entities can then audit these logs to report to end users how their personal data has been used. Users have the "right to be forgotten," meaning their personal data can be purged from the system at their request. The regulation goes into effect on May 25, 2018, with significant fines for non-compliance.

This session will provide insight into how to approach and implement a GDPR compliance solution using Hadoop and Streaming for any enterprise with heavy volumes of data. It will delve into deployment strategies, the architecture of choice (Kafka, NiFi, and Hive ACID with streaming), implementation best practices, configurations, and security requirements. Hortonworks Professional Services System Architects worked on site with the customer to design, implement, and deploy this application in production.
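
The three requirements named in the abstract (log every access or change, report usage back to the data subject, purge on request) can be illustrated with a small in-memory sketch. This is only an illustration under stated assumptions, not the architecture presented in the session: the names `AccessEvent`, `PersonalDataStore`, `usage_report`, and `forget` are hypothetical, and a plain Python list and dict stand in for the Kafka, NiFi, and Hive ACID components.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AccessEvent:
    """One audit record: who touched which subject's personal data, and how."""
    subject_id: str      # the person the data belongs to
    actor: str           # application or user that touched the data
    action: str          # "read", "update", or "purge"
    timestamp: datetime


@dataclass
class PersonalDataStore:
    """Hypothetical in-memory stand-in for the real storage and audit layers."""
    records: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def _log(self, subject_id, actor, action):
        # Every operation on personal data appends one audit event.
        self.audit_log.append(
            AccessEvent(subject_id, actor, action, datetime.now(timezone.utc))
        )

    def write(self, subject_id, actor, data):
        self.records[subject_id] = data
        self._log(subject_id, actor, "update")

    def read(self, subject_id, actor):
        # Reads are logged even when no record exists for the subject.
        self._log(subject_id, actor, "read")
        return self.records.get(subject_id)

    def usage_report(self, subject_id):
        """Return (actor, action) pairs for one subject, in logged order."""
        return [
            (e.actor, e.action)
            for e in self.audit_log
            if e.subject_id == subject_id
        ]

    def forget(self, subject_id, actor):
        """Purge the subject's personal data and log the purge itself."""
        self.records.pop(subject_id, None)
        self._log(subject_id, actor, "purge")
```

In this sketch the audit entries survive a purge so that the deletion itself remains reportable; that is an assumption made for illustration, and retention of audit data in a real deployment is a policy decision outside the scope of the example.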

SPEAKERS

Saurabh Mishra
Systems Architect
Hortonworks Inc
Saurabh is a Systems Architect with strong expertise in the Hadoop ecosystem and rich field experience. He helps enterprises large and small solve their business problems strategically, functionally, and at scale by leveraging big data technologies. He has hands-on experience building, coding, and directing successful information technology initiatives. Saurabh has over 14 years of IT experience and has served in key positions as Lead Big Data Solution Architect, Performance Architect, and Technology Architect in multiple large and complex enterprise programs. He has extensive knowledge of big data and NoSQL technologies, including Hadoop, YARN, Spark, HBase, Hive, Pig, Storm, Kafka, and NiFi, and has worked in this space for more than 6 years. Saurabh has architected and designed big data platforms and applications comprising thousands of nodes, tens of petabytes of data, and complex ETL workflow requirements. He has provided solutions for GDPR requirements and for just-in-time analytics, leveraging co-located datasets at scale to provide insight and pattern detection. His work includes building data pipelines that produce results in minutes or hours across petabytes of data, and discovering new ways to co-locate, integrate, and leverage disparate datasets using Lambda and HTAP big data architectures and IoT applications.
Arun Thangamani
Systems Architect
Hortonworks Inc
Arun is a Distributed Systems Architect at Hortonworks who has authored, implemented, and optimized big data pipelines for many Fortune 500 firms, including Apple, Microsoft, SAP, ADP, Fidelity, Expedia, Monsanto, and T-Mobile. Enterprise big data pipelines typically solve analytics and warehousing challenges and comprise streaming, batch, and hybrid processing paradigms. Comprehending the nuances of these petabyte-throughput pipelines and analyzing their data flow efficiency requires an in-depth architectural understanding of the frameworks and components used within pipeline applications. Arun’s unique ability to visualize pipeline dataflows horizontally (data partitioning, shuffling, and combining at a distributed framework level) and vertically (application request, cache, page cache, disk layer) helps him design, untangle, unclog, and optimize pipelines very efficiently. Distributed datasets, MapReduce-based directed acyclic graph solutions, distributed stream processing, column-oriented storage, column-family-oriented stores, dynamic partitioned rings, machine learning algorithms, and TFD based Spark/Hadoop ETL workflows are some of the topics within Arun’s areas of interest and expertise. Prior to Hortonworks, Arun spent about a decade with Monsanto, working on biotech research in Hadoop as well as various custom in-house distributed parallel processing and storage platforms. Other previous experience includes stints at ADP, Caliper, and Yahoo, and research work for NASA. Arun’s expertise includes SOA, enterprise security, distributed computing/caching, and scalable read/write behind caching architectures. Arun holds a master’s degree in computer science from the University of Alabama and a bachelor’s degree in engineering from Sri Venkateswara College of Engineering, India.