This talk talks through learning from the HDP implementation at G-Research, a leading Fin-Tech company based in London.
The team at G-Research implemented the Hortonworks Data Platform to build a data lake and
enable the business team to build analytics and machine learning tools. The team faced challenges
to accurately control and manage any sensitive data. Business teams were not able to search
through data due to lack of data classification.
G-Research implemented Privacera auto-discovery solution to precisely discover and tag data
as it is ingested into the HDP environment. The tags are pushed to Apache Atlas and then
Apache Ranger for enabling tag based policies. The G-Research team also build custom tools to push Spark lineage
information into Atlas. Finally, Privacera monitoring tools continuously analyzed access audit information to
alert if sensitive data is moved to folders that might not be protected.
Consequently, security team got real visibility into the sensitive data. Also, business users could
search and find the data within appropriate data classification in place.