Securing and governing a multi-tenant data lake within the financial industry

Thursday, April 19
11:50 AM - 12:30 PM
Room IV

Standard Bank South Africa is a Hortonworks client, with several multi-node clusters hosting Hortonworks Data Platform (HDP) and Hortonworks DataFlow (HDF). This presentation will discuss the technical details of implementing security, governance and multi-tenancy on a "data lake" within the finance industry. The talk will cover the team's experiences, challenges and failures, and the lessons we took away from this behemoth of an adventure.

After introducing Standard Bank and the Hadoop admin team, the presentation will describe the security and governance journey Standard Bank has undergone since the project's inception in 2015, as well as the roadmap ahead.

Presentation structure:
1. Team introduction with background information
2. Environment overview (Where we are - Current)
-----Security
---------Authentication through Kerberos and LDAP/AD
---------Authorization through Ranger and Centrify
---------Transparent Data Encryption (TDE) at rest
-----Governance
---------Centralized auditing
---------Ranger policies and data steward ownership
-----Multi-Tenancy
---------Data lake vs. data analytics platform
---------Edge nodes vs. API framework through Knox
3. How did we get to this stage? (Past)
-----Challenges faced (Kerberos, AD integration, SSL)
-----How we overcame these challenges
4. Future challenges we foresee (Future)
-----How we are planning to prepare for them

SPEAKERS

Ian Pillay
Hadoop Administrator
Standard Bank
Ian Pillay (25) is a Hadoop Administrator at Standard Bank in South Africa. He is an enthusiastic Computer Science graduate with a keen interest in all things technology, whether hardware, software or the security surrounding them. He is passionate about new technology and the open source landscape, and loves to learn new things. He has set up Hadoop clusters using Ambari and Apache’s own flavour, and has implemented Kerberos and AD/LDAP integration from the OS up, including SSL and SSSD-like solutions. He also has intermediate-level experience in MySQL cluster administration and, naturally, a decent level of Linux (SuSE, CentOS, Ubuntu) administration. He does not know everything, but what he does know he will share, and what he does not he will learn from others. None of his experiences have been without issues. He is fairly experienced at problem solving, and while he might not program for a ‘living’, he does get his fix of it, as explained below (languages include Java, C#, JavaScript, HTML5, CSS and SQL; currently learning Python). Outside of work, he immerses himself in game development from start to finish (programming, 3D modeling, graphic design, story production, marketing and a host of other game development necessities) using open source technology (Blender3D, Unity). He has published games to the Google Play Store and plans to publish to the Steam store in the near future. He is also a former professional gamer, having represented his country internationally in e-sports at the International e-Sports Federation (IeSF) World Cup in South Korea.
Brad Smith
Hadoop Administrator
Standard Bank South Africa
Bradley Smith is an information scientist specializing in big data. After graduating from university with Honours in 2015, he joined Standard Bank South Africa as a distributed technologies administrator and has been integral to the development of multiple key initiatives. His accomplishments include the design and deployment of HDP as a "data lake", the introduction of automation frameworks and the development of numerous pilot projects from "A" (Ansible) to "Z" (Zabbix). Bradley became an HDP Certified Administrator through Hortonworks in 2016. For the past year, Bradley has focused on automation to enable security, governance and multi-tenancy. The prime directive has been to provide and support a distributed compute platform focused on flexibility, allowing data scientists to execute effectively and efficiently without introducing risk. Going forward, Bradley is exploring opportunities in enterprise architecture and cloud computing.