How to Achieve a Self-Service and Secure Multitenant Data Lake in a Large Company with Strict IT Security and Diverse Analytics Use Cases

Tuesday, June 19
2:00 PM - 2:40 PM
Meeting Room 230C

Successfully adopting a data analytics platform inside a large organization critically depends on integrating the platform within the technology fabric of the organization’s enterprise IT systems. Inside large organizations, enterprise security requirements, diverse analytics needs, and the exploratory nature of analytics can complicate adoption. To maximize success, we must make architectural and implementation choices that foster user flexibility, increase data connectedness, and respect enterprise security.

Inside Northrop Grumman, a global security company, our team leads the company’s enterprise-wide big data and analytics initiative. Over the past several years, our platform technology group has developed, operated, and managed the company’s Hadoop-based data analytics platform. We believe the key design principles of a successful platform include self-service, multitenant security, managed infrastructure, seamless connectivity of data, and seamless connectivity of tools. Keeping to these design principles, our on-premises enterprise analytics platform is built using Hadoop, business intelligence and visualization tools, and relational database tools. Northrop Grumman data science teams can onboard onto the platform with integrated authentication and automatic configuration of the proper Hadoop and operating system authorizations, create managed ingest jobs that transfer big datasets, share data in a governed manner via an enterprise data catalog, provision big data and relational databases, run interactive and scheduled jobs, and publish production-grade visualizations.

In this presentation, we share technology and architecture lessons learned while designing, building, and operating Hadoop-based enterprise data analytics platforms. We discuss critical tradeoffs in choosing an authentication strategy when integrating Hadoop with an existing IT environment, the practical implications of interfacing between authorization models, and how to achieve seamless connectivity of multiple COTS tools while maintaining self-service and multitenant security.


Leon Li
Software Architect
Northrop Grumman
Dr. Leon Li serves as Software Architect and designer of Northrop Grumman’s Hadoop-based enterprise data analytics platform. He is an expert in Hadoop-based enterprise system architectures and advises Northrop Grumman executive leadership on analytics technologies. At Northrop Grumman, Leon previously served as Senior Software Engineer for a national cyber security information sharing program, led a university research effort in cryptography, and led systems engineering efforts on cloud-based big data systems for genomics research. Leon graduated with a PhD in Electrical Engineering from MIT.