DISCOVER with Data Steward Studio: Understanding and unlocking the value of data in hybrid enterprise data lake environments

Tuesday, June 19
4:00 PM - 4:40 PM
Meeting Room 230C

With the convergence of cloud, IoT, and big data technologies, data lakes have become critical fuel for enterprise-wide digital transformation. Enterprises increasingly have data spread across multiple data lakes, in many geographies, and on multiple cloud platforms, often because regulatory and compliance mandates such as GDPR limit cross-border data transfer.

As data types and sources proliferate across this complex landscape, discovering, organizing, and curating data has become extremely expensive. Gaining global visibility into the business context, usage, and trustworthiness of data also requires a centralized view of all data and metadata, security controls, data access, and monitoring. Together, these challenges create a significant chasm between initial data capture and the insight generation that drives value creation.

Adequate stewardship, with the right rules and policies around data security and privacy and rational policy enforcement across the information supply chain, is critical to the adoption of modern data lake architectures. Enterprises therefore need a “global insight fabric” that balances sound data governance rules and policies with a trusted environment in which users can collaborate and share data responsibly in order to create value. We recently launched the 100% open source Hortonworks Data Steward Studio (DSS) service to help enterprises address these challenges and move closer to realizing the vision of a global insight fabric.

In this talk, we will outline how data stewards, analysts, and data engineers can better understand their data assets across multiple data lakes at scale, using the DISCOVER approach with DSS:
Detect: Find where important data assets are located
Inventory: Locate and catalog all data globally
Secure: Protect data assets and monitor their access and usage
Collaborate: Crowdsource and leverage knowledge across the enterprise
Organize: Curate and group data based on different characteristics
Verify: Understand sources and complete chain of custody for all data (lineage and impact)
Enrich: Add classifications and annotations
Report: Create and view multiple dashboards, reports, and summarizations of data
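The Enrich and Verify steps correspond to metadata operations that Apache Atlas (the metadata engine in the speakers' Security & Governance portfolio) exposes through its v2 REST API. As a minimal sketch, assuming a hypothetical Atlas endpoint, placeholder credentials, and a pre-created "PII" classification type, attaching a classification to a cataloged entity might look like:

```python
import json
import urllib.request

ATLAS_URL = "http://atlas.example.com:21000"  # hypothetical endpoint
AUTH_HEADER = "Basic YWRtaW46YWRtaW4="        # placeholder credentials


def classification_payload(tag_name, attributes=None):
    """Build the body for Atlas's add-classifications endpoint:
    POST /api/atlas/v2/entity/guid/{guid}/classifications
    The body is a list of classification objects.
    """
    return [{"typeName": tag_name, "attributes": attributes or {}}]


def add_classification(entity_guid, tag_name, attributes=None):
    """Prepare a request that tags an entity (e.g. a Hive table) by GUID."""
    body = json.dumps(classification_payload(tag_name, attributes)).encode()
    req = urllib.request.Request(
        f"{ATLAS_URL}/api/atlas/v2/entity/guid/{entity_guid}/classifications",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": AUTH_HEADER,
        },
        method="POST",
    )
    # urllib.request.urlopen(req)  # uncomment to run against a live Atlas
    return req


if __name__ == "__main__":
    req = add_classification("1234-abcd", "PII", {"level": "high"})
    print(req.get_method(), req.full_url)
```

Lineage for the Verify step is available in the same API via `GET /api/atlas/v2/lineage/{guid}`; the entity GUIDs used above come from Atlas's search endpoints during the Detect and Inventory steps.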

We will showcase how DSS empowers enterprises to precisely identify and evaluate the trust levels of their data, to collaborate securely, and to confidently democratize data across the enterprise in order to derive value from their data lakes, whether those data lakes reside in on-premises data centers, in the cloud, or across multiple cloud provider environments.


Srikanth Venkat
Senior Director, Product Management
Srikanth Venkat is currently responsible for the Security & Governance portfolio of products at Hortonworks, which includes Apache Knox, Apache Ranger, Apache Atlas, platform-wide security, and Hortonworks DataPlane Service. Before Hortonworks, Srikanth held multiple roles in cloud services, marketplaces, security, and business applications. His experience spans leadership in product management, strategy and operations, and technical architecture, from startups to global organizations including Telefonica, Cisco-WebEx, Proofpoint, Dataguise, Trilogy Software, and Hewlett-Packard. Srikanth holds a PhD in Engineering with a focus on artificial intelligence from the University of Pittsburgh, an MBA in General Management from Indiana University, and a Master's in Global Management from Thunderbird School of Global Management. Srikanth is a data science and machine learning hobbyist and enjoys tinkering with big data technologies.
Hemanth Yamijala
Principal Engineer
I am a Principal Engineer at Hortonworks, focused on governance and metadata management products. I lead the Hortonworks DataPlane Services platform and Hortonworks Data Steward Studio. Earlier, I was an active contributor and committer on Apache Atlas. I am interested in building scalable data processing and metadata management systems that operate in the Apache Hadoop ecosystem. I have been involved with Apache Hadoop since its early days and was a lead responsible for MapReduce before Hadoop graduated to a 1.0 release.