A Birds of a Feather (BoF) session is an informal discussion group. DataWorks will sponsor several Birds of a Feather (BoF) meeting groups, hosted by Apache Committers, architects, tech leads, and engineers. Attendees gather around a shared interest and hold discussions without a pre-planned agenda, and each group's hosts moderate the discussion.

Come join the discussion and share your experiences, challenges, future interests, and requirements for key Apache and other open source projects, and discuss their roadmaps and future design options.

Date: Wednesday, May 22
Room: Check the agenda or the DataWorks Summit Mobile App

Apache Hadoop – HDFS and YARN

Apache Hadoop keeps evolving to meet community demands around distributed computing and storage. Apache Hadoop 3.0 was recently released and quickly followed by 3.1, with key enhancements to YARN and HDFS.
Apache Hadoop HDFS is a distributed Java-based file system for storing large volumes of data. Come learn and discuss the latest HDFS and Ozone innovations and future directions.
Apache Hadoop YARN is the architectural center of Hadoop that allows multiple data processing engines to handle data stored in a single platform, unlocking an entirely new approach to analytics. Come learn and discuss the latest YARN innovations and future directions.

Host(s): Sunil Govindan, Weiwei Yang

Location: Marquis Salon 12

Apache Hive & Apache Druid

Apache Hive is the de facto standard for SQL queries in Hadoop. With the next phase of SQL in Hadoop, the Apache community has greatly improved Hive’s speed (LLAP), scale, and SQL semantics. Come learn and discuss what is new in Hive 3.0.

Apache Druid is an open source, column-oriented, distributed data store designed for OLAP queries on event data. Druid supports interactive queries on real-time streams and scales horizontally. Druid has rich client libraries and integrates with tools like Pivot and Apache Superset. Come learn about the latest developments in Druid and Hive/Druid integration.


Host(s): Owen O'Malley

Location: Marquis Salon 7

Machine Learning Operations

Come learn and discuss the latest innovations and future directions in Apache Spark, Apache Zeppelin, and other ecosystem tools for machine learning operations.


Host(s): Robert Hryniewicz, Justin Norman, Alice Albrecht

Location: Marquis Salon 10

Public Sector

Like-minded individuals within the public sector will discuss the latest trends and innovations in open source Big Data, advanced analytics, and data science technologies: how they are being applied in government agencies today, and how they can drive transformative change in mission-critical initiatives in the future. Bring your questions and share your thoughts on Security and Governance, AI and Machine Learning, Data Engineering & Science, Cloud & Operations, and IoT, Streaming & Data Flow. Focus technologies include Apache Spark, Apache Hadoop (YARN, HDFS, Ozone), AWS, Apache Ambari, Cloudbreak, Apache Hive, Apache Ranger, and Apache NiFi.


Host(s): Henry Sowell, Ian Brooks, Terry Padgett

Location: Marquis Salon 14

IoT, Streaming & Data Flow

Apache NiFi, Apache Kafka, Apache Storm, Apache Spark Streaming, and many other projects provide the foundation for real-time data processing in IoT. Come learn and discuss the latest streaming and data flow innovations and future directions.


Host(s): Timothy Spann, Daniel Chaffelson, John Kuchmek

Location: Marquis Salon 8

Replication, Security, and Governance in the Hybrid Cloud Universe

Most enterprises are on an exciting journey to complement their on-prem environments with the cloud’s flexibility, scale, and agility for big data workloads and their associated data. Data management in such hybrid environments poses many challenges. As data is migrated or replicated between these environments, it becomes difficult to manage the data context and keep it consistent across them throughout replication and data movement. Once enterprises operate in such hybrid cloud environments, maintaining consistent security and governance across all of them, so that data can be managed seamlessly and uniformly, poses an even bigger challenge. Come learn, discuss, and share your experience and insights on how to navigate this hybrid enterprise data cloud journey with uniform security and governance, and on how innovations in open source communities can help enterprises accelerate this journey in the age of GDPR, CCPA, and various industry regulations and standards, both today and in the future.


Host(s): Krishna Maheshwari, Srikanth Venkat, Madhan Neethiraj, Don Bosco Durai

Location: Marquis Salon 9