A Birds of a Feather (BoF) session is an informal discussion group. DataWorks will sponsor several BoF sessions, hosted by Apache committers, architects, tech leads, and engineers. Attendees gather around a shared interest and hold discussions without a pre-planned agenda, while the hosts moderate the discussion.

Come join the discussion and share your experiences, challenges, future interests, and requirements for key Apache and other open source projects, and discuss what’s on the roadmap and future design options.

Date: Wednesday, March 20
Room: Check the agenda or the DataWorks Summit mobile app

Data Engineering & Data Science

Come learn and discuss the latest innovations and future direction in Apache Spark, Apache Zeppelin, and other ecosystem tools for Data Engineering and Data Science.

Host(s): Robert Hryniewicz, Justin Norman, Alice Albrecht

Location: Room 118-119

Apache Hadoop – HDFS and YARN

Apache Hadoop keeps evolving to meet community demands around distributed computing and storage. The community has released Apache Hadoop 3.0 and quickly followed it with 3.1, bringing key enhancements to YARN and HDFS.

Apache Hadoop HDFS is a distributed Java-based file system for storing large volumes of data. Come learn and discuss the latest HDFS and Ozone innovations and future directions.

Apache Hadoop YARN is the architectural center of Hadoop that allows multiple data processing engines to handle data stored in a single platform, unlocking an entirely new approach to analytics. Come learn and discuss the latest YARN innovations and future directions.

Host(s): Sunil Govindan, Márton Elek

Location: Room 124-125

Apache HBase & Apache Phoenix

Apache Phoenix enables OLTP and operational analytics in Hadoop for low-latency applications. By leveraging HBase as its backing store, it combines the best of both worlds:

  • The power of standard SQL and JDBC APIs with full ACID transaction capabilities
  • The flexibility of late-bound, schema-on-read capabilities from the NoSQL world

Apache HBase™ is the Hadoop database. Use it when you need random, real-time read/write access to your big data. The project’s goal is hosting very large tables (billions of rows by millions of columns) atop clusters of commodity hardware.

Host(s): Artem Ervits, Clay Baenziger

Location: Room 120-121

Apache Hive & Apache Druid

Apache Hive is the de facto standard for SQL queries in Hadoop. In the next phase of SQL in Hadoop, the Apache community has greatly improved Hive’s speed (with LLAP), scale, and SQL semantics. Come learn and discuss what is new in Hive 3.0.

Apache Druid is an open source, column-oriented, distributed data store designed for OLAP queries on event data. Druid supports interactive, horizontally scalable queries on real-time streams. It offers rich client libraries and integrates with tools such as Pivot and Apache Superset. Come learn about the latest developments in Druid and Hive/Druid integration.

Host(s): Jason Dere, Nishant Bangarwa

Location: Room 127-128

Cloud & Operations

Apache Ambari and Cloudbreak provide the foundation for installing, configuring, and managing Hadoop and streaming platforms on-premises and in the cloud. Come learn about the latest innovations and discuss Hadoop and streaming platform operations and future directions.

Host(s): Peter Darvasi, Sandor Molnar

Location: Room 131-132

IoT, Streaming & Data Flow

Apache NiFi, Apache Kafka, Apache Storm, Apache Spark Streaming, and many other projects provide the foundation for real-time data processing in IoT. Come learn and discuss the latest streaming and data flow innovations and future directions.

Host(s): Andy LoPresto, Abdelkrim Hadjidj, Purnima Kuchikulla, Daniel Chaffelson

Location: Room 129-130

Security & Governance

Apache Knox and Apache Ranger provide security across the big data ecosystem. Apache Atlas provides an open source framework for metadata and enterprise governance, and Data Steward Studio provides an open source-based stewardship experience for users. Come learn, discuss, and share your experiences and insights on open source security and governance innovations that can help in an age of regulations such as GDPR, CCPA, and various national privacy acts. We will also discuss how these approaches can address compliance, industry regulations, and standards across industries, both today and in the future.

Host(s): Srikanth Venkat

Location: Room 122-123