A Birds of a Feather (BoF) session is an informal discussion group. DataWorks Summit will sponsor several BoF sessions, hosted by Apache committers, architects, tech leads, and engineers. Attendees gather around a shared interest and discuss it without a pre-planned agenda. Each group's hosts will moderate the discussion.

Come join the discussion to share your experiences, challenges, future interests, and requirements for key Apache and other open source projects, and to talk about what's on the roadmap and future design options.

Date: Wednesday, May 22
Room: See the agenda or the DataWorks Summit mobile app

Apache Hadoop – YARN, HDFS

Apache Hadoop keeps evolving to meet community demands around distributed computing and storage. Apache Hadoop 3.0 is in active development in the community, with key enhancements to YARN and HDFS.

Apache Hadoop YARN is the architectural center of Hadoop that allows multiple data processing engines to handle data stored in a single platform, unlocking an entirely new approach to analytics. Come learn and discuss the latest YARN innovations and future directions.

Apache Hadoop HDFS is a distributed Java-based file system for storing large volumes of data. Come learn and discuss the latest HDFS innovations and future directions.

Host(s): Sanjay Radia, Billie Rinaldi

Location: Room III

Apache Hive, Apache HBase, Apache Phoenix & Druid

Apache Hive is the de facto standard for SQL queries in Hadoop. Through Stinger.next, the next phase of the Stinger initiative, the Apache community has greatly improved Hive's speed, scale, and SQL semantics. Come learn and discuss what is new in Hive 3.0 and Druid.

Apache HBase is the NoSQL store that runs on Apache Hadoop.  Apache Phoenix provides a SQL skin on top of HBase.

Come learn and discuss HBase 2.0 along with the latest developments in Phoenix.

Host(s): Alan Gates, Jesus Camacho Rodriguez, Nishant Bangarwa

Location: Convention Hall I - C

Apache Spark, Apache Zeppelin & Data Science

Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs that let data workers efficiently run streaming, machine learning, or SQL workloads requiring fast iterative access to datasets. Come learn and discuss the latest Spark, data science, and deep learning innovations and future directions.

Host(s): Yanbo Liang, Robert Hryniewicz

Location: Europe

Cybersecurity and Apache Metron

Apache Metron is a new top-level Apache project: an open source big data cybersecurity analytics platform that supports real-time ingest and analytics to detect information security threats and to build out a high-value security data lake. Metron helps security operations teams work more efficiently by reducing the amount of "DIY" big data and data science tooling needed to detect threats in real time.

Come learn and discuss the latest Metron innovations and future directions.

Host(s): Simon Elliston Ball, Dave Russell, Casey Stella

Location: Room V

IoT, Streaming & Data Flow

Real-time data processing with Apache NiFi, Apache Kafka, Apache Storm and Apache Spark Streaming provides the foundation for IoT. Come learn and discuss the latest streaming & data flow innovations and future directions.

Host(s): George Vetticaden, Davor Bonaci, Andy LoPresto, Stephan Ewen

Location: Room I

Security and Governance

Apache Knox and Apache Ranger provide Hadoop security, while Apache Atlas provides a Hadoop metadata store that supports enterprise compliance. Come learn and discuss security and governance innovations and future directions.

Host(s): Srikanth Venkat, Nigel Jones, Mandy Chessell, Balaji Ganesan

Location: Room II

Cloud & Operations

Apache Ambari and Cloudbreak provide the foundation for installing, configuring, and managing Hadoop and streaming, both on-premises and in the cloud. Come learn and discuss the latest Hadoop and streaming operations and cloud innovations and future directions.

Host(s): Steve Loughran, Paul Codding, Gergely Devenyi

Location: Room IV