Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager

Thursday, April 19
2:50 PM - 3:30 PM
Room II

Running scheduled, long-running, or repetitive workflows on Hadoop clusters, especially secure clusters, is the domain of Apache Oozie. Oozie, however, relies on verbose XML for job configuration and carries a dated UI -- poor usability all around. Apache Ambari, in its quest to make cluster management easier, has branched out to offering views for user services. This talk covers the Ambari Workflow Manager view, which provides a GUI to author and visualize Oozie jobs.
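To give a sense of the hand-written XML that Workflow Manager replaces, here is a minimal sketch of an Oozie workflow definition with a single shell action; the workflow name, action name, and script are hypothetical:

```xml
<!-- Minimal Oozie workflow sketch: one shell action, then success or failure.
     Names and the cleanup.sh script are illustrative, not from the talk. -->
<workflow-app name="log-cleanup-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="cleanup"/>
    <action name="cleanup">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>cleanup.sh</exec>
            <file>cleanup.sh</file>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Cleanup failed, error[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

Even this trivial example needs boilerplate for transitions and error handling; Workflow Manager generates equivalent XML from a drag-and-drop canvas.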

To illustrate Workflow Manager, we will demonstrate Oozie jobs for log management and HBase compactions, showing how easy Oozie can now be and what the exciting future holds for Oozie and Workflow Manager.

Apache Oozie is the long-time incumbent for workflow scheduling in big data processing. It is known to be hard to use, and its dated UI is not aesthetically pleasing. For secure Hadoop clusters, however, Oozie remains the most readily available, obvious, and full-featured solution.

Apache Ambari is a deployment and configuration management tool for Hadoop clusters. Ambari Workflow Manager is a new Ambari view that addresses the usability and UI shortcomings of Apache Oozie.

In this talk, we’re going to leverage the stable foundation of Apache Oozie and the clarity of Workflow Manager to demonstrate how to build powerful batch workflows on top of Apache Hadoop. We will also cover the future roadmap and vision for both Apache Oozie and Workflow Manager, and finish with a live demo of Workflow Manager in action.

Presentation Video


Artem Ervits
Solutions Engineer
Artem Ervits is a Solutions Engineer at Hortonworks, a leading big data software company based in Santa Clara, California, that develops and supports Apache Hadoop for the distributed processing of large data sets across computer clusters. Artem is an organizer of the NYC Future of Data Meetup and a contributor to Apache Oozie. He works with the Workflow Manager and Oozie product management and engineering teams to shape the future direction of Workflow Manager and Oozie. You may reach him with questions on Oozie, HBase, Phoenix, Pig, and Hive.
Clay Baenziger
Hadoop Infrastructure
Clay Baenziger is an architect on the Hadoop Infrastructure team at Bloomberg. Clay comes from a diverse background in systems infrastructure and analytics. At Sun Microsystems, his team built an automated bare-metal Solaris deployment tool for Solaris engineering labs, and his contributions later became core to the OpenSolaris Automated Installer. His team at Opera Solutions built a financial portfolio analytics product, providing a good introduction to Hadoop. Merging the two, his team at Bloomberg has openly developed infrastructure for low-latency HBase, Spark, scalable ingest with Kafka, and big-data warehousing using much of the Hadoop ecosystem. Clay is a past leader of and presenter at the Front Range OpenSolaris Users Group (FROSUG) and has presented big data ideas at Hadoop Summit North America (2014), the San Francisco Hadoop Users Group (July '14), ChefConf (2015), HBaseCon East (2016), ApacheCon Big Data (2017), and DataWorks Summit San Jose (2017).