Using Spark Streaming and NiFi for the next generation of ETL in the enterprise

Wednesday, June 20
2:00 PM - 2:40 PM
Meeting Room 230A

On paper, combining Apache NiFi, Kafka, and Spark Streaming provides a compelling architecture option for building your next-generation ETL data pipeline in near real time. What does it take to deploy and operationalize this architecture in an enterprise production environment?

The newer Spark Structured Streaming provides fast, scalable, fault-tolerant, end-to-end exactly-once stream processing with elegant code samples, but is that the whole story? This session will cover the Royal Bank of Canada’s (RBC) journey of moving away from traditional ETL batch processing with Teradata towards using the Hadoop ecosystem for ingesting data. One of the first systems to leverage this new approach was the Event Standardization Service (ESS). This service provides a centralized “client event” ingestion point for the bank’s internal systems through either a web service or a daily text-file batch feed. ESS allows downstream reporting applications and end users to query these centralized events.

We discuss the drivers and expected benefits of changing the existing event processing. In presenting the integrated solution, we will explore the key components of using NiFi, Kafka, and Spark, then share the good, the bad, and the ugly when trying to adopt these technologies into the enterprise. This session is targeted toward architects and other senior IT staff looking to continue their adoption of open source technology and modernize ingest/ETL processing. Attendees will take away lessons learned and experience in deploying these technologies to make their journey easier.


Darryl Dutton
Principal Consultant
As a principal consultant at T4G with over 20 years’ experience, Darryl believes that ‘better delivery is achieved through better design’ on IT projects. This experience in design and delivery of software solutions cuts across a variety of industries including the retail, insurance, manufacturing, financial, and government sectors. Darryl specializes in application design with a focus on data science platforms under the Hadoop ecosystem and custom application development. In an architect role, he has produced technical design deliverables on many large enterprise projects and recently provided technical guidance on several Hadoop-related projects for national telecom and banking clients.
Kenneth Poon
Director, Data Engineering
Kenneth Poon is a Director of Data Engineering in the Data & Analytics (DNA) group, responsible for architecting, building, and delivering solutions to enable RBC to become a data-driven organization. He has built several large-scale products across the enterprise, specializing in real-time streaming applications. He is currently focused on building out the bank's Channel Analytics Platform.