On paper, combining Apache NiFi, Kafka, and Spark Streaming provides a compelling architecture option for building your next generation ETL data pipeline in near real time. What does this look like in enterprise production environment to deploy and operationalized?
The newer Spark Structured Streaming provides fast, scalable, fault-tolerant, end-to-end exactly-once stream processing with elegant code samples, but is that the whole story? This session will cover the Royal Bank of Canada’s (RBC) journey of moving away from traditional ETL batch processing with Teradata towards using the Hadoop ecosystem for ingesting data. One of the first systems to leverage this new approach was the Event Standardization Service (ESS). This service provides a centralized “client event” ingestion point for the bank’s internal systems through either a web service or text file daily batch feed. ESS allows down stream reporting applications and end users to query these centralized events.
We discuss the drivers and expected benefits of changing the existing event processing. In presenting the integrated solution, we will explore the key components of using NiFi, Kafka, and Spark, then share the good, the bad, and the ugly when trying to adopt these technologies into the enterprise. This session is targeted toward architects and other senior IT staff looking to continue their adoption of open source technology and modernize ingest/ETL processing. Attendees will take away lessons learned and experience in deploying these technologies to make their journey easier.