Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise

Wednesday, February 6
4:50 PM - 5:30 PM
Room 103

In recent years, big data has moved from batch processing to stream-based processing, since no one wants to wait hours or days to gain insights. Dozens of stream processing frameworks exist today, and the trend that occurred in batch-based big data processing has repeated itself in the streaming world: nearly every streaming framework now supports higher-level relational operations.

On paper, combining Apache NiFi, Kafka, and Spark Streaming provides a compelling architecture option for building your next-generation ETL data pipeline in near real time. But what does it take to deploy and operationalize this in an enterprise production environment?

The newer Spark Structured Streaming provides fast, scalable, fault-tolerant, end-to-end exactly-once stream processing with elegant code samples, but is that the whole story?

We will discuss the drivers and expected benefits of changing existing event processing systems. In presenting the integrated solution, we will explore the key components of using NiFi, Kafka, and Spark, then share the good, the bad, and the ugly of adopting these technologies in the enterprise. This session is targeted toward architects and other senior IT staff looking to continue their adoption of open source technology and modernize ingest/ETL processing. Attendees will take away lessons learned and experience in deploying these technologies to make their journey easier.


Andrew Psaltis
Principal Solution Engineer
Andrew Psaltis is deeply entrenched in streaming and IoT systems and obsessed with delivering insight at the speed of thought. As the author of Streaming Data (Manning) and an international speaker and trainer, he spends most of his waking hours thinking about, writing about, and building streaming systems. When he's not busy being busy, he's spending time with his lovely wife and two kids and watching as much lacrosse as possible. He has spoken at Berlin Buzzwords (2014, 2015, 2016, 2017), ApacheCon (2015, 2017), QCon New York (2016), IoT StampedeCon (2017), Big Data StampedeCon (2015, 2016, 2017), and DataWorks Summit Sydney (2018).