Streaming Topic Model Training and Inference with Apache Flink

Thursday, March 21
2:00 PM - 2:40 PM
Room 120-121

Extracting topics from streams of text data yields insights that subsequent workflows can act on. For example, topics extracted from text that is continuously ingested into a search engine can be used to tag documents with important keywords or concepts for use at search time. Another use case is analyzing support tickets to find the most common problems customers face.

In this talk we illustrate how to use Apache Flink's dynamic processing and stateful streaming capabilities to continuously train topic models from unlabelled text and to use those models to extract topics from the same data. The topic models are built on distributed representations of words and documents. We will see how all of this can be done in a pure streaming fashion, without resorting to a Lambda Architecture style setup. An earlier version of this talk was presented at Flink Forward Berlin 2018.
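
As a rough illustration of the "train and infer in the same streaming pass" idea, here is a minimal Python sketch. It is not the talk's implementation and does not use any Flink API: the class `StreamingTopicState`, the function `process`, the scoring rule, and the sample documents are all hypothetical. Plain word counts stand in for the distributed representations mentioned above. The only point it makes is that each record is scored against the current state and then folded into it, so no separate batch layer is needed.

```python
from collections import Counter


class StreamingTopicState:
    """Hypothetical stand-in for the state a streaming operator would keep."""

    def __init__(self, num_topics):
        # One bag of word counts per topic (assumption: counts, not embeddings).
        self.word_counts = [Counter() for _ in range(num_topics)]
        self.totals = [0] * num_topics

    def infer(self, tokens):
        """Return the index of the best-matching topic under the current state."""
        scores = [
            sum(counts[t] for t in tokens) / (total + 1)
            for counts, total in zip(self.word_counts, self.totals)
        ]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] == 0 and 0 in self.totals:
            # Nothing matches yet: seed the first topic that has seen no data.
            return self.totals.index(0)
        return best

    def update(self, topic, tokens):
        """Fold the document into the chosen topic (the 'training' step)."""
        self.word_counts[topic].update(tokens)
        self.totals[topic] += len(tokens)


def process(stream, state):
    """Infer a topic for each document, then update the state with it."""
    for text in stream:
        tokens = text.lower().split()
        topic = state.infer(tokens)
        state.update(topic, tokens)
        yield topic


# Made-up sample documents, for illustration only.
documents = [
    "flink stream state",
    "stream state checkpoint",
    "login password reset",
    "password reset email",
]
assignments = list(process(documents, StreamingTopicState(num_topics=2)))
print(assignments)
```

In a real streaming job the state object would live in the framework's managed state so that it survives failures and can be partitioned. Here it is an ordinary in-memory object to keep the sketch self-contained.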


Suneel Marthi
Amazon Web Services
Suneel is a Member of the Apache Software Foundation and a committer and PMC member on Apache Mahout, Apache OpenNLP, and Apache Streams. He has presented at Flink Forward, Hadoop Summit, Berlin Buzzwords, Machine Learning Conference, Big Data Tech Warsaw, and Apache Big Data.