Building streaming pipelines for neural machine translation

Thursday, April 19
4:00 PM - 4:40 PM
Convention Hall I - C

Machine translation matters whenever news or eCommerce website content has to serve different geographies and locales. Machine translation systems often need to handle a large volume of concurrent translation requests from multiple sources in multiple languages, and they have to do this in real time while making efficient use of specialized hardware.

Many machine translation preprocessing tasks, such as text normalization, language detection, and sentence segmentation, can be performed at scale in a real-time streaming pipeline built on Apache Flink or Apache Storm. We will look at a few such pipelines that use Apache OpenNLP components to preprocess data into a format that can be consumed by a neural machine translation library such as Sockeye, which is based on the Apache MXNet deep learning framework.
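
To make the three preprocessing stages concrete, here is a minimal sketch in plain Python. It is not the presenters' code and does not use Apache OpenNLP, Flink, or Storm: the stopword-based language guess and the regex sentence splitter are toy stand-ins for trained models, and every name in it (STOPWORDS, normalize, detect_language, split_sentences, preprocess) is hypothetical.

```python
import re
import unicodedata
from typing import Dict, Iterable, Iterator, List

# Tiny stopword lists: a toy stand-in for a trained language detector (assumption).
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "in"},
    "de": {"der", "die", "und", "ist", "das", "nicht"},
}


def normalize(text: str) -> str:
    """Apply Unicode NFKC normalization and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", unicodedata.normalize("NFKC", text)).strip()


def detect_language(text: str) -> str:
    """Pick the language whose stopwords overlap most with the text."""
    tokens = set(re.findall(r"\w+", text.lower()))
    scores = {lang: len(tokens & words) for lang, words in STOPWORDS.items()}
    best = max(scores, key=lambda lang: scores[lang])
    return best if scores[best] > 0 else "unknown"


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def preprocess(stream: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Lazily turn raw documents into language-tagged sentence lists."""
    for raw in stream:
        text = normalize(raw)
        yield {"lang": detect_language(text), "sentences": split_sentences(text)}


if __name__ == "__main__":
    docs = [
        "Der Hund  ist nicht\n hier. Das ist gut!",
        "The cat is in the house. It sleeps.",
    ]
    for record in preprocess(docs):
        print(record)
```

Because preprocess is a generator, records are emitted one at a time as input arrives, which loosely mirrors how a streaming operator would hand sentences to a downstream translation step.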

We'll demonstrate a pipeline that detects the language of news articles shared via Twitter and translates them in real time, and we'll examine its end-to-end throughput and latency. Developers will come away with a better understanding of how neural machine translation works and how to build pipelines for machine translation preprocessing tasks and neural machine translation models. They'll also have access to a demo repository to experiment with, so they can build machine translation models themselves.

Kellen Sunderland
Software Development Engineer
A seasoned Software Engineer and Apache Member, Kellen has spent two years working on large-scale machine translation systems. He has recently focused on optimizing deep learning and machine translation models for use in environments ranging from large cloud-based services to IoT devices at the edge. He is an active developer contributing to the Apache MXNet project, and has previously contributed to the Apache Joshua (incubating) project.
Suneel Marthi
Amazon Web Services
Suneel is a Member of the Apache Software Foundation and a committer and PMC member on Apache Mahout, Apache OpenNLP, and Apache Streams. He has previously presented at Flink Forward, Hadoop Summit, Berlin Buzzwords, Machine Learning Conference, Big Data Tech Warsaw, and Apache Big Data.