Scalable and adaptable typosquatting detection in Apache Metron

Tuesday, June 19
11:50 AM - 12:30 PM
Meeting Room 230C

Typosquatting is a form of cybersquatting (that is, registering a domain that is similar to an existing domain) in which an attacker registers a domain that is a common misspelling of another domain and uses it for possibly malicious ends. It is an attack that has affected everyone from Google to the televangelist Jerry Falwell, and US law contains provisions against it. Even so, it remains extremely popular, and it is particularly heinous and effective when used by advanced malicious actors, who may rely on typosquatted domains to trick unwitting users into clicking a malicious URL in a spearphishing attack.

Detecting typosquatting attacks in real time can be challenging, as there are as many ways to typosquat as there are to make typos. Often, intrusion detection systems will generate the typosquatted domains and store them in a database for comparison. However, given the number of possible domains, this is a daunting task storage-wise. Furthermore, the stored set can become out of date quickly.
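
To give a rough sense of why enumerating candidates is storage-heavy, here is a minimal Python sketch that generates the single-edit misspellings of one domain label (omission, substitution, transposition, insertion). The helper name `typo_variants` and the character set are assumptions made for illustration; this is not Metron's implementation, and it does not enforce DNS label rules.

```python
import string

# Characters allowed in the hypothetical candidate labels (an assumption).
ALPHABET = string.ascii_lowercase + string.digits + "-"


def typo_variants(name):
    """Return every string exactly one edit away from `name`."""
    variants = set()
    for i in range(len(name)):
        # Omission: drop the character at position i.
        variants.add(name[:i] + name[i + 1:])
        # Substitution: replace the character at position i.
        for c in ALPHABET:
            variants.add(name[:i] + c + name[i + 1:])
    for i in range(len(name) - 1):
        # Transposition: swap two adjacent characters.
        variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
    for i in range(len(name) + 1):
        # Insertion: add one extra character at position i.
        for c in ALPHABET:
            variants.add(name[:i] + c + name[i:])
    variants.discard(name)  # the original is not a typo of itself
    variants.discard("")    # can only arise for one-character names
    return variants


candidates = typo_variants("google")
print(len(candidates))          # candidates for a single six-letter label
print("gogle" in candidates)    # omission
print("googel" in candidates)   # transposition
```

Even this single-edit version yields hundreds of candidates for one short label. Multiplying that by every legitimate domain an organization cares about, and by additional edit types or top-level domains, is what makes an exact lookup table expensive to store and maintain.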

We will talk about using sketching data structures in Metron to detect typosquatted domains scalably and adaptably. Furthermore, we will discuss how to ensure that the set of typosquatted domains is kept current with the domains actually seen in an organization's network.
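
One well-known sketching data structure for this kind of membership test is the Bloom filter, which stores a set in a fixed-size bit array at the cost of occasional false positives. The abstract does not say which sketch the talk uses, so the following is only an illustrative Python sketch under that assumption; the class `BloomFilter` and its parameters are hypothetical and are not Metron's API.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, expected_items, false_positive_rate=0.01):
        # Standard sizing formulas for the bit count m and hash count k.
        self.num_bits = max(1, math.ceil(
            -expected_items * math.log(false_positive_rate) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(
            self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))


# Hypothetical usage: remember typosquat candidates, then test observed domains.
known_typos = BloomFilter(expected_items=1000)
for candidate in ["gogle.com", "googel.com", "gooogle.com"]:
    known_typos.add(candidate)

print("gogle.com" in known_typos)  # an added item is always reported present
print(len(known_typos.bits))       # bytes used, fixed regardless of item length
```

Because the filter's size depends only on the expected item count and the target error rate, new candidates can be added as new legitimate domains are observed without storing the candidate strings themselves. A hit should be treated as "possibly typosquatted" and confirmed by other means, since false positives are possible.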


Casey Stella
Principal Software Engineer
I am a Principal Software Engineer focusing on Data Science at Hortonworks. I am also the Vice President and an active committer for the Apache Metron project. In the past, I've worked as an architect and senior engineer at a healthcare informatics startup spun out of the Cleveland Clinic, as a developer at Oracle, and as a Research Geophysicist in the Oil & Gas industry. Before that, I was a poor graduate student in Math at Texas A&M. I primarily work with the Apache Hadoop software stack. I specialize in writing software and solving problems where there are scalability concerns due to either large amounts of traffic or large amounts of data. I have a particular passion for data science problems or anything vaguely mathematical. As a Principal Architect focused on data science, I spend time with a variety of clients, large and small, mentoring and helping them use Hadoop to solve their problems.
Michael Miklavcic
Staff Engineer
Michael Miklavcic is a committer and PMC member for Apache Metron and has been involved with the project for the past two years. He is a software engineer and architect with over ten years of industry experience and worked as a Systems Architect with Hortonworks for three years before transitioning to the engineering team for Metron. He has given numerous talks on both domestic and international stages, including Hadoop Summit San Jose, ApacheCon Big Data Europe, and multiple local Hadoop user groups. He is a code contributor to multiple Apache open source projects and has worked directly with clients to implement solutions using Hadoop. Michael has degrees in computer science and computer information systems from Baldwin Wallace in Cleveland, OH.