It's a Small Cluster After All - Analyzing the Disneyland Team with Machine Learning

Drowning in data is a common problem in Threat Intelligence investigations. When faced with hundreds or thousands of potentially relevant pieces of information, how does an analyst group them together, or pull the useful and relevant bits out of the pile? This talk offers a few tools to answer those questions, focusing on victim/target identification in a large set of DNS names, in particular a set of domain names registered by the "Disneyland team". It walks through Machine Learning techniques for several problems, including finding homoglyph domains, clustering domains, clustering subdomains, computing TF-IDF over groups of domains, and measuring Levenshtein distance between names. All of the work is done with open source tools, and the talk includes code so that others can do this work themselves.
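
As a rough illustration of one technique named above, the following is a minimal sketch of Levenshtein distance used to match a domain label against a list of candidate target names. It is not the code from the talk. The function names (levenshtein, closest_brand) and the sample labels are hypothetical, chosen only for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        previous = current
    return previous[-1]


def closest_brand(label: str, brands: list[str]) -> str:
    """Return the candidate name with the smallest edit distance to label.

    Hypothetical helper: 'brands' stands in for whatever list of likely
    victim or target names an analyst has assembled.
    """
    return min(brands, key=lambda brand: levenshtein(label, brand))


if __name__ == "__main__":
    # Made-up labels for illustration only, not real investigation data.
    print(levenshtein("examp1e", "example"))
    print(closest_brand("examp1e", ["example", "sample"]))
```

A small distance between a registered label and a known name is only a hint that the name may be the intended target. In practice it would likely be combined with the other techniques listed above, such as homoglyph checks and clustering.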