Computational Environment for Modeling and Analysing Network Traffic Behaviour using the Divide and Recombine Framework
Tech report number
CERIAS TR 2016-6
Abstract
This research has two essential goals. The first is to design and construct a
computational environment for studying large and complex datasets in the
cybersecurity domain. The second is to analyse the Spamhaus blacklist query
dataset, which involves uncovering the properties of blacklisted hosts and
understanding the nature of blacklisted hosts over time.
The analytical environment enables deep analysis of very large and complex
datasets by exploiting the divide and recombine framework. The capability to
analyse data in depth enables one to go beyond summary statistics in research.
The analysis is carried out at the finest level of granularity, without having
to reduce the size of the data.
The environment is also fully capable of processing the raw data into a data
structure suited for analysis.
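
The divide and recombine idea mentioned above can be illustrated with a minimal
sketch. It is not the environment built in this work: the function names, the
record layout, and the sample values are all hypothetical, and plain Python
stands in for whatever distributed back end the actual environment uses. The
data are divided into subsets, an analytic method is applied to each subset
independently, and the per-subset outputs are recombined.

```python
from collections import defaultdict
from statistics import mean


def divide(records, key):
    """Divide: split records into subsets keyed by key(record)."""
    subsets = defaultdict(list)
    for record in records:
        subsets[key(record)].append(record)
    return dict(subsets)


def apply_to_subsets(subsets, analytic):
    """Apply the same analytic method to every subset independently."""
    return {name: analytic(rows) for name, rows in subsets.items()}


def recombine(outputs, combiner):
    """Recombine: merge the per-subset outputs into one result."""
    return combiner(list(outputs.values()))


# Hypothetical records: (queried host, response size in bytes).
records = [
    ("10.0.0.1", 60),
    ("10.0.0.2", 80),
    ("10.0.0.1", 100),
    ("10.0.0.2", 120),
]

subsets = divide(records, key=lambda r: r[0])
per_host_mean = apply_to_subsets(
    subsets, lambda rows: mean(size for _, size in rows)
)
overall = recombine(per_host_mean, mean)
```

Because each subset is analysed on its own, the apply step can in principle run
in parallel across many machines, which is what lets this style of analysis
scale without sampling the data.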
Spamhaus is an organisation that identifies malicious hosts on the Internet.
Information about malicious hosts is stored in a distributed database by
Spamhaus and served through DNS query-response exchanges. Spamhaus and
other malicious-host-blacklisting organisations have replaced smaller malicious host
databases curated independently by multiple organisations for their internal needs.
Spamhaus services are popular due to their free access, exhaustive information,
historical information, simple DNS-based implementation, and reliability. The
malicious host information obtained from these databases is used as the first step
in weeding out potentially harmful hosts on the Internet.
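
As a rough illustration of the DNS-based lookup described above, the sketch
below builds the name that a client would query for a given IPv4 address. It
follows the usual DNS blacklist convention of reversing the address octets and
appending the blacklist zone. The function name is hypothetical, and the
default zone `zen.spamhaus.org` is an assumption, since the abstract does not
say which Spamhaus zone the dataset covers. No network lookup is performed.

```python
import ipaddress


def dnsbl_query_name(ip, zone="zen.spamhaus.org"):
    """Return the DNS name to query for an IPv4 address in a blacklist zone.

    The zone default is an assumption made for illustration only.
    """
    # Validate the input; raises ValueError for anything that is not IPv4.
    addr = ipaddress.IPv4Address(ip)
    # Reverse the octets, as in reverse-DNS style names.
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


print(dnsbl_query_name("192.0.2.99"))  # 99.2.0.192.zen.spamhaus.org
```

Under this convention a listed host is typically answered with an address
record in the 127.0.0.0/8 range, while an unlisted host yields a
name-not-found response, so each query-response pair in a packet capture
records one lookup and its outcome.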
During the course of this research, a detailed packet-level analysis was
carried out on the Spamhaus blacklist data. It was observed that the
query-responses displayed some peculiar behaviours. These anomalies were studied
and modeled, and were found to show definite patterns. These patterns are
empirical evidence of a systemic or statistical phenomenon.
Institution
Purdue University
Key alpha
information security, network security, statistics, computer science, DNS, anomalous behaviour
Organization
Purdue University
Affiliation
Purdue University, H2O.Ai
Publication Date
2016-10-14