Abstract
This dissertation presents an analysis of the network traffic features
commonly used in network-based anomaly detection systems. The analysis
is designed to identify how selecting a particular protocol attribute
as a feature affects detection performance. It also provides a guide
for the judicious selection of features when building network-based
anomaly detection models.
We introduce a protocol analysis methodology called Inter-flow
versus Intra-flow Analysis (IVIA), which partitions protocol
attributes according to their operational behavior. The method aids in
the construction of flow models. When protocol attributes are used as
features for network anomaly detection models, it also identifies
which attributes contribute to model accuracy and which are likely to
generate false positive alerts.
We introduce a set of data preprocessing operations that transform
these previously identified ``noisy'' attributes into useful features
for anomaly detection. We refer to these as behavioral
features. This new class of features can be derived from observed
measurements without undue computational effort, so the derivation
can keep pace with network traffic.
Empirical results using unsupervised learning show that models based
on behavioral features can achieve higher classification accuracy with
markedly lower false positive rates than models based on traditional
packet header features. Behavioral features are also used with
supervised learning to build classifiers of server application flow
behavior.