Abstract
Two important problems in computer security are 1) detecting the presence of
an intruder masquerading as the valid user and 2) detecting the perpetration of
abusive actions on the part of an otherwise innocuous user. We have developed an
approach to these problems that examines sequences of user actions (UNIX commands)
to classify behavior as normal or anomalous. In this paper we explore the matching
function needed to compare a current behavioral sequence to a historical profile.
We discuss the difficulties of performing matching in human-generated data and show
that exact string matching is insufficient for this domain. We demonstrate a number
of partial matching functions and examine their behaviors on user command data. In
particular, we explore two methods for weighting scores by adjacency of matches as
well as two growth functions (polynomial and exponential) for scoring similarities.
We find, empirically, that a partial matching function, biased toward adjacent
matches, with a polynomial growth rate is superior for this domain.
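
To make the contrast between the matching functions concrete, the following is a minimal illustrative sketch, not the scoring functions defined in the paper. It assumes two equal-length command sequences compared position by position, and it assumes that "biased toward adjacent matches" means each match is weighted by the length of the unbroken run of matches it extends. The function name `similarity`, the `growth` parameter, and the sample commands are hypothetical.

```python
def similarity(current, profile, growth="polynomial"):
    """Score two equal-length command sequences, favoring adjacent matches.

    Assumed scheme (illustrative only): each matching position contributes
    a weight that depends on the run of consecutive matches it extends.
      - "polynomial":  weights 1, 2, 3, ... within a run
      - "exponential": weights 1, 2, 4, ... within a run
    A mismatch resets the run weight to zero.
    """
    if len(current) != len(profile):
        raise ValueError("sequences must have the same length")
    if growth not in ("polynomial", "exponential"):
        raise ValueError("growth must be 'polynomial' or 'exponential'")

    score = 0
    weight = 0  # weight of the most recent match in the current run
    for cmd_a, cmd_b in zip(current, profile):
        if cmd_a == cmd_b:
            if growth == "polynomial":
                weight += 1
            else:
                weight = 1 if weight == 0 else weight * 2
            score += weight
        else:
            weight = 0  # run broken by a mismatch
    return score


# Hypothetical command sequences that differ in one position.
current = ["ls", "cd", "vi", "make", "ls"]
profile = ["ls", "cd", "vi", "gcc", "ls"]

exact = current == profile                           # exact matching fails
poly = similarity(current, profile, "polynomial")    # 1 + 2 + 3 + 0 + 1
expo = similarity(current, profile, "exponential")   # 1 + 2 + 4 + 0 + 1
```

Under these assumptions, exact matching rejects the pair outright even though four of the five commands agree, while both partial scores reward the three-command run. For a fully matching sequence of length n, the polynomial variant sums to n(n+1)/2 and the exponential variant to 2^n - 1, so the exponential form lets long runs dominate the score far more strongly.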