The Center for Education and Research in Information Assurance and Security (CERIAS)

The Center for Education and Research in
Information Assurance and Security (CERIAS)

Hidden Patterns Statistics

Author

P. Flajolet, Y. Guivarch, W. Szpankowski, B. Vallee

Entry type

article

Abstract

Two fundamental problems in combinatorics on words and string manipulation are string matching and sequence comparison. In string matching one searches for all occurrences of a given {it string}, understood as a sequence of consecutive symbols, in a text. In sequence comparison a {it subsequence} rather than a string is searched in a text The string matching problem has been extensively studied in literature from algorithmic and probabilistic points of view. The sequence comparison problem, also known as {it hidden pattern} problem, is harder and it has been much less investigated. In this paper we study the number of occurrences of a given pattern~$w$ of length $m$ as a subsequence in a random text of length $n$ generated by a memoryless source. In particular, we consider two versions of this problem, namely the {\\it unconstrained} one in which the subsequence $w$ can appear anywhere in the text, and the {\\it constrained} one that puts bounds on the distances between symbols of the word $w$. We determine the mean and the variance of the number of occurrences, and establish a Gaussian limit law. These results are obtained via combinatorics on words, formal languages, and methods of analytic combinatorics based on generating functions and moment methods. The motivation to study this problem comes from an attempt at finding a reliable threshold for intrusion detections, from textual data processing applications, and from molecular biology.

Key alpha

hidden patterns, subsequences, intrusion detection

Affiliation

INRIA, IRMAR, Purdue, GREYC

Publication Date

0000-00-00

BibTex-formatted data

To refer to this entry, you may select and copy the text below and paste it into your BibTex document. Note that the text may not contain all macros that BibTex supports.