Abstract
Given two strings T = a 1 : : :a n and P = b 1 : : :b m over an alphabet , the problem of testing
whether P occurs as a subsequence of T is trivially solved in linear time. It is also known
that a simple O ( n log j j) time preprocessing of T makes it easy to decide subsequently for any
P and in at most j P j log j j character comparisons, whether P is a subsequence of T . These
problems become more complicated if one asks instead whether P occurs as a subsequence of
some substring Y of T of bounded length. This paper presents an automaton built on the
textstring T and capable of identifying all distinct minimal substrings Y of X having P as a
subsequence. By a substring Y being minimal with respect to P , it is meant that P is not
a subsequence of any proper substring of Y . For every minimal substring Y , the automaton
recognizes the occurrence of P having lexicographically smallest sequence of symbol positions
in Y . It is not di cult to realize such an automaton in time and space O ( n
2
) for a text of
n characters. One result of this paper consists of bringing those bounds down to linear or
O ( n log n ), respectively, depending on whether the alphabet is bounded or of arbitrary size,
thereby matching the respective complexities of o -line exact string searching. Having built
the automaton, the search for all lexicographically earliest occurrences of P in X is carried out
in time O ( n + P
m
i=1 rocc i i log n log j j), where rocc i is the number of distinct minimal
substrings of T having b 1 : : :b i as a subsequence. All log factors appearing in the above bounds
can be further reduced to log log by resort to known integer-handling data structures.