The Center for Education and Research in Information Assurance and Security (CERIAS)

The Center for Education and Research in
Information Assurance and Security (CERIAS)

A Fixed-Size Bloom Filter for Searching Textual Documents

Author

M. A. Shepherd,W. J. Phillips,C. K. Chu

Entry type

article

Abstract

The empirical false drop rate associated with a fixed-size Bloom filter used to represent textual documents may be quite different than the theoretical rate. This problem arises when the filter size is based on the expectation of a uniform distribution of the number of different terms per document. The distribution is, in fact, not uniform. This paper describes a method to determine the filter size for a database of textual documents, based on the desired false drop rate and the actual distribution of different words over the documents for that database. Theoretical and experimental results are reported and indicate that a filter size based on this method produces empirical false drop rates equivalent to the theoretical rates. The filter was also compared to variable-length filters with respect to storage requirements and search times.

Journal

The Computer Journal

Key alpha

Shepherd

Number

3

Volume

32

Publication Date

2001-01-01

BibTex-formatted data

To refer to this entry, you may select and copy the text below and paste it into your BibTex document. Note that the text may not contain all macros that BibTex supports.