The Center for Education and Research in Information Assurance and Security (CERIAS)

The Center for Education and Research in
Information Assurance and Security (CERIAS)

The SBC-tree: an index for run-length compressed sequences

Download

Download PDF Document
PDF

Author

MY Eltabakh, Wing-Kai Hon, R Shah, WG Aref, JS Vitter

Entry type

inproceedings

Abstract

Run-Length-Encoding (RLE) is a data compression technique that is used in various applications, e.g., time series, biological sequences, and multimedia databases. One of the main challenges is how to operate on (e.g., index, search, and retrieve) compressed data without decompressing it. In this paper, we introduce the <u>S</u>tring <u>B</u>-tree for <u>C</u>ompressed sequences, termed the SBC-tree, for indexing and searching RLE-compressed sequences of arbitrary length. The SBC-tree is a two-level index structure based on the well-known String B-tree and a 3-sided range query structure [7]. The SBC-tree supports pattern matching queries such as substring matching, prefix matching, and range search operations over RLE-compressed sequences. The SBC-tree has an optimal external-memory space complexity of O(N/B) pages, where N is the total length of the compressed sequences, and B is the disk page size. Substring matching, prefix matching, and range search execute in an optimal O(logB N + |p|+T/B) I/O operations, where |p| is the length of the compressed query pattern and T is the query output size. The SBC-tree is also dynamic and supports insert and delete operations efficiently. The insertion and deletion of all suffixes of a compressed sequence of length m take O(m logB(N + m)) amortized I/O operations. The SBC-tree index is realized inside PostgreSQL. Performance results illustrate that using the SBC-tree to index RLE-compressed sequences achieves up to an order of magnitude reduction in storage, while retains the optimal search performance achieved by the String B-tree over the uncompressed sequences.

Download

PDF

Date

2008

Booktitle

Proceedings of the 11th international conference on Extending database technology: Advances in database technology

Journal

ACM International Conference Proceeding Series

Key alpha

Aref

Pages

523-534

Publisher

ACM

Volume

261

Publication Date

2008-01-01

BibTex-formatted data

To refer to this entry, you may select and copy the text below and paste it into your BibTex document. Note that the text may not contain all macros that BibTex supports.