The Center for Education and Research in Information Assurance and Security (CERIAS)

The Center for Education and Research in
Information Assurance and Security (CERIAS)

Reports and Papers Archive


Browse All Papers »       Submit A Paper »

Exploiting predicate-window semantics over data streams

TM Ghanem, WG Aref, AK Elmagarmid
Download: PDF

The continuous sliding-window query model is used widely in data stream management systems where the focus of a continuous query is limited to a set of the most recent tuples. In this paper, we show that an interesting and important class of queries over data streams cannot be answered using the sliding-window query model. Thus, we introduce a new model for continuous window queries, termed the predicate-window query model that limits the focus of a continuous query to the stream tuples that qualify a certain predicate. Predicate-window queries have some distinguishing characteristics, e.g., (1) The window predicate can be defined over any attribute in the stream tuple (ordered or unordered). (2) Stream tuples qualify and disqualify the window predicate in an out-of-order manner. In this paper, we discuss the applicability of the predicate-window query model. We will show how the existing sliding-window query models fail to answer some of the predicate-window queries. Finally, we discuss the challenges in supporting the predicate-window query model in data stream management systems.

Added 2008-04-08

Adaptive rank-aware query optimization in relational databases

IF Ilyas, WG Aref, AK Elmagarmid, HG Elmongui, R Shah, JS Vitter
Download: PDF

Rank-aware query processing has emerged as a key requirement in modern applications. In these applications, efficient and adaptive evaluation of top-k queries is an integral part of the application semantics. In this article, we introduce a rank-aware query optimization framework that fully integrates rank-join operators into relational query engines. The framework is based on extending the System R dynamic programming algorithm in both enumeration and pruning. We define ranking as an interesting physical property that triggers the generation of rank-aware query plans. Unlike traditional join operators, optimizing for rank-join operators depends on estimating the input cardinality of these operators. We introduce a probabilistic model for estimating the input cardinality, and hence the cost of a rank-join operator. To our knowledge, this is the first effort in estimating the needed input size for optimal rank aggregation algorithms. Costing ranking plans is key to the full integration of rank-join operators in real-world query processing engines.Since optimal execution strategies picked by static query optimizers lose their optimality due to estimation errors and unexpected changes in the computing environment, we introduce several adaptive execution strategies for top-k queries that respond to these unexpected changes and costing errors. Our reactive reoptimization techniques change the execution plan at runtime to significantly enhance the performance of running queries. Since top-k query plans are usually pipelined and maintain a complex ranking state, altering the execution strategy of a running ranking query is an important and challenging task.We conduct an extensive experimental study to evaluate the performance of the proposed framework. The experimental results are twofold: (1) we show the effectiveness of our cost-based approach of integrating ranking plans in dynamic programming cost-based optimizers; and (2) we show a significant speedup (up to 300%) when using our adaptive execution of ranking plans over the state-of-the-art mid-query reoptimization strategies.

Added 2008-04-08

On local heuristics to speed up polygon-polygon intersection tests

WM Badawy, WG Aref
Download: PDF
Added 2008-04-08

SINA: scalable incremental processing of continuous queries in spatio-temporal databases

MF Mokbel, X Xiong, WG Aref
Download: PDF

This paper intoduces the Scalable INcremental hash-based Algorithm (SINA, for short); a new algorithm for evaluting a set of concurrent continuous spatio-temporal queries. SINA is designed with two goals in mind: (1) Scalability in terms of the number of concurrent continuous spatio-temporal queries, and (2) Incremental evaluation of continyous spatio-temporal queries. SINA achieves scalability by empolying a shared execution paradigm where the execution of continuous spatio-temporal queries is abstracted as a spatial join between a set of moving objects and a set of moving queries. Incremental evaluation is achived by computing only the updates of the previously reported answer. We introduce two types of updaes, namely positive and negative updates. Positive or negative updates indicate that a certain object should be added to or removed from the previously reported answer, respectively. SINA manages the computation of postive and negative updates via three phases: the hashing phase, the invalidation phase, and the joining phase. the hashing phase employs an in-memory hash-based join algorithm that results in a set a positive upldates. The invalidation phase is triggered every T seconds or when the memory is fully occupied to produce a set of negative updates. Finally, the joining phase is triggered by the end of the invalidation phase to produce a set of both positive and negative updates that result from joining in-memory data with in-disk data. Experimental results show that SINA is scalable and is more efficient than other index-based spatio-temporal algorithms.

Added 2008-04-08

Video query processing in the VDBMS testbed for video database research

W Aref, M Hammad, AC Catlin, I Ilyas, T Ghanem, A Elmagarmid, M Marzouk
Download: PDF

The increased use of video data sets for multimedia-based applications has created a demand for strong video database support, including efficient methods for handling the content-based query and retrieval of video data. Video query processing presents significant research challenges, mainly associated with the size, complexity and unstructured nature of video data. A video query processor must support video operations for search by content and streaming, new query types, and the incorporation of video methods and operators in generating, optimizing and executing query plans. In this paper, we address these query processing issues in two contexts, first as applied to the video data type and then as applied to the stream data type. We first present the query processing functionality of the VDBMS video database management system as a framework designed to support the full range of functionality for video as an abstract data type. We describe two query operators for the video data type which implement the rank-join and stop-after algorithms. As videos may be considered streams of consecutive image frames, video query processing can be expressed as continuous queries over video data streams. The stream data type was therefore introduced into the VDBMS system, and system functionality was extended to support general data streams. From this viewpoint, we present an approach for defining and processing streams, including video, through the query execution engine. We describe the implementation of several algorithms for video query processing expressed as continuous queries over video streams, such as fast forward, region-based blurring and left outer join. We include a description of the window-join algorithm as a core operator for continuous query systems, and discuss shared execution as an optimization approach for stream query processing.

Added 2008-04-08

Performance of multi-dimensional space-filling curves

MF Mokbel, WG Aref, I Kamel
Download: PDF

A space-filling curve is a way of mapping the multi-dimensional space into the one-dimensional space. It acts like a thread that passes through every cell element (or pixel) in the D-dimensional space so that every cell is visited exactly once. There are numerous kinds of space-filling curves. The difference between such curves is in their way of mapping to the one dimensional space. Selecting the appropriate curve for any application requires knowledge of the mapping scheme provided by each space-filling curve. A space-filling curve consists of a set of segments. Each segment connects two consecutive multi-dimensional points. Five different types of segments are distinguished, namely, Jump, Contiguity, Reverse, Forward, and Still. A description vector V=(J,C,R,F,S), where J,C,R,F, and S, are the percentages of Jump, Contiguity, Reverse, Forward, and Still segments in the space-filling curve, encapsulates all the properties of a space-filling curve. The knowledge of V facilitates the process of selecting the appropriate space-filling curve for different applications. Closed formulas are developed to compute the description vector V for any D-dimensional space and grid size N for different space-filling curves. A comparative study of different space filling curves with respect to the description vector is conducted and results are presented and discussed.

Added 2008-04-08

Towards scalable location-aware services: requirements and research issues

MF Mokbel, WG Aref, SE Hambrusch, S Prabhakar
Download: PDF

The emergence of location-aware services calls for new real time spatio-temporal query processing algorithms that deal with large numbers of mobile objects and queries. Online query response is an important characterization of location-aware services. A delay in the answer to a query gives invalid and obsolete results, simply because moving objects can change their locations before the query responds. To handle large numbers of spatio-temporal queries efficiently, we propose the idea of sharing as a means to achieve scalability. In this paper, we introduce several types of sharing in the context of continuous spatio-temporal queries. Examples of sharing in the context of real-time spatio-temporal database systems include sharing the execution, sharing the underlying space, sharing the sliding time windows, and sharing the objects of interest. We demonstrate how sharing can be integrated into query predicates, e.g., selection and spatial join processing. The goal of this paper is to outline research directions and approaches that will lead to scalable and efficient location-aware services.

Added 2008-04-08

The SBC-tree: an index for run-length compressed sequences

MY Eltabakh, Wing-Kai Hon, R Shah, WG Aref, JS Vitter
Download: PDF

Run-Length-Encoding (RLE) is a data compression technique that is used in various applications, e.g., time series, biological sequences, and multimedia databases. One of the main challenges is how to operate on (e.g., index, search, and retrieve) compressed data without decompressing it. In this paper, we introduce the String B-tree for C

ompressed sequences, termed the SBC-tree, for indexing and searching RLE-compressed sequences of arbitrary length. The SBC-tree is a two-level index structure based on the well-known String B-tree and a 3-sided range query structure [7]. The SBC-tree supports pattern matching queries such as substring matching, prefix matching, and range search operations over RLE-compressed sequences. The SBC-tree has an optimal external-memory space complexity of O(N/B) pages, where N is the total length of the compressed sequences, and B is the disk page size. Substring matching, prefix matching, and range search execute in an optimal O(logB N + |p|+T/B) I/O operations, where |p| is the length of the compressed query pattern and T is the query output size. The SBC-tree is also dynamic and supports insert and delete operations efficiently. The insertion and deletion of all suffixes of a compressed sequence of length m take O(m logB(N + m)) amortized I/O operations. The SBC-tree index is realized inside PostgreSQL. Performance results illustrate that using the SBC-tree to index RLE-compressed sequences achieves up to an order of magnitude reduction in storage, while retains the optimal search performance achieved by the String B-tree over the uncompressed sequences.

Added 2008-04-08

GPAC: generic and progressive processing of mobile queries over mobile data

MF Mokbel, WG Aref
Download: PDF

This paper introduces a new family of Generic and Progressive algorithms (GPAC, for short) for continuous mobile queries over mobile objects. GPAC provides a general skeleton that can be tuned through a set of methods to behave as various continuous queries (e.g., continuous range queries and continuous k-nearest-neighbor queries). GPAC algorithms aim to provide three goals: (1) Online evaluation through an in-memory processing of the incoming mobile data. (2) Progressive evaluation through employing an incremental evaluation paradigm. (3) Fast query response through employing an anticipation paradigm. Query answer is anticipated and is cached in memory to allow for fast evaluation. GPAC algorithms are encapsulated in physical pipelined query operators. GPAC pipelined operators can be combined with traditional query operators in a query execution plan to support a wide variety of continuous queries. Experimental results based on a real implementation inside a prototype streaming database engine show the efficiency of GPAC operators in providing incremental and fast response for continuous queries.

Added 2008-04-08

Pipelined spatial join processing for quadtree-based indexes

W Aref
Download: PDF

Spatial join is an important yet costly operation in spatial databases. In order to speed up the execution of a spatial join, the input tables are often indexed based on their spatial attributes. The quadtree index structure is a well-known index for organizing spatial database objects. It has been implemented in several database management systems, e.g., in Oracle Spatial and in PostgreSQL (via SP-GiST). Queries typically involve multiple pipelined spatial join operators that fit together in a query evaluation plan. In order to extend the applicability of these spatial joins, they are optimized so that upon receiving sorted input, they produce sorted output for the spatial join operators in the upperlevels of the query evaluation pipeline. This paper investigates the use of quadtree-based spatial join algorithms and how they can be adapted to answer queries that involve multiple pipelined spatial joins in a query evaluation plan. The paper investigates several adaptations to pipelined spatial join algorithms and their performance for the cases when both input tables are indexed, when only one of the tables is indexed while the second table is sorted, and when both tables are sorted but are not indexed.

Added 2008-04-08

Challenges in spatiotemporal stream query optimization

HG Elmongui, M Ouzzani, WG Aref
Download: PDF

Simplified technology and low costs have spurred the use of location-detection devices in moving objects. Usually, these devices will send the moving objects’ location information to a spatio-temporal data stream management system, which will be then responsible for answering spatio-temporal queries related to these moving objects. A large spectrum of research have been devoted to continuous spatio-temporal query processing. However, we argue that several outstanding challenges have been either addressed partially or not at all in the existing literature. In particular, in this paper, we focus on the optimization of multi-predicate spatio-temporal queries on moving objects. We present several major challenges related to the lack of spatio-temporal pipelined operators, and the impact of time, space, and their combination on the query plan optimality under different circumstances mof query and object distributions. We show that building an adaptive query optimization framework is key in addressing these challenges and coping with the dynamic nature of the environment we are evolving in.

Added 2008-04-08

Irregularity in multi-dimensional space-filling curves with applications in multimedia databases

MF Mokbel, WG Aref
Download: PDF

A space-filling curve is a way of mapping the multi-dimensional space into the one-dimensional space. It acts like a thread that passes through every cell element (or pixel) in the N-dimensional space so that every cell is visited at least once. Thus, a space-filling curve imposes a linear order of the cells in the N-dimensional space. There are numerous kinds of space-filling curves. The difference between such curves is in their way of mapping to the one-dimensional space. Selecting the appropriate curve for any application requires a brief knowledge of the mapping scheme provided by each space-filling curve. Irregularity is proposed as a quantitative measure of the quality of the mapping of the space-filling curve. Closed formulas are developed to compute the irregularity for any general dimension D with N points in each dimension for different space-filling curves.A comparative study of different space-filling curves with respect to irregularity is conducted and results are presented and discussed. The applicability of this research is the area of multimedia databases is illustrated with a discussion of the problems that arise.

Added 2008-04-08

Trojan Horse Resistant Discretionary Access Control

CERIAS TR 2008-8
Ziqing Mao, Ninghui Li, Hong Chen, Xuxian Jiang
Download: PDF

Modern operating systems primarily use Discretionary Access Control (DAC) to protect files and other operating system resources. DAC mechanisms are more user-friendly than Mandatory Access Control (MAC) systems, but are vulnerable to trojan horse attacks and attacks exploiting buggy software.  We show that it is possible to have the best of both worlds: DAC’s easy-to-use discretionary policy specification and MAC’s defense against trojan horses and buggy programs. This is made possible by a key new insight that DAC has this weakness not because it uses the discretionary principle, but because existing DAC enforcement mechanisms assume that a single principal is responsible for any request, whereas in reality a request may be influenced by multiple principals; thus these mechanisms cannot correctly identify the true origin(s) of a request and fall prey to trojan horses. We propose to solve this problem by combining DAC’s policy specification with new enforcement techniques that use ideas from MAC’s information flow tracking. Our model, called Information Flow Enhanced Discretionary Access Control (IFEDAC), is the first DAC model that can defend against trojan horses and attacks exploiting buggy software.  IFEDAC significantly strengthens end host security, while preserving to a large degree DAC’s ease of use. In this paper, we present the IFEDAC model, analyze its security properties, and discuss our design and implementation for Linux.

Added 2008-04-08

Rank-aware query optimization

IF Ilyas, R Shah, WG Aref, JS Vitter, AK Elmagarmid
Download: PDF

Ranking is an important property that needs to be fully supported by current relational query engines. Recently, several rank-join query operators have been proposed based on rank aggregation algorithms. Rank-join operators progressively rank the join results while performing the join operation. The new operators have a direct impact on traditional query processing and optimization.We introduce a rank-aware query optimization framework that fully integrates rank-join operators into relational query engines. The framework is based on extending the System R dynamic programming algorithm in both enumeration and pruning. We define ranking as an interesting property that triggers the generation of rank-aware query plans. Unlike traditional join operators, optimizing for rank-join operators depends on estimating the input cardinality of these operators. We introduce a probabilistic model for estimating the input cardinality, and hence the cost of a rank-join operator. To our knowledge, this paper is the first effort in estimating the needed input size for optimal rank aggregation algorithms. Costing ranking plans, although challenging, is key to the full integration of rank-join operators in real-world query processing engines. We experimentally evaluate our framework by modifying the query optimizer of an open-source database management system. The experiments show the validity of our framework and the accuracy of the proposed estimation model.

Added 2008-04-08

LUGrid: Update-tolerant Grid-based Indexing for Moving Objects

X Xiong, MF Mokbel, WG Aref

Indexing moving objects is a fundamental issue in spatiotemporal databases. In this paper, we propose an adaptive Lazy-Update Grid-based index (LUGrid, for short) that minimizes the cost of object updates. LUGrid is designed with two important features, namely, lazy insertion and lazy deletion. Lazy insertion reduces the update I/Os by adding an additional memory-resident layer over the disk index. Lazy deletion reduces update cost by avoiding deleting single obsolete entry immediately. Instead, the obsolete entries are removed later by specially designed mechanisms. LUGrid adapts to object distributions through cell splitting and merging. Theoretical analysis and experimental results indicate that LUGrid outperforms former work by up to eight times when processing intensive updates, while yielding similar search performance.

Added 2008-04-08