The Center for Education and Research in Information Assurance and Security (CERIAS)

Reports and Papers Archive

SP-GiST: An Extensible Database Index for Supporting Space Partitioning

WG Aref, IF Ilyas
Download: PDF

Emerging database applications require the use of new indexing structures beyond B-trees and R-trees. Examples are the k-D tree, the trie, the quadtree, and their variants. They are often proposed as supporting structures in data mining, GIS, and CAD/CAM applications. A common feature of all these indexes is that they recursively divide the space into partitions. A new extensible index structure, termed SP-GiST, is presented that supports this class of data structures, mainly the class of space-partitioning unbalanced trees. Simple method implementations are provided that demonstrate how SP-GiST can behave as a k-D tree, a trie, a quadtree, or any of their variants. Issues related to clustering tree nodes into pages as well as concurrency control for SP-GiST are addressed. A dynamic minimum-height clustering technique is applied to minimize disk accesses and to make using such trees in database systems possible and efficient. A prototype implementation of SP-GiST is presented as well as performance studies of the various SP-GiST tuning parameters.
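
To illustrate the class of structures SP-GiST targets, the following is a minimal Python sketch of a point quadtree, one of its supported instantiations. The class and method names (QuadNode, insert) are illustrative assumptions, not the SP-GiST extension interface described in the paper.

```python
# A toy point quadtree: each internal node splits its region into four
# quadrants, the recursive space partitioning that SP-GiST generalizes.
# Assumes distinct points; no paging, clustering, or concurrency control.

class QuadNode:
    def __init__(self, x0, y0, x1, y1):
        self.bounds = (x0, y0, x1, y1)  # region covered by this node
        self.point = None               # the data item, if this leaf holds one
        self.children = None            # four child quadrants once split

    def insert(self, px, py):
        if self.children is not None:        # internal node: route downward
            self._quadrant(px, py).insert(px, py)
        elif self.point is None:             # empty leaf: store the point here
            self.point = (px, py)
        else:                                # occupied leaf: split the region
            old, self.point = self.point, None
            x0, y0, x1, y1 = self.bounds
            mx, my = (x0 + x1) / 2, (y0 + y1) / 2
            self.children = [QuadNode(x0, y0, mx, my), QuadNode(mx, y0, x1, my),
                             QuadNode(x0, my, mx, y1), QuadNode(mx, my, x1, y1)]
            self._quadrant(*old).insert(*old)        # reinsert displaced point
            self._quadrant(px, py).insert(px, py)

    def _quadrant(self, px, py):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        return self.children[(px >= mx) + 2 * (py >= my)]

root = QuadNode(0, 0, 100, 100)
for p in [(10, 10), (80, 20), (12, 90), (60, 60)]:
    root.insert(*p)
```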

Added 2008-04-10

Bulk operations for space-partitioning trees

TM Ghanem, R Shah, MF Mokbel, WG Aref, JS Vitter
Download: PDF

The emergence of extensible index structures, e.g., GiST (generalized search tree) [J.M. Hellerstein et al. (1995)] and SP-GiST (space-partitioning generalized search tree) [W.G. Aref et al. (2001)], calls for a set of extensible algorithms to support different operations (e.g., insertion, deletion, and search). Extensible bulk operations (e.g., bulk loading and bulk insertion) are of the same importance and need to be supported in these index engines. In this paper, we propose two extensible buffer-based algorithms for bulk operations in the class of space-partitioning trees, a class of hierarchical data structures that recursively decompose the space into disjoint partitions. The main idea of these algorithms is to build an in-memory tree of the target space-partitioning index. Then, data items are recursively partitioned into disk-based buffers using the in-memory tree. Although the second algorithm is designed for bulk insertion, it can be used for bulk loading as well. The proposed extensible algorithms are implemented inside SP-GiST, a framework for supporting the class of space-partitioning trees. Both algorithms have an I/O bound of O(NH/B), where N is the number of data items to be bulk loaded/inserted, B is the number of tree nodes that can fit in one disk page, and H is the tree height in terms of pages after applying a clustering algorithm. Experimental results are provided to show the scalability and applicability of the proposed algorithms for the class of space-partitioning trees. A comparison of the two proposed algorithms shows that the first algorithm performs better in the case of bulk loading. However, the second algorithm is more general and can be used for efficient bulk insertion.
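
The buffer-based idea lends itself to a short sketch: route items through an in-memory partitioning step into per-partition disk buffers, then recurse on each buffer. The helper names (build_subtree, route) and the recursion structure below are illustrative assumptions, not the paper's algorithms, which manage disk pages and the in-memory tree more carefully to achieve the stated I/O bound.

```python
import pickle, tempfile

def bulk_load(items, build_subtree, route, fanout, memory_limit):
    """Recursively bulk-load items into a space-partitioning index.

    build_subtree(items) builds an index over items that fit in memory;
    route(item) maps an item to one of `fanout` space partitions.
    """
    if len(items) <= memory_limit:
        return build_subtree(items)            # base case: build in memory
    # One sequential pass routes every item into a disk-based buffer,
    # mirroring the paper's in-memory tree with per-subtree buffers.
    buffers = [tempfile.TemporaryFile() for _ in range(fanout)]
    for item in items:
        pickle.dump(item, buffers[route(item)])
    children = []
    for buf in buffers:                        # recurse on each partition
        buf.seek(0)
        part = []
        while True:
            try:
                part.append(pickle.load(buf))
            except EOFError:
                break
        buf.close()
        # assumes route() genuinely splits the data, so the recursion terminates
        children.append(bulk_load(part, build_subtree, route, fanout, memory_limit))
    return children
```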

Added 2008-04-10

Window Query Processing in Linear Quadtrees

A Aboulnaga, WG Aref
Download: PDF

The linear quadtree is a spatial access method that is built by decomposing the spatial objects in a database into quadtree blocks and storing these quadtree blocks in a B-tree. The linear quadtree is very useful for geographic information systems because it provides good query performance while using existing B-tree implementations. An algorithm and a cost model are presented for processing window queries in linear quadtrees. The algorithm can handle query windows of any shape in the general case of spatial databases with overlapping objects. The algorithm recursively decomposes the space into quadtree blocks, and uses the quadtree blocks overlapping the query window to search the B-tree. The cost model estimates the I/O cost of processing window queries using the algorithm. The cost model is also based on a recursive decomposition of the space, and it uses very simple parameters that can easily be maintained in the database catalog. Experiments with real and synthetic data sets verify the accuracy of the cost model.
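
The "linear" in linear quadtree refers to keying each quadtree block with a linear location code, commonly a Morton (Z-order) code, so that blocks can be stored as ordinary B-tree keys. The bit-interleaving below is an illustrative sketch of such a code, not the paper's implementation.

```python
def morton_code(x, y, bits=16):
    """Interleave the bits of (x, y) into a single Z-order B-tree key."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)      # x bits occupy even positions
        code |= ((y >> i) & 1) << (2 * i + 1)  # y bits occupy odd positions
    return code

# Nearby cells get nearby keys, so a window query can translate the quadtree
# blocks overlapping the window into key ranges to search in the B-tree.
assert morton_code(5, 9) == 0b10010011  # == 147
```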

Added 2008-04-10

A distributed database server for continuous media

WG Aref, AC Catlin, AK Elmagarmid, J Fan, J Guo, M Hammad, IF Ilyas, MS Marzouk, S Prabhakar, A Rezgui, S Teoh, E Terzi, Y Tu, A Vakali, XQ Zhu
Download: PDF

In our project, we are adopting a new approach for handling video data. We view the video as a well-defined data type with its own description, parameters, and applicable methods. The system is based on PREDATOR, an open-source object-relational DBMS. PREDATOR uses Shore as the underlying storage manager. Supporting video operations (storing, searching-by-content, and streaming) and new query types (query-by-example and multi-feature similarity searching) requires major changes in many of the traditional system components. More specifically, the storage and buffer manager has to deal with huge volumes of data with real-time constraints. Query processing has to consider the video methods and operators in generating, optimizing, and executing the query plans.

Added 2008-04-10

VDBMS: A testbed facility for research in video database benchmarking

Walid Aref, Ann Christine Catlin, Ahmed Elmagarmid, Jianping Fan, Moustafa Hammad, Ihab Ilyas, Mirette Marzouk, Sunil Prabhakar, Yi-Cheng Tu and Xingquan Zhu
Download: PDF

Real-world video-based applications require database technology that is capable of storing digital video in the form of video databases and providing content-based video search and retrieval. Methods for handling traditional data storage, query, search, retrieval, and presentation cannot be extended to provide this functionality. The VDBMS research initiative is motivated by the requirements of video-based applications to search and retrieve portions of video data based on content and by the need for testbed facilities to facilitate research in the area of video database management. In this paper we describe the VDBMS video database research platform, a system that supports comprehensive and efficient database management for digital video. Our fundamental concept is to provide a full range of functionality for video as a well-defined abstract database data type, with its own description, parameters, and applicable methods. Research problems that are addressed by VDBMS to support the handling of video data include MPEG7 standard multimedia content representation, algorithms for image-based shot detection, image processing techniques for extracting low-level visual features, a high-dimensional indexing technique to access the high-dimensional feature vectors extracted by image preprocessing, multimedia query processing and optimization, new query operators, real-time stream management, a search-based buffer management policy, and an access control model for selective, content-based access to streaming video. VDBMS also provides an environment for testing the correctness and scope of new video processing techniques, measuring the performance of algorithms in a standardized way, and comparing the performance of different implementations of an algorithm or component. We are currently developing video component wrappers with well-defined interfaces to facilitate the modification or replacement of video processing components. The ultimate goal of the VDBMS project is a flexible, extensible framework that can be used by the research community for developing, testing, and benchmarking video database technologies.

Added 2008-04-10

An improved automatic isotropic color edge detection technique

Jianping Fan, Walid G. Aref, Mohand-Said Hacid, and Ahmed K. Elmagarmid

In many image processing applications, edge detection is a useful method for obtaining a simplified image that preserves the geometric structures and spatial relationships among the various image components. To provide automatic edge detection, two problems must be solved: one is feature extraction for calculating the edge strength; the other is feature classification for automatic edge detection. To solve these two problems, we propose an improved automatic edge detection technique. The three color components of a color image and more potential edge patterns are integrated to solve the feature extraction problem. A fast entropic thresholding technique is also developed to solve the feature classification problem. Experimental results confirm that the proposed edge detector provides more reasonable results than the traditional isotropic edge operators, at a lower calculation cost than the complex edge detectors. A good balance between calculation cost and edge detection accuracy is achieved.
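
As a rough sketch of the feature-classification step, the classic maximum-entropy threshold selection over an edge-strength histogram is shown below. This formulation is an assumption for illustration; the paper's fast entropic thresholding technique differs in its details.

```python
import numpy as np

def entropic_threshold(hist):
    """Pick the threshold that maximizes the summed entropies of the
    below-threshold and above-threshold parts of the histogram."""
    p = hist / hist.sum()              # normalize counts to probabilities
    P = np.cumsum(p)                   # cumulative mass below each bin
    best_t, best_h = 1, -np.inf
    for t in range(1, len(p)):
        if P[t - 1] <= 0.0 or P[t - 1] >= 1.0:
            continue                   # one class would be empty
        lo = p[:t][p[:t] > 0] / P[t - 1]
        hi = p[t:][p[t:] > 0] / (1.0 - P[t - 1])
        h = -np.sum(lo * np.log(lo)) - np.sum(hi * np.log(hi))
        if h > best_h:
            best_t, best_h = t, h
    return best_t                      # edge pixels: strength >= best_t

# e.g. hist = np.bincount(edge_strength_image.ravel(), minlength=256)
```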

Added 2008-04-10

Optimizing in-order execution of continuous queries over streamed sensor data

MA Hammad, WG Aref, AK Elmagarmid
Added 2008-04-10

Natural Language Watermarking: Challenges in Building a Practical System

Mike Atallah
Download: PDF

This paper gives an overview of the research and implementation challenges we encountered in building an end-to-end natural language processing based watermarking system. By natural language watermarking, we mean embedding the watermark into a text document, using the natural language components as the carrier, in such a way that the modifications are imperceptible to the readers and the embedded information is robust against possible attacks. Of particular interest is using the structure of the sentences in natural language text in order to insert the watermark. We evaluated the quality of the watermarked text using an objective evaluation metric, the BLEU score. BLEU scoring is commonly used in the statistical machine translation community. Our current system prototype achieves a BLEU score of 0.45 on a scale of [0, 1].
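
For reference, the BLEU metric used in this evaluation combines modified n-gram precisions with a brevity penalty; in its standard form (standard background, not specific to this paper):

\mathrm{BLEU} = \mathrm{BP} \cdot \exp\Big(\sum_{n=1}^{N} w_n \log p_n\Big), \qquad \mathrm{BP} = \begin{cases} 1 & \text{if } c > r \\ e^{1 - r/c} & \text{if } c \le r \end{cases}

where p_n is the modified n-gram precision, w_n its weight (typically 1/N with N = 4), c the candidate length, and r the reference length.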

Added 2008-04-09


Privacy and ownership preserving of outsourced medical data

Elisa Bertino
Download: PDF

The demand for the secondary use of medical data is increasing steadily to allow for the provision of better quality health care. Two important issues pertaining to this sharing of data have to be addressed: one is privacy protection for the individuals referred to in the data; the other is copyright protection over the data. In this paper, we present a unified framework that seamlessly combines techniques of binning and digital watermarking to attain the dual goals of privacy and copyright protection. Our binning method is built upon an earlier approach of generalization and suppression by allowing a broader concept of generalization. To ensure data usefulness, we propose constraining binning by usage metrics that define the maximal allowable information loss, and the metrics can be enforced off-line. Our watermarking algorithm watermarks the binned data in a hierarchical manner by leveraging the very nature of the data. The method is resilient to the generalization attack that is specific to binned data, as well as other attacks intended to destroy the inserted mark. We prove that watermarking does not adversely interfere with binning, and we have implemented the framework. Experiments were conducted, and the results show the robustness of the proposed framework.
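
The binning step generalizes quasi-identifier values into coarser bins; a minimal sketch for one numeric attribute is shown below. This is purely illustrative (the function and bin widths are assumptions); the paper's framework additionally constrains generalization with usage metrics and layers a hierarchical watermark on top of the binned data.

```python
def generalize_age(age, level):
    """Replace an exact age by an interval; higher levels widen the bins,
    trading information loss for stronger anonymity."""
    width = 5 * (2 ** level)   # level 0 -> 5-year bins, level 1 -> 10-year, ...
    lo = (age // width) * width
    return f"[{lo}-{lo + width - 1}]"

print(generalize_age(37, 0))   # [35-39]
print(generalize_age(37, 1))   # [30-39]
```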

Added 2008-04-09

Temporal hierarchies and inheritance semantics for GTRBAC

Elisa Bertino

A Generalized Temporal Role Based Access Control (GTRBAC) model that allows the specification of a comprehensive set of temporal constraints for access control has recently been proposed. The model constructs allow one to specify various temporal constraints on roles, user-role assignments, and role-permission assignments. However, temporal constraints on role enablings and role activations can have various implications on a role hierarchy. In this paper, we present an analysis of the effects of GTRBAC temporal constraints on a role hierarchy and introduce various kinds of temporal hierarchies. In particular, we show that there are certain distinctions that need to be made in permission inheritance and role activation semantics in order to capture all the effects of GTRBAC constraints such as role enablings and role activations on a role hierarchy.

Added 2008-04-09

Trust negotiations: concepts, systems, and languages

Elisa Bertino
Download: PDF

Trust negotiation is a promising approach for establishing trust in open systems like the Internet, where sensitive interactions may often occur among entities with no prior knowledge of each other. In this article, the authors present a model for trust negotiation systems, and delineate the desiderata that ideal trust negotiation systems should satisfy. In defining trust negotiation requirements, they consider two different issues, policy language requirements and system requirements. They then survey the most interesting proposals that have been presented so far and evaluate them with respect to the identified requirements. Finally, they outline future research directions and identify the open issues that still have to be explored.

Added 2008-04-09

A Framework for Data Mining Pattern Management

Elisa Bertino
Added 2008-04-09

Efficient k-Anonymization Using Clustering Techniques

Elisa Bertino
Download: PDF

k-anonymization techniques have been the focus of intense research in the last few years. An important requirement for such techniques is to ensure anonymization of data while at the same time minimizing the information loss resulting from data modifications. In this paper we propose an approach that uses the idea of clustering to minimize information loss and thus ensure good data quality. The key observation here is that data records that are naturally similar to each other should be part of the same equivalence class. We thus formulate a specific clustering problem, referred to as the k-member clustering problem. We prove that this problem is NP-hard and present a greedy heuristic, the complexity of which is O(n^2). As part of our approach we develop a suitable metric to estimate the information loss introduced by generalizations, which works for both numeric and categorical data.
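
The greedy heuristic can be sketched as a repeated cluster-growing pass. This is a hypothetical rendering for illustration (the function names and the seeding rule are assumptions); the paper's algorithm and its information-loss metric are more refined.

```python
def k_member_clusters(records, k, dist):
    """Greedy k-member clustering sketch: grow clusters of exactly k records,
    each time adding the record that stays closest to the current cluster.
    Assumes len(records) >= k; dist(a, b) is a user-supplied distance."""
    remaining = list(records)
    clusters = []
    seed = remaining[0]
    while len(remaining) >= k:
        # seed the next cluster with the record furthest from the last seed
        seed = max(remaining, key=lambda r: dist(seed, r))
        cluster = [seed]
        remaining.remove(seed)
        while len(cluster) < k:
            best = min(remaining, key=lambda r: sum(dist(c, r) for c in cluster))
            cluster.append(best)
            remaining.remove(best)
        clusters.append(cluster)
    for r in remaining:   # fewer than k leftovers join their nearest cluster
        min(clusters, key=lambda c: sum(dist(x, r) for x in c)).append(r)
    return clusters
```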

Added 2008-04-09