Reports and Papers Archive - Reports & Papers

CiVeDi: a customized virtual environment for database interaction

Elisa Bertino

This paper presents CiVedi, a scalable system providing a flexible and customizable virtual environment for displaying multimedia contents. Using CiVeDi, both the final users and the exhibition curators can personalize the content of the visit as well as the visit appearance and its duration. The proposed solution aims to be used transparently over different media objects either stored into a database or dynamically collected from online digital libraries.

Added 2008-04-21

Incremental, online, and merge mining of partial periodic patterns in time-series databases

WG Aref, MG Elfeky, AK Elmagarmid

Abstract—Mining of periodic patterns in time-series databases is an interesting data mining problem. It can be envisioned as a tool for forecasting and prediction of the future behavior of time-series data. Incremental mining refers to the issue of maintaining the discovered patterns over time in the presence of more items being added into the database. Because of the mostly append only nature of updating time-series data, incremental mining would be very effective and efficient. Several algorithms for incremental mining of partial periodic patterns in time-series databases are proposed and are analyzed empirically. The new algorithms allow for online adaptation of the thresholds in order to produce interactive mining of partial periodic patterns. The storage overhead of the incremental online mining algorithms is analyzed. Results show that the storage overhead for storing the intermediate data structures pays off as the incremental online mining of partial periodic patterns proves to be significantly more efficient than the nonincremental nononline versions. Moreover, a new problem, termed merge mining, is introduced as a generalization of incremental mining. Merge mining can be defined as merging the discovered patterns of two or more databases that are mined independently of each other. An algorithm for merge mining of partial periodic patterns in time-series databases is proposed and analyzed.

Added 2008-04-21

Scalability Management in Sensor-Network PhenomenaBases

MH Ali, WG Aref, I Kamel

A phenomenon appears in a sensor network when a group of sensors persist to generate similar behavior over a period of time. PhenomenaBases (or databases of phenomena) are equipped with Phenomena Detection and Tracking (PDT) techniques that continuously run in the background of a sensor database system to detect new phenomena and to track already existing phenomena. The process of phenomena detection and tracking is initiated by a multi-way join operator that comes at the core of PDT techniques to report similar sensor readings. With the increase in the sensor network size, the join operator (and, consequently, query processing in the PhenomenaBase) face several scalability challenges. In this paper, we present a join operator for PhenomenaBases (the SNJoin operator) that is specially-designed for dynamically-configured large-scale sensor networks with distributed processing capabilities. Experimental studies illustrate the scalability of the proposed join operator in PhenomenaBases with respect to the number of detected phenomena and the output delay.

Added 2008-04-21

X-FEDERATE: A policy engineering framework for federated access management

Elisa Bertino

Added 2008-04-21

Search-based buffer management policies for streaming in continuous media servers

MA Hammad, WG Aref, AK Elmagarmid, A.K

In this paper we propose efficient buffer prefetching and replacement policies for continuous-media servers that support content-based search and retrieval. The new policies are based on the knowledge collected from the content-based search manager and the streaming manager. We show that by integrating the knowledge from the search and streaming components, we can achieve better caching of media streams, thus minimizing initial latency and reducing disk I/O. We test the search-based policies on a prototype video database system developed at Purdue University. The results show that initial latency is reduced on the average by 20%, compared to the traditional policies.

Added 2008-04-21

Query Indexing and Velocity Constrained Indexing: Scalable Techniques For Continuous Queries on Moving Objects

S Prabhakar, Y Xia, D Kalashnikov, WG Aref, S Hambrusch

Moving object environments are characterized by large numbers of moving objects and numerous concurrent continuous queries over these objects. Efficient evaluation of these queries in response to the movement of the objects is critical for supporting acceptable response times. In such environments the traditional approach of building an index on the objects (data) suffers from the need for frequent updates and thereby results in poor performance. In fact, a brute force, no-index strategy yields better performance in many cases. Neither the traditional approach, nor the brute force strategy achieve reasonable query processing times. This paper develops novel techniques for the efficient and scalable evaluation of multiple continuous queries on moving objects. Our solution leverages two complimentary techniques: Query Indexing and Velocity Constrained Indexing (VCI). Query Indexing relies on i) incremental evaluation; ii) reversing the role of queries and data; and iii) exploiting the relative locations of objects and queries. VCI takes advantage of the maximum possible speed of objects in order to delay the expensive operation of updating an index to reflect the movement of objects. In contrast to an earlier technique [29] that requires exact knowledge about the movement of the objects, VCI does not rely on such information. While Query Indexing outperforms VCI, it does not efficiently handle the arrival of new queries. Velocity constrained indexing, on the other hand, is unaffected by changes in queries. We demonstrate that a combination of Query Indexing and Velocity Constrained Indexing enables the scalable execution of insertion and deletion of queries in addition to processing ongoing queries. We also develop several optimizations and present a detailed experimental evaluation of our techniques. The experimental results show that the proposed schemes outperform the traditional approaches by almost two orders of magnitude.

Added 2008-04-21

Modeling and language support for the management of pattern-bases

Elisa Bertino, Manolis terrovitis, Panos Vassiliadis, Spiros Skiadopoulos, Barbara Catania, Anna Maddalena and Stefano Rizzi

Information overloading is today a serious concern that may hinder the potential of modern web-based information systems. A promising approach to deal with this problem is represented by knowledge extraction methods able to produce artifacts (also called patterns) that concisely represent data. Patterns are usually quite heterogeneous and voluminous. So far, little emphasis has been posed on developing an overall integrated environment for uniformly representing and querying different types of patterns. In this paper we consider the larger problem of modeling, storing, and querying patterns, in a database-like setting and use a Pattern-Base Management System (PBMS) for this purpose. Specifically, (a) we formally define the logical foundations for the global setting of pattern management through a model that covers data, patterns, and their intermediate mappings; (b) we present a formalism for pattern specification along with safety restrictions; and (c) we introduce predicates for comparing patterns and query operators.

Added 2008-04-21

Discovering Consensus Patterns in Biological Databases

MY ElTabakh, WG Aref, M Ouzzani, MH Ali

Consensus patterns, like motifs and tandem repeats, are highly conserved patterns with very few substitutions where no gaps are allowed. In this paper, we present a progressive hierarchical clustering technique for discovering consensus patterns in biological databases over a certain length range. This technique can discover consensus patterns with various requirements by applying a post-processing phase. The progressive nature of the hierarchical clustering algorithm makes it scalable and efficient. Experiments to discover motifs and tandem repeats on real biological databases show significant performance gain over non-progressive clustering techniques.

Added 2008-04-21

Automatic image segmentation by integrating color-edge extraction and seeded

J Fan, DKY Yau, AK Elmagarmid, WG Aref

We propose a new automatic image segmentation method. Color edges in an image are first obtained automatically by combining an improved isotropic edge detector and a fast entropic thresholding technique. After the obtained color edges have provided the major geometric structures in an image, the centroids between these adjacent edge regions are taken as the initial seeds for seeded region growing (SRG). These seeds are then replaced by the centroids of the generated homogeneous image regions by incorporating the required additional pixels step by step. Moreover, the results of color-edge extraction and SRG are integrated to provide homogeneous image regions with accurate and closed boundaries. We also discuss the application of our image segmentation method to automatic face detection. Furthermore, semantic human objects are generated by a seeded region aggregation procedure which takes the detected faces as object seeds.

Added 2008-04-21

STAGGER: Periodicity Mining of Data Streams Using Expanding Sliding Windows

MG Elfeky, WG Aref, AK Elmagarmid

Sensor devices are becoming ubiquitous, especially in measurement and monitoring applications. Because of the real-time, append-only and semi-infinite natures of the generated sensor data streams, an online incremental approach is a necessity for mining stream data types. In this paper, we propose STAGGER: a one-pass, online and incremental algorithm for mining periodic patterns in data streams. STAGGER does not require that the user pre-specify the periodicity rate of the data. Instead, STAGGER discovers the potential periodicity rates. STAGGER maintains multiple expanding sliding windows staggered over the stream, where computations are shared among the multiple overlapping windows. Small-length sliding windows are imperative for early and real-time output, yet are limited to discover short periodicity rates. As streamed data arrives continuously, the sliding windows expand in length in order to cover the whole stream. Larger-length sliding windows are able to discover longer periodicity rates. STAGGER incrementally maintains a tree-like data structure for the frequent periodic patterns of each discovered potential periodicity rate. In contrast to the Fourier/Wavelet-based approaches used for discovering periodicity rates, STAGGER not only discovers a wider, more accurate set of periodicities, but also discovers the periodic patterns themselves. In fact, experimental results with real and synthetic data sets show that STAGGER outperforms Fourier/Wavelet-based approaches by an order of magnitude in terms of the accuracy of the discovered periodicity rates. Moreover, realdata experiments demonstrate the practicality of the discovered periodic patterns.

Added 2008-04-21

Security and Privacy Challenges of A Digital Government

J Joshi, A Ghafoor, W Aref, E Spafford

A digital government can be viewed as an amalgam of heterogeneous information systems that exchange high-volume information among government agencies and public and private sectors engaged in government business. This gives rise to several daunting multidomain security challenges as well as concern for citizen privacy. The success of a digital government infrastructure depends on how well it meets these challenges and its preparedness against numerous potential threats ranging from simple act of hacking to cyber-terrorism. In this chapter, we outline these crucial security and privacy issues and present various solutions that are available and need to be further investigated.

Added 2008-04-21

ARUBA: A Risk-Utility-Based Algorithm for Data Disclosure

CERIAS TR 2008-27

Mohamed R. Fouad, Guy Lebanon, and Elisa Bertino

Download: PDF

Dealing with sensitive data has been the focus of much of recent research. On one hand data disclosure may incur some risk due to security breaches, but on the other hand data sharing has many advantages. For example, revealing customer transactions at a grocery store may be beneficial when studying purchasing patterns and market demand. However, a potential misuse of the revealed information may be harmful due to privacy violations. In this paper we study the tradeoff between data disclosure and data retention. Specifically, we address the problem of minimizing the risk of data disclosure while maintaining its utility above a certain acceptable threshold. We formulate the problem as a discrete optimization problem and leverage the special monotonicity characteristics for both risk and utility to construct an efficient algorithm to solve it. Such an algorithm determines the optimal transformations that need to be performed on the microdata before it gets released. These optimal transformations take into account both the risk associated with data disclosure and the benefit of it (referred to as utility). Through extensive experimental studies we compare the performance of our proposed algorithm with other date disclosure algorithms in the literature in terms of risk, utility, and time. We show that our proposed framework outperforms other techniques for sensitive data disclosure.

Added 2008-04-21

A Formal Language for Specifying Policy Combining Algorithms in Access Control

CERIAS TR 2008-9

Ninghui Li, Qihua Wang, Prathima Rao, Dan Lin, Elisa Bertino, Jorge Lobo

Download: PDF

Many access control policy languages, e.g., XACML, allow a policy to ontain multiple sub-policies, and the result of the policy on a request is determined by combining the results of the sub-policies according to some policy combining algorithms (PCAs). Existing access control policy languages, however, do not provide a formal language for specifying PCAs. As a result, it is difficult to extend them with new PCAs. The lacking of a formal approach also makes the design of combining algorithms in XACML plagued with issues and subtleties that can be confusing and surprising for policy authors. Motivated by the need to provide a flexible and user-friendly mechanism for specifying PCAs, we propose a policy combining language PCL, which can succinctly and precisely express a variety of PCAs. We show that our approach avoids the pitfalls of XACML and that it is expressive enough to express both PCAs in XACML and other natural PCAs. A policy evaluation engine only needs to understand PCL to evaluate any PCA specified in it. In particular, we model the evaluation of PCAs using finite state automata. Using techniques from automata theory, we also develop systematic policy evaluation optimization techniques that improve evaluation efficiency.

Added 2008-04-15

XACML policy integration algorithms: not to be confused with XACML policy combination algorithms

Elisa Bertino

XACML is the OASIS standard language for the specification of authorization and entitlement policies. However, while XACML well addresses security requirements of a single enterprise (even if large and composed by multiple departments), it does not address the requirements of virtual enterprises built through collaboration of several autonomous subjects sharing their resources. In this paper we highlight such limitations and we propose an XACML extension, the policy integration algorithm, to address them. In the paper we also discuss in which respect the process of comparing two XACML policies differs from the process used to compare other business rules.

Added 2008-04-14

An approach to cooperative updates of XML documents in distributed systems

Elisa Bertino

Protection and secure exchange of Web documents is becoming a crucial need for many Internet-based applications. Securing Web documents entail addressing two main issues: confidentiality and integrity. Ensuring document confidentiality means that document contents can only be disclosed to subjects authorized according to specified security policies, whereas by document integrity we mean that the document contents are correct with respect to a given application domain and that the document contents are modified only by authorized subjects. While the problem of document confidentiality has been widely investigated in the literature, the problem of how to ensure that a document, when moving among different parties, is modified only according to the stated policies still lacks comprehensive solutions. In this paper we present a solution to this problem by proposing a model for specifying update policies, and an infrastructure supporting the specification and enforcement of these policies in a distributed and cooperative environment, in which subjects in different organizational roles can modify possibly different portions of the same document. The key aspect of our proposal is that, by using a combination of hash functions and digital signature techniques, we create a distributed environment that enables subjects, in most cases, to verify, upon receiving a document, whether the update operations performed on the document up to that point are correct with respect to the update policies, without interacting with the document server. Our approach is particularly suited for environments, such as mobile systems, pervasive systems, decentralized workflows, and peer-to-peer systems.

Added 2008-04-14