Reports and Papers Archive - Reports & Papers

The Emergent Romanian Post-Communist Ethos: From Nationalism to Privatism

S Matei

Added 2008-04-08

A Sounding Board for the Self: Virtual Community as Ideology

S Matei

Claims about the emergence of a new type of social aggregation—“virtual community”—cover a type of ideological discourse about social interactions. The main cultural resource fueling this ideology is the counterculture and its social project. Virtual community, both as a discursive and as a social practice, is a culmination rather than a resolution of the modern conflict between community and individuality. Presenting virtual community as a panacea for modern social tensions, especially that between individualistic and communitarian ideals, hides from sight not only some of the negative aspects of on-line social life (cliquish behavior and incivility) but also the role played by communication technology in fragmenting modern society.

Added 2008-04-08

From Counterculture to Cyberculture: Virtual Community Discourse and the Dilemma of Modernity

S Matei

Virtual communities are discussed as expressions of the modern tension between individuality and community, emphasizing the role that counterculture and its values played in shaping the virtual community project. This article analyzes postings to the WELL conferences and the online groups that served as incubators and testing ground for the term “virtual community,” revealing how this concept was culturally shaped by the countercultural ideals of WELL users and how the tension between individualism and communitarian ideals was dealt with. The overarching conclusion is that virtual communities act both as solvent and glue in modern society, being similar to the “small group” movement.

Added 2008-04-08

The Emergence of Clusters in the Global Telecommunications Network

S Lee, P Monge, F Bar, SA Matei

Studies of international telecommunication networks in past years have found increases in density, centralization, and integration. More recent studies, however, have identified trends of decentralization and regionalization. The present research examines these structural changes in international telephone traffic among 110 countries between 1989 and 1999. It examines the competing theoretical models of core-periphery and cluster structures. The initial results show lowered centralization and inequality in the network of international telecommunications traffic. Statistical p* procedures demonstrate significant interactions within countries in blocks of similar economic development status, geographic region, and telecommunications infrastructure development status. Specifically, countries with less developed economic and telecommunications status showed significant increases in tendencies to connect to each other and to reciprocate ties. Altogether, the result supports the idea that the global telecommunications network is moving toward a more diversified structure with the emergence of cohesive and interconnected subgroups. The findings have implications for global digital divide and developmental gap issues.

Added 2008-04-08

The Impact of State-Level Social Capital on the Emergence of Virtual Communities

S Matei

Download: PDF

The paper analyzes the 48 contiguous states of the Union and their ability to create and maintain online communities (Yahoo! groups). Multiple regression analysis indicates that the number of online groups and the overall amount of online activity increase with the amount of social capital. Also, ethnic homogeneity positively influences the number of online groups, while population density and number of IT workers are positively associated with the level of online activity. In broad terms, the analyses support the idea that the Internet strengthens offline interaction, with sociability online building on sociability offline.

Added 2008-04-08

Nile-PDT: a phenomenon detection and tracking framework for data stream management systems

MH Ali, WG Aref, R Bose, AK Elmagarmid, A Helal, I Kamel, MF Mokbel

Download: PDF

In this demo, we present Nile-PDT, a Phenomenon Detection and Tracking framework using the Nile data stream management system. A phenomenon is characterized by a group of streams showing similar behavior over a period of time. The functionality of Nile-PDT is split between the Nile server and the Nile-PDT application client. At the server side, Nile detects phenomenon candidate members and tracks their propagation incrementally through specific sensor network operators. Phenomenon candidate members are processed at the client side to detect phenomena of interest to a particular application. Nile-PDT is scalable in the number of sensors, the sensor data rates, and the number of phenomena. Guided by the detected phenomena, Nile-PDT tunes query processing towards sensors that heavily affect the monitoring of phenomenon propagation.

Added 2008-04-08

Supporting top-k join queries in relational databases

IF Ilyas, WG Aref, AK Elmagarmid

Download: PDF

Ranking queries, also known as top-k queries, produce results that are ordered on some computed score. Typically, these queries involve joins, where users are usually interested only in the top-k join results. Top-k queries are dominant in many emerging applications, e.g., multimedia retrieval by content, Web databases, data mining, middleware, and most information retrieval applications. Current relational query processors do not handle ranking queries efficiently, especially when joins are involved. In this paper, we address supporting top-k join queries in relational query processors. We introduce a new rank-join algorithm that makes use of the individual orders of its inputs to produce join results ordered on a user-specified scoring function. The idea is to rank the join results progressively during the join operation. We introduce two physical query operators based on variants of ripple join that implement the rank-join algorithm. The operators are nonblocking and can be integrated into pipelined execution plans. We also propose an efficient heuristic designed to optimize a top-k join query by choosing the best join order. We address several practical issues and optimization heuristics to integrate the new join operators in practical query processors. We implement the new operators inside a prototype database engine based on PREDATOR. The experimental evaluation of our approach compares recent algorithms for joining ranked inputs and shows superior performance.
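The idea of ranking join results progressively can be illustrated with a minimal sketch. It is not the paper's operators: it assumes two in-memory inputs already sorted by score in descending order, an equality join on a key, and a sum as the scoring function, and it uses hash tables instead of a ripple-join variant. The function name `rank_join` and the data layout are hypothetical. A result is reported only once its score reaches a threshold that bounds the score of any result not yet produced, so the inputs need not be read to the end.

```python
import heapq
from itertools import count


def rank_join(left, right, k):
    """Return the top-k join results as (key, combined_score) pairs.

    left, right: lists of (join_key, score), each sorted by score descending.
    Scoring function (an assumption of this sketch): sum of the two scores.
    """
    seen_left, seen_right = {}, {}   # join_key -> scores read so far
    heap = []                        # max-heap of candidates via negated scores
    tie = count()                    # tie-breaker so heap entries stay comparable
    results = []
    i = j = 0                        # number of tuples read from each input
    while len(results) < k:
        # Alternate between inputs; keep reading one if the other is exhausted.
        if i < len(left) and (i <= j or j >= len(right)):
            key, score = left[i]
            i += 1
            seen_left.setdefault(key, []).append(score)
            for other in seen_right.get(key, []):
                heapq.heappush(heap, (-(score + other), next(tie), key))
        elif j < len(right):
            key, score = right[j]
            j += 1
            seen_right.setdefault(key, []).append(score)
            for other in seen_left.get(key, []):
                heapq.heappush(heap, (-(score + other), next(tie), key))

        exhausted = i >= len(left) and j >= len(right)
        if exhausted:
            threshold = float("-inf")    # nothing new can appear: drain the heap
        elif i == 0 or j == 0:
            threshold = float("inf")     # one input not read yet: no bound
        else:
            # Upper bound on the score of any join result not yet generated.
            threshold = max(left[0][1] + right[j - 1][1],
                            left[i - 1][1] + right[0][1])

        while heap and len(results) < k and -heap[0][0] >= threshold:
            neg_score, _, key = heapq.heappop(heap)
            results.append((key, -neg_score))
        if exhausted:
            break
    return results
```

With the sample inputs `[("a", 9), ("b", 8), ("c", 3)]` and `[("b", 10), ("a", 5), ("c", 4)]`, asking for the top two results stops before the last tuple of the second input is read.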

Added 2008-04-08

Exploiting predicate-window semantics over data streams

TM Ghanem, WG Aref, AK Elmagarmid

Download: PDF

The continuous sliding-window query model is used widely in data stream management systems where the focus of a continuous query is limited to a set of the most recent tuples. In this paper, we show that an interesting and important class of queries over data streams cannot be answered using the sliding-window query model. Thus, we introduce a new model for continuous window queries, termed the predicate-window query model, that limits the focus of a continuous query to the stream tuples that satisfy a certain predicate. Predicate-window queries have some distinguishing characteristics, e.g., (1) the window predicate can be defined over any attribute in the stream tuple (ordered or unordered), and (2) stream tuples qualify and disqualify the window predicate in an out-of-order manner. In this paper, we discuss the applicability of the predicate-window query model. We show how the existing sliding-window query models fail to answer some of the predicate-window queries. Finally, we discuss the challenges in supporting the predicate-window query model in data stream management systems.
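A minimal sketch of the predicate-window idea follows. It rests on assumptions not stated in the abstract: each stream tuple carries a key and a single value, and a later tuple with the same key replaces the earlier one. The class name `PredicateWindow` and its method are hypothetical. The point it illustrates is that membership depends on the predicate rather than on recency, so tuples enter and leave the window in no particular arrival order.

```python
class PredicateWindow:
    """Keeps the tuples whose latest value satisfies a predicate."""

    def __init__(self, predicate):
        self.predicate = predicate
        self.members = {}            # key -> latest qualifying value

    def process(self, key, value):
        """Apply one stream tuple; report how window membership changed."""
        if self.predicate(value):
            event = "update" if key in self.members else "enter"
            self.members[key] = value
            return event
        if key in self.members:
            # The tuple disqualifies a current member, whatever its age.
            del self.members[key]
            return "leave"
        return None                  # never qualified, nothing to do


# Example: a window over sensors currently reporting more than 100 degrees.
window = PredicateWindow(lambda temperature: temperature > 100)
for sensor, reading in [("s1", 120), ("s2", 90), ("s1", 130), ("s1", 80)]:
    print(sensor, reading, window.process(sensor, reading))
```

A sliding window over the same stream would evict tuples by age or count, which is why it cannot express this kind of query directly.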

Added 2008-04-08

Adaptive rank-aware query optimization in relational databases

IF Ilyas, WG Aref, AK Elmagarmid, HG Elmongui, R Shah, JS Vitter

Download: PDF

Rank-aware query processing has emerged as a key requirement in modern applications. In these applications, efficient and adaptive evaluation of top-k queries is an integral part of the application semantics. In this article, we introduce a rank-aware query optimization framework that fully integrates rank-join operators into relational query engines. The framework is based on extending the System R dynamic programming algorithm in both enumeration and pruning. We define ranking as an interesting physical property that triggers the generation of rank-aware query plans. Unlike traditional join operators, optimizing for rank-join operators depends on estimating the input cardinality of these operators. We introduce a probabilistic model for estimating the input cardinality, and hence the cost of a rank-join operator. To our knowledge, this is the first effort in estimating the needed input size for optimal rank aggregation algorithms. Costing ranking plans is key to the full integration of rank-join operators in real-world query processing engines. Since optimal execution strategies picked by static query optimizers lose their optimality due to estimation errors and unexpected changes in the computing environment, we introduce several adaptive execution strategies for top-k queries that respond to these unexpected changes and costing errors. Our reactive reoptimization techniques change the execution plan at runtime to significantly enhance the performance of running queries. Since top-k query plans are usually pipelined and maintain a complex ranking state, altering the execution strategy of a running ranking query is an important and challenging task. We conduct an extensive experimental study to evaluate the performance of the proposed framework. The experimental results are twofold: (1) we show the effectiveness of our cost-based approach of integrating ranking plans in dynamic programming cost-based optimizers; and (2) we show a significant speedup (up to 300%) when using our adaptive execution of ranking plans over the state-of-the-art mid-query reoptimization strategies.

Added 2008-04-08

On local heuristics to speed up polygon-polygon intersection tests

WM Badawy, WG Aref

Download: PDF

Added 2008-04-08

SINA: scalable incremental processing of continuous queries in spatio-temporal databases

MF Mokbel, X Xiong, WG Aref

Download: PDF

This paper introduces the Scalable INcremental hash-based Algorithm (SINA, for short), a new algorithm for evaluating a set of concurrent continuous spatio-temporal queries. SINA is designed with two goals in mind: (1) scalability in terms of the number of concurrent continuous spatio-temporal queries, and (2) incremental evaluation of continuous spatio-temporal queries. SINA achieves scalability by employing a shared execution paradigm where the execution of continuous spatio-temporal queries is abstracted as a spatial join between a set of moving objects and a set of moving queries. Incremental evaluation is achieved by computing only the updates of the previously reported answer. We introduce two types of updates, namely positive and negative updates. Positive or negative updates indicate that a certain object should be added to or removed from the previously reported answer, respectively. SINA manages the computation of positive and negative updates via three phases: the hashing phase, the invalidation phase, and the joining phase. The hashing phase employs an in-memory hash-based join algorithm that results in a set of positive updates. The invalidation phase is triggered every T seconds or when the memory is fully occupied to produce a set of negative updates. Finally, the joining phase is triggered by the end of the invalidation phase to produce a set of both positive and negative updates that result from joining in-memory data with in-disk data. Experimental results show that SINA is scalable and is more efficient than other index-based spatio-temporal algorithms.
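The notion of positive and negative updates can be shown with a small sketch. It does not reproduce SINA's hashing, invalidation, and joining phases: it recomputes the full answer with a naive nested loop and then takes set differences, purely to show what the two kinds of update mean. The function names, the rectangular range queries, and the dictionary layout are assumptions made for illustration.

```python
def evaluate(objects, queries):
    """Naive spatial join: all (query_id, object_id) pairs where the object
    lies inside the query rectangle (xmin, ymin, xmax, ymax)."""
    answer = set()
    for qid, (xmin, ymin, xmax, ymax) in queries.items():
        for oid, (x, y) in objects.items():
            if xmin <= x <= xmax and ymin <= y <= ymax:
                answer.add((qid, oid))
    return answer


def incremental_updates(previous, current):
    """Return (positive, negative) updates relative to the reported answer."""
    positive = current - previous    # pairs to add to the reported answer
    negative = previous - current    # pairs to remove from the reported answer
    return positive, negative


queries = {"q1": (0, 0, 10, 10)}
before = evaluate({"o1": (5, 5), "o2": (20, 20)}, queries)
after = evaluate({"o1": (15, 5), "o2": (8, 8)}, queries)
positive, negative = incremental_updates(before, after)
print("positive:", positive)
print("negative:", negative)
```

Only the two update sets would be sent to clients, which is smaller than resending the whole answer when few objects cross query boundaries.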

Added 2008-04-08

Video query processing in the VDBMS testbed for video database research

W Aref, M Hammad, AC Catlin, I Ilyas, T Ghanem, A Elmagarmid, M Marzouk

Download: PDF

The increased use of video data sets for multimedia-based applications has created a demand for strong video database support, including efficient methods for handling the content-based query and retrieval of video data. Video query processing presents significant research challenges, mainly associated with the size, complexity and unstructured nature of video data. A video query processor must support video operations for search by content and streaming, new query types, and the incorporation of video methods and operators in generating, optimizing and executing query plans. In this paper, we address these query processing issues in two contexts, first as applied to the video data type and then as applied to the stream data type. We first present the query processing functionality of the VDBMS video database management system as a framework designed to support the full range of functionality for video as an abstract data type. We describe two query operators for the video data type which implement the rank-join and stop-after algorithms. As videos may be considered streams of consecutive image frames, video query processing can be expressed as continuous queries over video data streams. The stream data type was therefore introduced into the VDBMS system, and system functionality was extended to support general data streams. From this viewpoint, we present an approach for defining and processing streams, including video, through the query execution engine. We describe the implementation of several algorithms for video query processing expressed as continuous queries over video streams, such as fast forward, region-based blurring and left outer join. We include a description of the window-join algorithm as a core operator for continuous query systems, and discuss shared execution as an optimization approach for stream query processing.

Added 2008-04-08

Performance of multi-dimensional space-filling curves

MF Mokbel, WG Aref, I Kamel

Download: PDF

A space-filling curve is a way of mapping the multi-dimensional space into the one-dimensional space. It acts like a thread that passes through every cell element (or pixel) in the D-dimensional space so that every cell is visited exactly once. There are numerous kinds of space-filling curves. The difference between such curves is in their way of mapping to the one-dimensional space. Selecting the appropriate curve for any application requires knowledge of the mapping scheme provided by each space-filling curve. A space-filling curve consists of a set of segments. Each segment connects two consecutive multi-dimensional points. Five different types of segments are distinguished, namely, Jump, Contiguity, Reverse, Forward, and Still. A description vector V=(J,C,R,F,S), where J, C, R, F, and S are the percentages of Jump, Contiguity, Reverse, Forward, and Still segments in the space-filling curve, encapsulates all the properties of a space-filling curve. The knowledge of V facilitates the process of selecting the appropriate space-filling curve for different applications. Closed formulas are developed to compute the description vector V for any D-dimensional space and grid size N for different space-filling curves. A comparative study of different space-filling curves with respect to the description vector is conducted and results are presented and discussed.
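As a rough illustration of how two curves map the same grid differently, the sketch below compares a Z-order (Morton) curve with a row-major sweep on a small 2-D grid. Both curves are chosen here as examples and are not taken from the abstract. The `neighbor_steps` count is a simplified measure invented for this sketch (steps between cells at grid distance 1); it is not the paper's description vector, whose segment types are defined in the paper itself.

```python
def morton_decode(index, bits):
    """Map a 1-D Z-order index to (x, y): x takes the even bits, y the odd bits."""
    x = y = 0
    for b in range(bits):
        x |= ((index >> (2 * b)) & 1) << b
        y |= ((index >> (2 * b + 1)) & 1) << b
    return x, y


def morton_encode(x, y, bits):
    """Inverse of morton_decode: interleave the bits of x and y."""
    index = 0
    for b in range(bits):
        index |= ((x >> b) & 1) << (2 * b)
        index |= ((y >> b) & 1) << (2 * b + 1)
    return index


def z_order_curve(bits):
    """Cells of a 2^bits x 2^bits grid in Z-order."""
    return [morton_decode(i, bits) for i in range(4 ** bits)]


def row_major_curve(bits):
    """Cells of the same grid swept row by row."""
    side = 2 ** bits
    return [(x, y) for y in range(side) for x in range(side)]


def neighbor_steps(curve):
    """Count consecutive cell pairs that are adjacent on the grid."""
    return sum(
        1
        for (x1, y1), (x2, y2) in zip(curve, curve[1:])
        if abs(x1 - x2) + abs(y1 - y2) == 1
    )


for name, curve in [("z-order", z_order_curve(2)), ("row-major", row_major_curve(2))]:
    # Every cell must appear exactly once for the mapping to be a valid curve.
    assert len(set(curve)) == len(curve) == 16
    print(name, "adjacent steps:", neighbor_steps(curve), "of", len(curve) - 1)
```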

Added 2008-04-08

Towards scalable location-aware services: requirements and research issues

MF Mokbel, WG Aref, SE Hambrusch, S Prabhakar

Download: PDF

The emergence of location-aware services calls for new real time spatio-temporal query processing algorithms that deal with large numbers of mobile objects and queries. Online query response is an important characterization of location-aware services. A delay in the answer to a query gives invalid and obsolete results, simply because moving objects can change their locations before the query responds. To handle large numbers of spatio-temporal queries efficiently, we propose the idea of sharing as a means to achieve scalability. In this paper, we introduce several types of sharing in the context of continuous spatio-temporal queries. Examples of sharing in the context of real-time spatio-temporal database systems include sharing the execution, sharing the underlying space, sharing the sliding time windows, and sharing the objects of interest. We demonstrate how sharing can be integrated into query predicates, e.g., selection and spatial join processing. The goal of this paper is to outline research directions and approaches that will lead to scalable and efficient location-aware services.

Added 2008-04-08

The SBC-tree: an index for run-length compressed sequences

MY Eltabakh, Wing-Kai Hon, R Shah, WG Aref, JS Vitter

Download: PDF

Run-Length-Encoding (RLE) is a data compression technique that is used in various applications, e.g., time series, biological sequences, and multimedia databases. One of the main challenges is how to operate on (e.g., index, search, and retrieve) compressed data without decompressing it. In this paper, we introduce the String B-tree for Compressed sequences, termed the SBC-tree, for indexing and searching RLE-compressed sequences of arbitrary length. The SBC-tree is a two-level index structure based on the well-known String B-tree and a 3-sided range query structure [7]. The SBC-tree supports pattern matching queries such as substring matching, prefix matching, and range search operations over RLE-compressed sequences. The SBC-tree has an optimal external-memory space complexity of O(N/B) pages, where N is the total length of the compressed sequences, and B is the disk page size. Substring matching, prefix matching, and range search execute in an optimal O(log_B N + |p| + T/B) I/O operations, where |p| is the length of the compressed query pattern and T is the query output size. The SBC-tree is also dynamic and supports insert and delete operations efficiently. The insertion and deletion of all suffixes of a compressed sequence of length m take O(m log_B(N + m)) amortized I/O operations. The SBC-tree index is realized inside PostgreSQL. Performance results illustrate that using the SBC-tree to index RLE-compressed sequences achieves up to an order of magnitude reduction in storage, while retaining the optimal search performance achieved by the String B-tree over the uncompressed sequences.
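To make "operating on compressed data without decompressing it" concrete, here is a minimal sketch of run-length encoding with a positional lookup that works directly on the runs. It is not the SBC-tree and supports none of its pattern-matching queries; the function names and the in-memory list layout are assumptions made for illustration.

```python
from bisect import bisect_right
from itertools import accumulate, groupby


def rle_encode(sequence):
    """Compress a string into a list of (symbol, run_length) pairs."""
    return [(symbol, len(list(group))) for symbol, group in groupby(sequence)]


def run_ends(runs):
    """Cumulative end offsets (exclusive) of each run in the original string."""
    return list(accumulate(length for _, length in runs))


def char_at(runs, ends, position):
    """Symbol at a 0-based position, found by binary search over run ends
    rather than by expanding the runs."""
    if not 0 <= position < (ends[-1] if ends else 0):
        raise IndexError("position out of range")
    return runs[bisect_right(ends, position)][0]


text = "aaabccccdd"
runs = rle_encode(text)
ends = run_ends(runs)
print(runs)
print(char_at(runs, ends, 4))
```

The lookup costs a binary search over the number of runs, which is the quantity N in the abstract's bounds, not the uncompressed length.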

Added 2008-04-08