Reports and Papers Archive - Reports & Papers

Sieve: A Tool for Automatically Detecting Variations Across Program Versions

Murali Krishna Ramanathan, Ananth Grama, Suresh Jagannathan

Software systems often undergo many revisions during their lifetime as new features are added, bugs repaired, abstractions simplified and refactored, and performance improved. When a revision, even a minor one, does occur, the changes it induces must be tested to ensure that invariants assumed in the original version are not violated unintentionally. In order to avoid testing components that are unchanged across revisions, impact analysis is often used to identify code blocks or functions that are affected by a change. In this paper, we present a novel solution to this general problem that uses dynamic programming on instrumented traces of different program binaries to identify longest common subsequences in strings generated by these traces. Our formulation allows us to perform impact analysis and also to detect the smallest set of locations within the functions where the effect of the changes actually manifests itself. Sieve is a tool that incorporates these ideas. Sieve is unobtrusive, requiring no programmer or compiler intervention to guide its behavior. Our experiments on multiple versions of op ensource C programs shows that Sieve is an effective and scalable tool to identify impact sets and can locate regions in the affected functions where the changes manifest. These results lead us to conclude that Sieve can play a beneficial role in program testing and software maintenance

Added 2008-05-15

Trace-Based Memory Aliasing Across Program Versions

Murali Krishna Ramanathan, Ananth Grama, Suresh Jagannathan

One of the major costs of software development is associated with testing and validation of successive versions of software systems. An important problem encountered in testing and validation is memory aliasing, which involves correlation of variables across program versions. This is useful to ensure that existing invariants are preserved in newer versions and to match program execution histories. Recent work in this area has focused on trace-based techniques to better isolate affected regions. A variation of this general approach considers memory operations to generate more refined impact sets. The utility of such an approach eventually relies on the ability to effectively recognize aliases. In this paper, we address the general memory aliasing problem and present a probabilistic trace-based technique for correlating memory locations across execution traces, and associated variables in program versions. Our approach is based on computing the log-odds ratio, which defines the affinity of locations based on observed patterns. As part of the aliasing process, the traces for initial test inputs are aligned without considering aliasing. From the aligned traces, the log-odds ratio of the memory locations is computed. Subsequently, aliasing is used for alignment of successive traces. Our technique can easily be extended to other applications where detecting aliasing is necessary. As a case study, we implement and use our approach in dynamic impact analysis for detecting variations across program versions. Using detailed experiments on real versions of software systems, we observe significant improvements in detection of affected regions when aliasing occurs.

Added 2008-05-15

Spectral LPM: an optimal locality-preserving mapping using the spectral (not fractal) order

M.F. Mokbel, W.G. Aref, A. Grama

For the past two decades, fractals (e.g., the Hilbert and Peano space-filling curves) have been considered the natural method for providing a locality-preserving mapping. The idea behind a locality-preserving mapping is to map points that are nearby in the multidimensional space into points that are nearby in the one-dimensional space. We argue against the use of fractals in locality-preserving mapping algorithms, and present examples with experimental evidence to show why fractals produce poor locality-preserving mappings. In addition, we propose an optimal locality-preserving mapping algorithm, termed the spectral locality-preserving mapping algorithm (Spectral LPM, for short), that makes use of the spectrum of the multidimensional space. We give a mathematical proof for the optimality of Spectral LPM, and also demonstrate its practical use.

Added 2008-05-15

Functional annotation of regulatory pathways

Jayesh Pandey, Mehmet Koyuturk, Yohan Kim, Wojciech Szpankowski, Shankar Subramaniam, Ananth Grama

Added 2008-05-15

Parallel Algorithm Scalability Issues in Petaflops Architectures

A. Grama, A. Gupta, E.H. Han, V. Kumar

Added 2008-05-15

Algorithms for bounded-error correlation of high dimensional data in microarray experiments

M. Koyuturk, A. Grama, W. Szpankowski

Added 2008-05-15

The Omni Macroprogramming Environment for Sensor Networks

Asad Awan, Ahmed Sameh, Ananth Grama

Structural sensing and control is an important application of the DDDAS paradigm. Our work on structural sensing and control has several key aspects, including model reduction, control, simulation, and validation. Motivated by our work on validation using an actual three-storeyed structure, we are developing a comprehensive systems environment, Omni, for macroprogramming sensor networks. While there have been efforts targeted at enabling programmers to write lean applications for individual sensor nodes, there have been few efforts targeted towards allowing programmers to program entire networks as distributed ensembles. Omni provides an intuitive and efficient programming interface, along with operating system services for mapping these abstractions into the underlying network. In this paper, we provide a high-level overview of the Omni architecture, its salient features, and implementation details. The Omni architecture is designed to be a flexible, extensible, scalable, and portable system, upon which a wide variety of DDDAS applications can be built.

Added 2008-05-15

Building Verifiable Sensing Applications Through Temporal Logic Specification

Asad Awan, Ahmed Sameh, Suresh Jagannathan, Ananth Grama

Sensing is at the core of virtually every DDDAS application. Sensing applications typically involve distributed communication and coordination over large self-organized networks of heterogeneous devices with severe resource constraints. As a consequence, developers must explicitly deal with low-level details, making programming time-consuming and error-prone. To reduce this burden, current sensor network programming languages espouse a model that relies on packaged reusable components to implement relevant pieces of a distributed communication infrastructure. Unfortunately, programmers are often forced to understand the mechanisms used by these implementations in order to optimize resource utilization and performance, and to ensure application requirements are met. To address these issues, we propose a novel and high-level programming model that directly exposes control over sensor network behavior using temporal logic specifications, in conjunction with a set of system state abstractions to specify, generate, and automatically validate resource and communication behavior for sensor network applications. TLA+ (the temporal logic of actions) is used as the underlying specification language to express global state abstractions as well as user invariants. We develop a synthesis engine that utilizes TLC (a temporal logic model-checker) to generate detailed actions so that user-provided behavioral properties can be satisfied, guaranteeing program correctness. The synthesis engine generates specifications in TLA+, which are compiled down to sensor node primitive actions. We illustrate our model using a detailed experimental evaluation on our structural sensing and control testbed. The proposed framework is integrated into the COSMOS macroprogramming environment, which is extensively used to develop sensing and control applications at the Bowen Lab for Structural Engineering at Purdue.

Added 2008-05-15

Semi-Discrete Matrix Transforms (SDD) for Image and Video Compression

Sacha Zyto, Ananth Grama, Wojciech Szpankowski

Added 2008-05-14

Biclustering Gene-Feature Matrices for Statistically Significant Dense Patterns

Mehmet Koyuturk, Wojciech Szpankowski, Ananth Grama

Biclustering is an important problem that arises in diverse applications, including analysis of gene expression and drug interaction data. The problem can be formalized in various ways through different interpretation of data and associated optimization functions. We focus on the problem of finding unusually dense patterns in binary (0-1) matrices. This formulation is appropriate for analyzing experimental datasets that come from not only binary quantization of gene expression data, but also more comprehensive datasets such as gene-feature matrices that include functions of coded proteins and motifs in the coding sequence. We formalize the notion of an “unusually” dense submatrix to evaluate the interestingness of a pattern in terms of statistical significance based on the assumption of a uniform memoryless source. We then simplify it to assess statistical significance of discovered patterns. Using statistical significance as an objective function, we formulate the problem as one of finding significant dense submatrices of a large sparse matrix. Adopting a simple iterative heuristic along with randomized initialization techniques, we derive fast algorithms for discovering binary biclusters. We conduct experiments on a binary gene-feature matrix and a quantized breast tumor gene expression matrix. Our experimental results show that the proposed method quickly discovers all interesting patterns in these datasets.

Added 2008-05-14

Algebraic Techniques for Analysis of Large Discrete-Valued Datasets

Mehmet Koyuturk, Ananth Grama, Naren Ramakrishnan

With the availability of large scale computing platforms and instrumentation for data gathering, increased emphasis is being placed on efficient techniques for analyzing large and extremely high-dimensional datasets. In this paper, we present a novel algebraic technique based on a variant of semi-discrete matrix decomposition (SDD), which is capable of compressing large discrete-valued datasets in an error bounded fashion. We show that this process of compression can be thought of as identifying dominant patterns in underlying data. We derive efficient algorithms for computing dominant patterns, quantify their performance analytically as well as experimentally, and identify applications of these algorithms in problems ranging from clustering to vector quantization.We demonstrate the superior characteristics of our algorithm in terms of (i) scalability to extremely high dimensions; (ii) bounded error; and (iii) hierarchical nature, which enables multiresolution analysis. Detailed experimental results are provided to support these claims.

Added 2008-05-14

Mining Scientific Data

N. Ramakrishnan, A. Grama

Added 2008-05-14

Locality in structured peer-to-peer networks

Ronaldo A. Ferreira, Suresh Jagannathan, Ananth Grama

Distributed hash tables (DHTs), used in a number of structured peer-to-peer (P2P) systems, provide efficient mechanisms for resource placement and location. A key distinguishing feature of current DHT systems, such as Chord, Pastry, CAN and Tapestry, is the way they handle locality in the underlying network. Topology-based node identifier assignment, proximity routing, and proximity neighbor selection are examples of heuristics used to minimize message delays in the underlying network. While these heuristics are sometimes effective, they all rely on a single global overlay that may install the key of a popular object at a node far from most of the nodes accessing it. Furthermore, a response to a lookup message does not contain any locality information about the nodes holding a copy of the object. We address these issues in Plethora, a novel two-level overlay P2P network. A local overlay in Plethora acts as a locality-aware cache for the global overlay, grouping nodes close together in the underlying network. Local overlays are constructed by exploiting the organization of the Internet into autonomous systems (ASs). We present a detailed experimental study that demonstrates performance gains in response time of up to 60% compared to a single global Pastry overlay. We also present efficient distributed algorithms for maintaining local overlays in the presence of node arrivals and departures.

Added 2008-05-14

2D-Pattern Matching Image and Video Compression Theory, Algorithms, and Experiments

Marc Alzina, Wojciech Szpankowski, Ananth Grama

Added 2008-05-14

Enhancing locality in structured peer-to-peer networks

R.A. Ferreira, S. Jagannathan, A. Grama

Distributed hash tables (DHTs), used in a number of structured peer-to-peer systems, provide efficient mechanisms for resource location. A key distinguishing feature of current DHT systems such as Chord, Pastry, and Tapestry is the way they handle locality in the underlying network. Topology-based node identifier assignment, proximity routing, and proximity neighbor selection are examples of heuristics used to minimize message delays in the underlying network. While these heuristics are sometimes effective, they rely on a single global overlay that may install the key of a popular object at a node far from most of the nodes accessing it. Furthermore, a response to a lookup does not contain any locality information about the nodes holding a copy of the object. We address these issues by proposing a novel two-level overlay peer-to-peer architecture. In our architecture, local overlays act as locality-aware caches for the global overlay, grouping nodes close together in the underlying network. Local overlays are constructed by exploiting the structure of the Internet as autonomous systems. We present detailed experimental results demonstrating the practicality of the system, and showing performance gains in response time of up to 60% compared to a single global overlay with state-of-the-art localization schemes. We also present efficient distributed algorithms for maintaining local overlays in the presence of node arrivals and departures.

Added 2008-05-14