The Center for Education and Research in Information Assurance and Security (CERIAS)

Reports and Papers Archive



Bounded-Error Compression of Particle Data from Hierarchical Approximate Methods

Dow-Yung Yang, Ananth Y. Grama, Vivek Sarin

This paper presents an analytical and computational framework for the compression of particle data resulting from hierarchical approximate treecodes such as the Barnes-Hut and Fast Multipole Methods. Due to the approximations introduced by hierarchical methods, the position (as well as velocity and acceleration) of a particle can be bounded by a distortion radius. We develop storage schemes that maintain this distortion radius while maximizing compression. Our schemes make extensive use of spatial and temporal coherence of particle behavior and yield compression ratios higher than 12:1 over raw data, and 6:1 over gzipped (LZ77) raw data. We demonstrate that for uniform distributions with 100K particles, storage requirements can be reduced from 1200KB (100K × 12B) to about 99KB (under 1 byte per particle per timestep). This is significant because it enables faster storage/retrieval, better temporal resolution, and improved analysis. Our results are shown to scale from small systems (2K particles) to much larger systems (over 100K particles). The associated algorithm is optimal (O(n)) in both storage and computation, with small constants.
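
To make the idea concrete, here is a minimal Python sketch of bounded-error position quantization under a distortion radius. The grid-snapping scheme and function names are illustrative assumptions, not the paper's actual encoder.

    import numpy as np

    def quantize(positions, r):
        # Snap coordinates to a grid of cell size 2*r; a dequantized
        # value then differs from the original by at most r.
        return np.round(positions / (2.0 * r)).astype(np.int32)

    def dequantize(codes, r):
        return codes * (2.0 * r)

    rng = np.random.default_rng(0)
    pos = rng.random((100_000, 3))              # 100K particles in the unit cube
    codes = quantize(pos, 1e-3)
    print(np.abs(dequantize(codes, 1e-3) - pos).max())  # stays within the 1e-3 bound

Because spatially coherent particles produce long runs of similar integer codes, the quantized stream compresses far better than the raw floating-point data.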

Added 2008-05-15

Multipole-based preconditioners for large sparse linear systems

Sreekanth R. Sambavaram, Vivek Sarin, Ahmed Sameh, Ananth Grama

Dense operators for preconditioning sparse linear systems have traditionally been considered infeasible due to their excessive computational and memory requirements. With the emergence of techniques such as block low-rank approximations and hierarchical multipole approximations, the cost of computing and storing these preconditioners has dropped dramatically. This paper describes the use of multipole operators as parallel preconditioners for sparse linear systems. Hierarchical multipole approximations of explicit Green’s functions are effective preconditioners due to their bounded-error properties. By enumerating nodes in a proximity-preserving order, one can achieve high parallel efficiency when computing matrix–vector products with these dense preconditioners. The benefits of the approach are illustrated on the Poisson problem and the generalized Stokes problem arising in incompressible fluid flow simulations. Numerical experiments show that multipole-based techniques are effective preconditioners that can be parallelized efficiently on multiprocessor platforms.
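
As a toy illustration of a Green's-function preconditioner (not the paper's multipole machinery), the sketch below, assuming SciPy is available, preconditions CG on the 1D Poisson matrix with its exact discrete Green's function. A hierarchical code would apply a compressed O(n) approximation of this dense operator rather than the full matrix.

    import numpy as np
    from scipy.sparse import diags
    from scipy.sparse.linalg import LinearOperator, cg

    n = 200
    A = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")  # 1D Poisson

    # Discrete Green's function of the 1D Dirichlet Laplacian (1-based indices):
    #   G[i, j] = min(i, j) * (n + 1 - max(i, j)) / (n + 1)
    idx = np.arange(1, n + 1)
    G = np.minimum.outer(idx, idx) * (n + 1 - np.maximum.outer(idx, idx)) / (n + 1)

    # Apply the dense Green's function as the preconditioner M ~ A^{-1}.
    M = LinearOperator((n, n), matvec=lambda v: G @ v)
    x, info = cg(A, np.ones(n), M=M)
    print(info)  # 0 indicates convergence

Here the Green's function is exact, so CG converges immediately; the bounded-error multipole approximation trades a little of that accuracy for O(n) storage and application cost.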

Added 2008-05-15

Architecture Independent Analysis of Parallel Programs

Ananth Grama, Vipin Kumar, Sanjay Ranka, Vineet Singh

The presence of a universal machine model for serial algorithm design, namely the von Neumann model, has been one of the key ingredients of the success of uniprocessors. The presence of such a model makes it possible to port an algorithm efficiently across a wide range of uniprocessors. Universal algorithm design models proposed for parallel computers, however, tend to be limited in the range of parallel platforms they can efficiently cover. Consequently, portability of parallel programs is attained at the expense of efficiency. In this paper, we explore desirable and attainable properties of universal models for architecture-independent parallel program design. We survey the models that have been proposed, classify them based on important machine parameters, and examine their limitations.

Added 2008-05-15

Randomized leader election

Murali Krishna Ramanathan, Ronaldo A. Ferreira, Suresh Jagannathan, Ananth Grama, Wojciech Szpankowski

We present an efficient randomized algorithm for leader election in large-scale distributed systems. The proposed algorithm is optimal in message complexity (O(n) for n processes), has round complexity logarithmic in the number of processes, and provides strong probabilistic guarantees on the election of a unique leader. The algorithm relies on a balls-and-bins abstraction and works in two phases. The main novelty of the work is in the first phase, where the number of contending processes is reduced in a controlled manner. Probabilistic quorums are used to determine a winner in the second phase. We discuss the synchronous version of the algorithm in detail, provide extensions to an asynchronous version, and examine the impact of failures.
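
The two-phase shape of the algorithm can be mimicked in a few lines. This simulation is only illustrative: the bin count, the per-bin survivor rule, and the tie-breaking are our assumptions, not the paper's protocol or its quorum mechanism.

    import random

    def elect(processes, num_bins):
        # Phase 1: contention reduction -- each process lands in a random
        # bin, and only one survivor per bin stays in the race.
        bins = {}
        for p in processes:
            bins.setdefault(random.randrange(num_bins), []).append(p)
        survivors = [min(members) for members in bins.values()]
        # Phase 2: survivors draw random tickets; the highest ticket wins,
        # with ties broken by process id so the leader is unique.
        tickets = {p: (random.random(), p) for p in survivors}
        return max(survivors, key=tickets.get)

    print(elect(range(1000), num_bins=32))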

Added 2008-05-15

Plethora: An Efficient Wide-Area Storage System

Ronaldo A. Ferreira, Ananth Grama, Suresh Jagannathan

Trends in conventional storage infrastructure motivate the development of foundational technologies for building a wide-area read-write storage repository capable of providing a single image of a distributed storage resource. The overarching design goals of such an infrastructure include client performance, global resource utilization, system scalability (providing a single logical view of larger resource and user pools), and application scalability (enabling single applications with large resource requirements). Such a storage infrastructure forms the basis for second-generation data-grid efforts underlying massive data handling in high-energy physics, nanosciences, and bioinformatics, among others. This paper describes some of the foundational technologies underlying such a repository, Plethora, for semi-static peer-to-peer (P2P) networks implemented on a wide-area Internet testbed. In contrast to many current efforts that focus entirely on unstructured dynamic P2P environments, Plethora focuses on semi-static peers with strong network connectivity and a partially persistent network state. In a semi-static P2P network, peers are likely to remain participants in the network over long periods of time (e.g., compute servers) and are capable of providing reasonably high availability and response-time guarantees. The repository integrates novel concepts in locality-enhancing overlay networks, transactional semantics for read-write data coupled with hierarchical versioning, and novel erasure codes for robustness. While touching on Plethora's approaches to other problems, this paper focuses on routing data requests to blocks, integrating caching and locality-enhancing overlays into a single framework. We show significant performance improvements resulting from our routing techniques.

Added 2008-05-15

Compression of particle data from hierarchical approximate methods

Dow-Yung Yang, Ananth Grama, Vivek Sarin, Naren Ramakrishnan

This article presents an analytical and computational framework for the compression of particle data resulting from hierarchical approximate treecodes such as the Barnes-Hut and Fast Multipole Methods. Due to approximations introduced by hierarchical methods, various parameters (such as position, velocity, acceleration, and potential) associated with a particle can be bounded by distortion radii. Using these distortion radii, we develop storage schemes that guarantee error bounds while maximizing compression. Our schemes make extensive use of spatial and temporal coherence of particle behavior and yield compression ratios higher than 12:1 over raw data, and 6:1 over gzipped (LZ) raw data for selected simulation instances. We demonstrate that for uniform distributions with 2M particles, storage requirements can be reduced from 24 MB to about 1.8 MB (about 7 bits per particle per timestep) for storing particle positions. This is significant because it enables faster storage/retrieval, better temporal resolution, and improved analysis. Our results are shown to scale from small systems (2K particles) to much larger systems (over 2M particles). The associated algorithm is asymptotically optimal in computation time (O(n)) with a small constant. Our implementations are demonstrated to run extremely fast: much faster than the time it takes to compute a single timestep advance. In addition, our compression framework relies on a natural hierarchical representation upon which other analysis tasks, such as segmented and windowed retrieval, can be built.
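
A sketch of how temporal coherence can be exploited: predict each particle's next position from its previous state and store only the quantized prediction residual. The linear predictor and the int16 codes are illustrative choices, not the paper's scheme.

    import numpy as np

    def encode_step(prev_pos, prev_vel, new_pos, r, dt):
        predicted = prev_pos + prev_vel * dt      # temporal coherence
        residual = new_pos - predicted            # typically tiny
        return np.round(residual / (2.0 * r)).astype(np.int16)

    def decode_step(prev_pos, prev_vel, codes, r, dt):
        # Reconstruction error is at most r per coordinate.
        return prev_pos + prev_vel * dt + codes * (2.0 * r)

Well-predicted residuals cluster near zero, so an entropy coder can pack them into a few bits per particle, consistent with the per-timestep figures quoted above.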

Added 2008-05-15

An IP address based caching scheme for peer-to-peer networks

R.A. Ferreira, A. Grama, S. Jagannathan

Distributed hash tables (DHTs), used in a number of current peer-to-peer systems, provide efficient mechanisms for resource location. Systems such as Chord, Pastry, CAN, and Tapestry provide strong guarantees that queries in the overlay network can be resolved in a bounded number of overlay hops, while preserving load balance among the peers. A key distinction among these systems is the way they handle locality in the underlying network. Topology-based node identifier assignment, proximity routing, and proximity neighbor selection are examples of heuristics used to minimize message delays in the underlying network. We investigate the use of source IP addresses to enhance locality in overlay networks based on DHTs. We first show that a naive use of the source IP address can lead to severe resource imbalance due to the nonuniformity of peers over the IP space. We then present an effective caching scheme that combines a segment of the source IP address with the queried hash code to localize access and effect replication. Using detailed experiments, we show that this scheme achieves performance gains of up to 41% compared to Pastry in combination with the proximity neighbor selection heuristic.
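
The caching idea can be sketched as follows: hash a prefix of the source IP together with the queried key, so peers that share a network prefix resolve to the same cached replica. The prefix length and key format here are illustrative assumptions, not values from the paper.

    import hashlib

    def cache_key(source_ip: str, query_hash: str, prefix_octets: int = 2) -> str:
        # Keep only the leading octets of the source IP, then combine
        # with the queried hash code to localize cache placement.
        prefix = ".".join(source_ip.split(".")[:prefix_octets])
        return hashlib.sha1(f"{prefix}/{query_hash}".encode()).hexdigest()

    # Peers in the same /16 network map a query to the same cache entry:
    assert cache_key("128.10.3.7", "abc123") == cache_key("128.10.250.9", "abc123")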

Added 2008-05-15

Sieve: A Tool for Automatically Detecting Variations Across Program Versions

Murali Krishna Ramanathan, Ananth Grama, Suresh Jagannathan

Software systems often undergo many revisions during their lifetime as new features are added, bugs repaired, abstractions simplified and refactored, and performance improved. When a revision, even a minor one, does occur, the changes it induces must be tested to ensure that invariants assumed in the original version are not violated unintentionally. In order to avoid testing components that are unchanged across revisions, impact analysis is often used to identify code blocks or functions that are affected by a change. In this paper, we present a novel solution to this general problem that uses dynamic programming on instrumented traces of different program binaries to identify longest common subsequences in the strings generated by these traces. Our formulation allows us to perform impact analysis and also to detect the smallest set of locations within the functions where the effect of the changes actually manifests itself. Sieve is a tool that incorporates these ideas. Sieve is unobtrusive, requiring no programmer or compiler intervention to guide its behavior. Our experiments on multiple versions of open-source C programs show that Sieve is an effective and scalable tool for identifying impact sets and locating regions in the affected functions where the changes manifest. These results lead us to conclude that Sieve can play a beneficial role in program testing and software maintenance.
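
The core dynamic program is the classic longest-common-subsequence recurrence applied to trace strings; the sketch below shows the O(mn) table computation (Sieve's instrumentation and trace encoding are beyond this illustration).

    def lcs_length(a, b):
        # dp[i][j] = length of the LCS of a[:i] and b[:j]
        m, n = len(a), len(b)
        dp = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                if a[i - 1] == b[j - 1]:
                    dp[i][j] = dp[i - 1][j - 1] + 1
                else:
                    dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
        return dp[m][n]

    # Two event traces of the same function in successive versions:
    print(lcs_length("ABCBDAB", "BDCABA"))  # 4

Positions that fall outside the common subsequence are exactly the candidates for where a change's effect manifests.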

Added 2008-05-15

Trace-Based Memory Aliasing Across Program Versions

Murali Krishna Ramanathan, Ananth Grama, Suresh Jagannathan

One of the major costs of software development is associated with testing and validation of successive versions of software systems. An important problem encountered in testing and validation is memory aliasing, which involves correlation of variables across program versions. This is useful to ensure that existing invariants are preserved in newer versions and to match program execution histories. Recent work in this area has focused on trace-based techniques to better isolate affected regions. A variation of this general approach considers memory operations to generate more refined impact sets. The utility of such an approach eventually relies on the ability to effectively recognize aliases. In this paper, we address the general memory aliasing problem and present a probabilistic trace-based technique for correlating memory locations across execution traces, and associated variables in program versions. Our approach is based on computing the log-odds ratio, which defines the affinity of locations based on observed patterns. As part of the aliasing process, the traces for initial test inputs are aligned without considering aliasing. From the aligned traces, the log-odds ratio of the memory locations is computed. Subsequently, aliasing is used for alignment of successive traces. Our technique can easily be extended to other applications where detecting aliasing is necessary. As a case study, we implement and use our approach in dynamic impact analysis for detecting variations across program versions. Using detailed experiments on real versions of software systems, we observe significant improvements in detection of affected regions when aliasing occurs.
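
For intuition, an affinity statistic of this kind can be computed from a 2x2 contingency table of how often two locations do or do not co-occur at aligned trace positions. This is an illustrative formulation, not the paper's exact estimator, and the smoothing constant is our choice.

    import math

    def log_odds_ratio(n11, n10, n01, n00, eps=0.5):
        # n11: both locations appear at aligned positions; n00: neither;
        # n10 / n01: exactly one does. eps is Haldane smoothing to avoid
        # division by zero on sparse counts.
        return math.log(((n11 + eps) * (n00 + eps)) / ((n10 + eps) * (n01 + eps)))

    print(log_odds_ratio(40, 5, 3, 52))  # large positive: likely aliases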

Added 2008-05-15

Spectral LPM: an optimal locality-preserving mapping using the spectral (not fractal) order

M.F. Mokbel, W.G. Aref, A. Grama

For the past two decades, fractals (e.g., the Hilbert and Peano space-filling curves) have been considered the natural method for providing a locality-preserving mapping. The idea behind a locality-preserving mapping is to map points that are nearby in the multidimensional space into points that are nearby in the one-dimensional space. We argue against the use of fractals in locality-preserving mapping algorithms, and present examples with experimental evidence to show why fractals produce poor locality-preserving mappings. In addition, we propose an optimal locality-preserving mapping algorithm, termed the spectral locality-preserving mapping algorithm (Spectral LPM, for short), that makes use of the spectrum of the multidimensional space. We give a mathematical proof for the optimality of Spectral LPM, and also demonstrate its practical use.
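
The spectral order derives from the Fiedler vector (the eigenvector of the second-smallest Laplacian eigenvalue); below is a dense-eigensolver sketch of that sequencing step. Real implementations would use sparse eigensolvers, and the path example is our illustrative setup.

    import numpy as np

    def spectral_order(adjacency):
        A = np.asarray(adjacency, dtype=float)
        L = np.diag(A.sum(axis=1)) - A      # graph Laplacian
        _, vecs = np.linalg.eigh(L)         # eigenpairs, ascending eigenvalues
        return np.argsort(vecs[:, 1])       # sort by the Fiedler vector

    # A 4-point path: the spectral order walks the path end to end
    # (possibly reversed), keeping neighbors adjacent.
    path = np.array([[0, 1, 0, 0],
                     [1, 0, 1, 0],
                     [0, 1, 0, 1],
                     [0, 0, 1, 0]])
    print(spectral_order(path))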

Added 2008-05-15

Functional annotation of regulatory pathways

Jayesh Pandey, Mehmet Koyuturk, Yohan Kim, Wojciech Szpankowski, Shankar Subramaniam, Ananth Grama

Added 2008-05-15

Parallel Algorithm Scalability Issues in Petaflops Architectures

A. Grama, A. Gupta, E.H. Han, V. Kumar

Added 2008-05-15

The Omni Macroprogramming Environment for Sensor Networks

Asad Awan, Ahmed Sameh, Ananth Grama

Structural sensing and control is an important application of the DDDAS paradigm. Our work on structural sensing and control has several key aspects, including model reduction, control, simulation, and validation. Motivated by our work on validation using an actual three-storey structure, we are developing a comprehensive systems environment, Omni, for macroprogramming sensor networks. While there have been efforts aimed at enabling programmers to write lean applications for individual sensor nodes, few efforts allow programmers to program entire networks as distributed ensembles. Omni provides an intuitive and efficient programming interface, along with operating-system services for mapping these abstractions onto the underlying network. In this paper, we provide a high-level overview of the Omni architecture, its salient features, and implementation details. The Omni architecture is designed to be a flexible, extensible, scalable, and portable system upon which a wide variety of DDDAS applications can be built.

Added 2008-05-15