The Center for Education and Research in Information Assurance and Security (CERIAS)

The Center for Education and Research in
Information Assurance and Security (CERIAS)

Reports and Papers Archive


Browse All Papers »       Submit A Paper »

A public key algorithm for ad-hoc networks

B. Carbunar, A. Grama, J. Vitek

Security is an important consideration for many applications of ad-hoc networks. While security aspects of the routing layer have been addressed extensively, there is relatively lesser work on establishing a viable public key infrastructure, which is the basis for most security protocols. We present a distributed algorithm for validating the association between the network identifier of a host and its public key without relying on a priori shared secrets or a trusted certification authority.

Added 2008-05-19

Semantic indexing in structured peer-to-peer networks

Ronaldo A. Ferreira, Mehmet Koyuturk, Suresh Jagannathan, Ananth Grama

The past few years have seen tremendous advances in distributed storage infrastructure. Unstructured and structured overlay networks have been successfully used in a variety of applications, ranging from file-sharing to scientific data repositories. While unstructured networks benefit from low maintenance overhead, the associated search costs are high. On the other hand, structured networks have higher maintenance overheads, but facilitate bounded time search of installed keywords. When dealing with typical data sets, though, it is infeasible to install every possible search term as a keyword into the structured overlay.

State-of-the art semantic indexing techniques have been successfully integrated into peer-to-peer (P2P) systems using semantic overlays. However, exiting approaches are based on the premise that the fundamental ingredient of semantic indexing, a semantic basis for the underlying data, is globally available, which is not likely to be the case in practice. Therefore, development of techniques to efficiently compute basis vectors for data distributed across peers is important for large-scale deployment of semantic indexing in P2P systems.

In this paper, we present a novel structured overlay that integrates aspects of semantic indexing using non-orthogonal matrix decompositions, with the hash structure of the overlay. We adopt PROXIMUS, a recursive decomposition method for computing concise representations for binary data sets, to locally identify latent patterns in data distributed across peers. To enable efficient consolidation of patterns, we rely on distributed hash tables (DHT), commonly used in various applications in P2P networks. The discrete nature of non-orthogonal matrix decomposition is well suited to the binary key structure of DHTs, resulting in an indexing method, PMINER, that enables the network to deliver efficient and accurate responses to semantic queries. We present the algorithmic underpinnings of PMINER and demonstrate its excellent performance characteristics on real, as well as synthetic data sets.

Added 2008-05-19

Annotating Pathways IN Interaction Networks

M.K. Ramanathan, M. Koyuturk, A. Grama
Added 2008-05-19

PHALANX: a graph-theoretic framework for test case prioritization

Murali Krishna Ramanathan, Ananth Grama, Suresh Jagannathan, Mehmet Koyuturk

Test case prioritization for regression testing can be performed using different metrics (e.g., statement coverage, path coverage) depending on the application context. Employing different metrics requires different prioritization schemes (e.g., maximum coverage, dissimilar paths covered). This results in significant algorithmic and implementation complexity in the testing process associated with various metrics and prioritization schemes. In this paper, we present a novel approach to the test case prioritization problem that addresses this limitation. We devise a framework, Phalanx, that identifies two distinct aspects of the problem. The first relates to metrics that define ordering relations among test cases; the second defines mechanisms that implement these metrics on test suites. We abstract the information into a test-case dissimilarity graph—a weighted graph in which nodes specify test cases and weighted edges specify user-defined proximity measures between test cases. We argue that a declustered linearization of nodes in the graph results in a desirable prioritization of test cases, since it ensures that dissimilar test cases are applied first. We explore two mechanisms for declustering the test case dissimilarity graph—Fiedler (spectral) ordering and a greedy approach. We implement these orderings in Phalanx, a highly flexible and customizable testbed, and demonstrate excellent performance for test-case prioritization. Our experiments on test suites available from the Subject Infrastructure Repository (SIR) show that a variety of user-defined metrics can be easily incorporated in Phalanx.

Added 2008-05-19

Impact of far-field interactions on performance of multipole-based preconditioners for sparse linear systems

Ananth Y. Grama, Vivek, Sarin

Dense operators for preconditioning sparse linear systems have traditionally been considered infeasible due to their excessive computational and memory requirements. With the emergence of techniques such as block low-rank approximations and hierarchical multipole approximations, the cost of computing and storing these preconditioners has reduced dramatically. In our prior work [15], we have demonstrated the use of multipole-based techniques as effective parallel preconditioners for sparse linear systems. At one extreme, multipole-based preconditioners behave as dense (bounded interaction) matrices (multipole degree 0), while at the other extreme, they are represented entirely as series expansions. In this paper, we show that: (i) merely truncating the kernel of the integral operator generating the preconditioner leads to poor convergence properties; (ii) far-field interactions, in the form of multipoles, are critical for rapid convergence; (iii) the importance and required accuracy of far-field interactions varies with the complexity of the problem; and (iv) the preconditioner resulting from a judicious mix of near and far-field interactions yields excellent convergence and parallelization properties. Our experimental results are illustrated on the Poisson problem and the generalized Stokes problem arising in incompressible fluid flow simulations.

Added 2008-05-19

BIOINFORMATICS

M. Koyuturk, A. Grama, W. Szpankowski
Added 2008-05-19

Level compressed DAGs for lookup tables

Ionnis Ionnidis, Ananth Grama

Trie-based data structures for implementing IP lookups have attracted considerable research attention. Techniques such as path compression, level compression, generalized level compression, and controlled prefix expansion are commonly used to implement lookup tables. In this paper, we present a fundamentally new technique that relies on directed acyclic graphs (DAGs), which, when coupled with generalized level compression, yield significantly better performance than existing techniques. Current implementations of trie-based lookup tables utilize a route validation table in addition to a trie to enable fixed-length subprefix resolution to support path compression. This route validation enables us to merge different, partially filled subtrees to form full subtrees. The resulting DAGs introduce spurious routes that are eliminated in the validation phase. When combined with level compression (and generalized level compression), this structure yields considerably shorter paths than existing approaches. In this paper, we describe a transformation of tries to DAGs, algorithms for packing subtrees, and profile performance of these algorithms and resulting improvements in lookup time. Specifically, we demonstrate, on actual lookup tables, performance gains of up to 34% compared to LC tries with minimal memory overhead (a little over 1%). Considering the fact that an LC trie is already a highly optimized structure, these gains are remarkable.

Added 2008-05-19

Identifying Statistical Dependence in Genomic Sequences via Mutual Information Estimates

I. Kontoyiannis, L.A. Lyznic, L. Szpankowski, A.Y. Grama
Added 2008-05-19

Static specification inference using predicate mining

Murali Krishna Ramanathan, Ananth Grama, Suresh Jagannathan

The reliability and correctness of complex software systems can be significantly enhanced through well-defined specifications that dictate the use of various units of abstraction (e.g., modules, or procedures). Often times, however, specifications are either missing, imprecise, or simply too complex to encode within a signature, necessitating specification inference. The process of inferring specifications from complex software systems forms the focus of this paper. We describe a static inference mechanism for identifying the preconditions that must hold whenever a procedure is called. These preconditions may reflect both data flow properties (e.g., whenever p is called, variable x must be non-null) as well as control-flow properties (e.g., every call to p must bepreceded by a call to q). We derive these preconditions using a ninter-procedural path-sensitive dataflow analysis that gathers predicates at each program point. We apply mining techniques to these predicates to make specification inference robust to errors. This technique also allows us to derive higher-level specifications that abstract structural similarities among predicates (e.g., procedure p is called immediately after a conditional test that checks whether some variable v is non-null.) We describe an implementation of these techniques, and validate the effectiveness of the approach on a number of large open-source benchmarks. Experimental results confirm that our mining algorithms are efficient, and that the specifications derived are both precise and useful-the implementation discovers several critical, yet previously, undocumented preconditions for well-tested libraries.

Added 2008-05-19

MOBY-a mobile peer-to-peer service and data network

T. Horozov, A. Grama, V. Vasudevan, S. Landis

This paper describes the design and implementation of MOBY, a network for mobile peer-to-peer exchange of services and data. Constraints on computing power of mobile devices, limited hardware, networking, and software resources, and ad-hoc nature of mobile clients pose considerable challenges from the points of view of supporting performance goals, ease of service integration, and adaptation. These challenges are addressed in MOBY by dynamic service location and client mapping, surrogates for mobile clients, and standardized interfaces built upon off-the-shelf software components.

Added 2008-05-15

Inferring functional information from domain co-evolution

Yohan Kim, Mehmet Koyuturk, Umut Topkara, Ananth Grama, Shankar Subramaniam

Co-evolution is a powerful mechanism for understanding protein function. Prior work in this area has shown that co-evolving proteins are more likely to share the same function than those that do not because of functional constraints. Many of the efforts founded on this observation, however, are at the level of entire sequences, implicitly assuming that the complete protein sequence follows a single evolutionary trajectory. Since it is well known that a domain can exist in various contexts, this assumption is not valid for numerous multi-domain proteins. Motivated by these observations, we introduce a novel technique called Coevolutionary-Matrix that captures co-evolution between regions of two proteins. Instead of using existing domain information, the method exploits residue-level conservation to identify co-evolving regions that might correspond to domains.

Added 2008-05-15

Randomized Protocols for Duplicate Elimination in Peer-to-Peer Storage Systems

Ronaldo A. Ferreira, Murali K. Ramanathan, Ananth Grama, Suresh Jagannathan

Distributed peer-to-peer systems rely on voluntary participation of peers to effectively manage a storage pool. In such systems, data is generally replicated for performance and availability. If the storage associated with replication is not monitored and provisioned, the underlying benefits may not be realized. Resource constraints, performance scalability, and availability present diverse considerations. Availability and performance scalability, in terms of response time, are improved by aggressive replication, whereas resource constraints limit total storage in the network. Identification and elimination of redundant data pose fundamental problems for such systems. In this paper, we present a novel and efficient solution that addresses availability and scalability with respect to management of redundant data. Specifically, we address the problem of duplicate elimination in the context of systems connected over an unstructured peer-to-peer network in which there is no a priori binding between an object and its location. We propose two randomized protocols to solve this problem in a scalable and decentralized fashion that does not compromise the availability requirements of the application. Performance results using both large-scale simulations and a prototype built on PlanetLab demonstrate that our protocols provide high probabilistic guarantees while incurring minimal administrative overheads.

Added 2008-05-15

When being Weak is Brave: Privacy Issues in Recommender Systems

N. Ramakrishnan, B.J. Keller, B.J. Mirza, A.Y. Grama
Added 2008-05-15

Path-Sensitive Inference of Function Precedence Protocols

Murali Krishna Ramanathan, Ananth Grama, Suresh Jagannathan

Function precedence protocols define ordering relations among function calls in a program. In some instances, precedence protocols are well-understood (e.g., a call to pthread mutex init must always be present on all program paths before a call to pthread mutex lock ). Oftentimes, however, these protocols are neither well-documented, nor easily derived. As a result, protocol violations can lead to subtle errors that are difficult to identify and correct. In this paper, we present CHRONICLER, a tool that applies scalable inter-procedural path-sensitive static analysis to automatically infer accurate function precedence protocols. CHRONICLER computes precedence relations based on a program’s control-flow structure, integrates these relations into a repository, and analyzes them using sequence mining techniques to generate a collection of feasible precedence protocols. Deviations from these protocols found in the program are tagged as violations, and represent potential sources of bugs. We demonstrate CHRONICLER’s effectiveness by deriving protocols for a collection of benchmarks ranging in size from 66K to 2M lines of code. Our results not only confirm the existence of bugs in these programs due to precedence protocol violations, but also highlight the importance of path sensitivity on accuracy and scalability.

Added 2008-05-15