The Center for Education and Research in Information Assurance and Security (CERIAS)

Reports and Papers Archive

Privacy-Preserving Database Systems

Elisa Bertino, Ji-Won Byun, Ninghui Li

Privacy is today an important concern for both users and enterprises, and intense research is being carried out on various aspects of privacy-preserving data management systems. In this paper, we focus on database management systems (DBMS) able to enforce privacy promises encoded in privacy languages such as P3P. We first present an overview of the P3P language and outline some of its critical aspects. We then outline the main requirements for a privacy-preserving DBMS and discuss solutions for the management of privacy-related meta-data, focusing on a special category of meta-data, namely purpose information. Purpose information is an important component of privacy statements, and its effective management is thus crucial. Finally, we discuss current solutions to fine-grained access control in the context of relational database systems and identify relevant issues.
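
The purpose-matching check at the heart of such meta-data management can be sketched as follows. This is a minimal illustration with hypothetical names, not the paper's model; in particular, practical purpose systems typically organize purposes in a hierarchy rather than matching flat labels as done here.

    // Hypothetical sketch: each data item carries a set of intended
    // purposes; a query is annotated with a single access purpose and may
    // see the item only if the purposes are compatible.
    import java.util.Set;

    public class PurposeCheck {
        static boolean permitted(String accessPurpose, Set<String> intendedPurposes) {
            // Flat-label compatibility; real systems walk a purpose hierarchy.
            return intendedPurposes.contains(accessPurpose);
        }

        public static void main(String[] args) {
            Set<String> intended = Set.of("order-processing", "shipping");
            System.out.println(permitted("marketing", intended));        // false
            System.out.println(permitted("order-processing", intended)); // true
        }
    }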

Added 2008-05-28

Optimal k-Anonymity with Flexible Generalization Schemes through Bottom-up Searching

Tiancheng Li, Ninghui Li

In recent years, a major thread of research on k-anonymity has focused on developing more flexible generalization schemes that produce higher-quality datasets. In this paper we introduce three new generalization schemes that improve on existing schemes, as well as algorithms for enumerating valid generalizations in these schemes. We also introduce a taxonomy for generalization schemes and a new cost metric for measuring information loss. We present a bottom-up search strategy for finding optimal anonymizations; this strategy works particularly well when the value of k is small. We show the feasibility of our approach through experiments on real census data.
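
To make the notion of generalization concrete, the toy sketch below performs one generalization step over two common quasi-identifiers; the recoding rules are illustrative assumptions, not the schemes introduced in the paper. A table is k-anonymous when every combination of quasi-identifier values occurs at least k times.

    // Toy generalization step for two quasi-identifiers (assumed rules):
    // ZIP codes are truncated digit by digit; ages are coarsened to ranges.
    public class Generalize {
        static String generalizeZip(String zip, int level) {
            int keep = Math.max(0, zip.length() - level);
            return zip.substring(0, keep) + "*".repeat(level);
        }

        static String generalizeAge(int age, int bucket) {
            int lo = (age / bucket) * bucket;
            return lo + "-" + (lo + bucket - 1);
        }

        public static void main(String[] args) {
            System.out.println(generalizeZip("47907", 2)); // 479**
            System.out.println(generalizeAge(23, 10));     // 20-29
        }
    }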

Added 2008-05-28

Penumbra: an Eclipse plugin for introductory programming

Frank Mueller, Antony L. Hosking

Eclipse is a full-featured and easily extensible integrated development environment. As such, it has grown to include a large degree of functionality that may be overwhelming to the novice programmer. Nevertheless, we believe Eclipse is an environment that students of programming will come to find useful and empowering once they become familiar with it. The trick is easing them into using Eclipse without them feeling overwhelmed at the outset. Penumbra is an Eclipse plug-in developed at Purdue University for use in our introductory programming classes. It is intended to ease the transition to use of the full-featured functionality of Eclipse. Penumbra presents an Eclipse perspective that hides all but the basic actions of Eclipse’s existing Java perspective, while packaging elements of other perspectives (e.g., the CVS perspective) into simpler actions that ease the downloading and turn-in of programming assignments, and adding new code views inspired by other environments for introductory programmers. Our experiences using Eclipse with a small group of introductory programming students in the Spring of 2003 have guided the development of Penumbra, which is now being rolled out for general use by the full class of 230 students in the Fall of 2003.

Added 2008-05-27

Safe futures for Java

Adam Welc, Suresh Jagannathan, Antony Hosking

A future is a simple and elegant abstraction that allows concurrency to be expressed, often through a relatively small rewrite of a sequential program. In the absence of side-effects, futures serve as benign annotations that mark potentially concurrent regions of code. Unfortunately, when computation relies heavily on mutation, as is the case in Java, a future's meaning is less clear, and much of its intended simplicity is lost.

This paper explores the definition and implementation of safe futures for Java. One can think of safe futures as truly transparent annotations on method calls, which designate opportunities for concurrency. Serial programs can be made concurrent simply by replacing standard method calls with future invocations. Most significantly, even though some parts of the program are executed concurrently and may indeed operate on shared data, the semblance of serial execution is nonetheless preserved. Thus, program reasoning is simplified, since data dependencies present in a sequential program are not violated in a version augmented with safe futures.

Besides presenting a programming model and API for safe futures, we formalize the safety conditions that must be satisfied to ensure equivalence between a sequential Java program and its future-annotated counterpart. A detailed implementation study is also provided. Our implementation exploits techniques such as object versioning and task revocation to guarantee the necessary safety conditions. We also present an extensive experimental evaluation of our implementation to quantify overheads and limitations. Our experiments indicate that for programs with modest mutation rates on shared data, applications can use futures to profitably exploit parallelism without sacrificing safety.
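
The rewrite the abstract describes can be pictured with the standard java.util.concurrent API; note this is only an analogy, since plain Java futures do not provide the paper's serial-equivalence guarantee when the future and its continuation share mutable data.

    import java.util.concurrent.*;

    public class FutureDemo {
        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newSingleThreadExecutor();
            // Serial version:  int x = expensive(); doOtherWork(); use(x);
            Future<Integer> x = pool.submit(FutureDemo::expensive); // runs concurrently
            doOtherWork();                                          // overlaps with expensive()
            use(x.get());                                           // claim the result
            pool.shutdown();
        }

        static int expensive() { return 42; }
        static void doOtherWork() { /* independent computation */ }
        static void use(int v) { System.out.println(v); }
    }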

Added 2008-05-27

Nested transactional memory: Model and architecture sketches

J. Eliot B. Moss, Antony L. Hosking

We offer a reference model for nested transactions at the level of memory accesses, and sketch possible hardware architecture designs that implement that model. We describe both closed and open nesting. The model is abstract in that it does not relate to hardware, such as caches, but describes memory as seen by each transaction, memory access conflicts, and the effects of commits and aborts. The hardware sketches describe approaches to implementing the model using bounded size caches in a processor with overflows to memory. In addition to a model that will support concurrency within a transaction, we describe a simpler model that we call linear nesting. Linear nesting supports only a single thread of execution in a transaction nest, but may be easier to implement. While we hope that the model is a good target to which to compile transactions from source languages, the mapping from source constructs to nested transactional memory is beyond the scope of the paper.

Added 2008-05-27

Open nesting in software transactional memory

Yang Ni, Vijay S. Menon, Ali-Reza Adl-Tabatabai, Antony L. Hosking, Richard L. Hudson

Transactional memory (TM) promises to simplify concurrent programming while providing scalability competitive to fine-grained locking. Language-based constructs allow programmers to denote atomic regions declaratively and to rely on the underlying system to provide transactional guarantees along with concurrency. In contrast with fine-grained locking, TM allows programmers to write simpler programs that are composable and deadlock-free.

TM implementations operate by tracking loads and stores to memory and by detecting concurrent conflicting accesses by different transactions. By automating this process, they greatly reduce the programmer's burden, but they are also forced to be conservative. In certain cases, conflicting memory accesses may not actually violate the higher-level semantics of a program, and a programmer may wish to allow seemingly conflicting transactions to execute concurrently.

Open nested transactions enable expert programmers to differentiate between physical conflicts, at the level of memory, and logical conflicts that actually violate application semantics. A TM system with open nesting can permit physical conflicts that are not logical conflicts, and thus increase concurrency among application threads.

Here we present an implementation of open nested transactions in a Java-based software transactional memory (STM) system. We describe new language constructs to support open nesting in Java, and we discuss new abstract locking mechanisms that a programmer can use to prevent logical conflicts. We demonstrate how these constructs can be mapped efficiently to existing STM data structures. Finally, we evaluate our system on a set of Java applications and data structures, demonstrating how open nesting can enhance application scalability.
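
The abstract-locking idea can be illustrated with a small runnable toy (assumed names and structure, not the paper's constructs or API): a logical operation locks only the key it touches and records a compensating action to run if the enclosing transaction aborts, while its physical effects become visible immediately.

    import java.util.*;
    import java.util.concurrent.*;
    import java.util.concurrent.locks.*;

    public class AbstractLockDemo {
        static final ConcurrentHashMap<Integer, ReentrantLock> keyLocks = new ConcurrentHashMap<>();
        static final Map<Integer, Integer> table = new ConcurrentHashMap<>();

        public static void main(String[] args) {
            Deque<Runnable> compensations = new ArrayDeque<>();
            ReentrantLock lock = keyLocks.computeIfAbsent(17, k -> new ReentrantLock());
            lock.lock();                       // abstract lock: key 17 only, not the whole map
            try {
                table.put(17, 99);             // "open-nested" effect, visible at once
                compensations.push(() -> table.remove(17)); // undo if the parent aborts
                boolean parentAborts = false;  // outcome of the enclosing transaction
                if (parentAborts) compensations.forEach(Runnable::run);
            } finally {
                lock.unlock();
            }
            System.out.println(table);         // {17=99}
        }
    }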

Added 2008-05-27

The DaCapo benchmarks: Java benchmarking development and analysis

Blackburn, Garner, Hoffmann, Khang, McKinley, Bentzur, Diwan, Feinberg, Frampton, Guyer, Hosking

Since benchmarks drive computer science research and industry product development, which ones we use and how we evaluate them are key questions for the community. Despite complex runtime tradeoffs due to dynamic compilation and garbage collection required for Java programs, many evaluations still use methodologies developed for C, C++, and Fortran. SPEC, the dominant purveyor of benchmarks, compounded this problem by institutionalizing these methodologies for their Java benchmark suite. This paper recommends benchmarking selection and evaluation methodologies, and introduces the DaCapo benchmarks, a set of open source, client-side Java benchmarks. We demonstrate that the complex interactions of (1) architecture, (2) compiler, (3) virtual machine, (4) memory management, and (5) application require more extensive evaluation than C, C++, and Fortran, which stress (4) much less and do not require (3). We use and introduce new value, time-series, and statistical metrics for static and dynamic properties such as code complexity, code size, heap composition, and pointer mutations. No benchmark suite is definitive, but these metrics show that DaCapo improves over SPEC Java in a variety of ways, including more complex code, richer object behaviors, and more demanding memory system requirements. This paper takes a step towards improving methodologies for choosing and evaluating benchmarks to foster innovation in system design and implementation for Java and other managed languages.
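
One methodology concern the paper presses for managed languages can be seen in miniature below: in a JIT-compiled runtime, first-iteration (cold, compilation-heavy) and steady-state (warm) times differ substantially and should be reported separately rather than folded into a single cold run. The workload is an arbitrary allocation-heavy stand-in.

    public class WarmupDemo {
        public static void main(String[] args) {
            for (int iter = 1; iter <= 5; iter++) {
                long t0 = System.nanoTime();
                workload();
                System.out.printf("iteration %d: %.1f ms%n", iter, (System.nanoTime() - t0) / 1e6);
            }
        }

        static void workload() {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 200_000; i++) sb.append(i % 10); // allocation-heavy busy work
        }
    }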

Added 2008-05-27

Optimizing the read and write barriers for orthogonal persistence

A.L. Hosking, N. Nystrom, Q. Cutts, K. Brahnmath
Added 2008-05-27

Preemption-based avoidance of priority inversion for Java

A. Welc, A.L. Hosking, Suresh Jagannathan

Priority inversion occurs in concurrent programs when low-priority threads hold shared resources needed by some high-priority thread, causing them to block indefinitely. Shared resources are usually guarded by low-level synchronization primitives such as mutual-exclusion locks, semaphores, or monitors. There are two existing solutions to priority inversion. The first, establishing high-level scheduling invariants over synchronization primitives to eliminate priority inversion a priori, is difficult in practice and undecidable in general. Alternatively, run-time avoidance mechanisms such as priority inheritance still force high-priority threads to wait until desired resources are released. We describe a novel compiler and run-time solution to the problem of priority inversion, along with experimental evaluation of its effectiveness. Our approach allows preemption of any thread holding a resource needed by higher-priority threads, forcing it to release its claim on the resource, roll back its execution to the point at which the shared resource was first acquired, and discard any updates made in the interim. The compiler inserts code at synchronization points, permitting rollback of thread execution, and efficient revocation of interim updates. Our design and implementation are realized in the context of IBM’s Jikes RVM, a high-quality compiler and runtime system for Java. Our performance results show that throughput of high-priority threads using our scheme can be improved by 30% to 100% when compared with a classical scheduler that does not address priority inversion.
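
The scenario being solved can be reproduced with plain Java monitors (a toy, not the paper's mechanism): a low-priority thread grabs a lock first, and a high-priority thread then blocks on it for as long as the holder runs. The paper's approach would instead preempt the holder, roll it back to the acquisition point, and let the high-priority thread proceed.

    public class InversionDemo {
        static final Object resource = new Object();
        static volatile long sink; // keeps the busy loop from being optimized away

        public static void main(String[] args) throws InterruptedException {
            Thread low = new Thread(() -> {
                synchronized (resource) { longComputation(); } // holds the lock throughout
            });
            low.setPriority(Thread.MIN_PRIORITY);

            Thread high = new Thread(() -> {
                synchronized (resource) { System.out.println("high finally ran"); }
            });
            high.setPriority(Thread.MAX_PRIORITY);

            low.start();
            Thread.sleep(10);  // let 'low' acquire the lock first
            high.start();      // 'high' now blocks until 'low' releases
            low.join();
            high.join();
        }

        static void longComputation() {
            long s = 0;
            for (int i = 0; i < 100_000_000; i++) s += i;
            sink = s;
        }
    }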

Added 2008-05-27

PROXIMUS: a framework for analyzing very high dimensional discrete-attributed datasets

Mehmet Koyuturk, Ananth Grama

This paper presents an efficient framework for error-bounded compression of high-dimensional discrete attributed datasets. Such datasets, which frequently arise in a wide variety of applications, pose some of the most significant challenges in data analysis. Subsampling and compression are two key technologies for analyzing these datasets. PROXIMUS provides a technique for reducing large datasets into a much smaller set of representative patterns, on which traditional (expensive) analysis algorithms can be applied with minimal loss of accuracy. We show that PROXIMUS has desirable properties in terms of runtime, scalability to large datasets, and ability to represent data in a compact form. We also demonstrate applications of PROXIMUS in association rule mining. In doing so, we establish PROXIMUS as a tool for preprocessing data before applying computationally expensive algorithms, or as a tool for directly extracting correlated patterns. Our experimental results show that using the compressed data for association rule mining provides excellent precision and recall values (near 100%) across a range of support thresholds, while drastically reducing the time required for association rule mining.

Added 2008-05-23

Sieve: A Tool for Automatically Detecting Variations Across Program Versions

Murali Krishna Ramanathan, Ananth Grama, Suresh Jagannathan

Software systems often undergo many revisions during their lifetime as new features are added, bugs repaired, abstractions simplified and refactored, and performance improved. When a revision, even a minor one, does occur, the changes it induces must be tested to ensure that invariants assumed in the original version are not violated unintentionally. In order to avoid testing components that are unchanged across revisions, impact analysis is often used to identify code blocks or functions that are affected by a change. In this paper, we present a novel solution to this general problem that uses dynamic programming on instrumented traces of different program binaries to identify longest common subsequences in strings generated by these traces. Our formulation allows us to perform impact analysis and also to detect the smallest set of locations within the functions where the effect of the changes actually manifests itself. Sieve is a tool that incorporates these ideas. Sieve is unobtrusive, requiring no programmer or compiler intervention to guide its behavior. Our experiments on multiple versions of open-source C programs show that Sieve is an effective and scalable tool to identify impact sets and can locate regions in the affected functions where the changes manifest. These results lead us to conclude that Sieve can play a beneficial role in program testing and software maintenance.
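
The dynamic-programming kernel underlying this comparison is the classic longest-common-subsequence recurrence, sketched below on toy strings; Sieve applies it to strings generated from instrumented execution traces of two binaries, and positions falling outside the common subsequence are candidates for where a change's effect manifests.

    public class Lcs {
        static int lcsLength(String a, String b) {
            int[][] dp = new int[a.length() + 1][b.length() + 1];
            for (int i = 1; i <= a.length(); i++)
                for (int j = 1; j <= b.length(); j++)
                    dp[i][j] = a.charAt(i - 1) == b.charAt(j - 1)
                            ? dp[i - 1][j - 1] + 1
                            : Math.max(dp[i - 1][j], dp[i][j - 1]);
            return dp[a.length()][b.length()];
        }

        public static void main(String[] args) {
            // Two toy "traces" from successive program versions:
            System.out.println(lcsLength("ABCDEF", "ABXDEF")); // 5: one event differs
        }
    }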

Added 2008-05-23

Bounded-error compression of particle data from hierarchical approximate methods

Dow-Yung Yang, Ananth Grama, Vivek Sarin
Added 2008-05-23

Nonorthogonal decomposition of binary matrices for bounded-error data compression and analysis

Mehmet Koyuturk, Ananth Grama, Naren Ramakrishnan

This article presents the design and implementation of a software tool, PROXIMUS, for error-bounded approximation of high-dimensional binary attributed datasets based on nonorthogonal decomposition of binary matrices. This tool can be used for analyzing data arising in a variety of domains ranging from commercial to scientific applications. Using a combination of innovative algorithms, novel data structures, and efficient implementation, PROXIMUS demonstrates excellent accuracy, performance, and scalability to large datasets. We experimentally demonstrate these on diverse applications in association rule mining and DNA microarray analysis. In limited beta release, PROXIMUS currently has over 300 installations in over 10 countries.
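
The core idea of a nonorthogonal binary decomposition can be shown in a few lines (a toy, not PROXIMUS itself): approximate a binary matrix A by a rank-one outer product p q^T of binary vectors, where q is a column pattern and p marks the rows exhibiting it, with mismatched entries counted as the bounded error.

    public class RankOneBinary {
        public static void main(String[] args) {
            int[][] a = { {1,1,0}, {1,1,0}, {1,1,1}, {0,0,0} };
            int[] q = {1, 1, 0};    // dominant column pattern
            int[] p = {1, 1, 1, 0}; // rows exhibiting the pattern
            int errors = 0;         // Hamming distance between A and p*q^T
            for (int i = 0; i < a.length; i++)
                for (int j = 0; j < q.length; j++)
                    errors += Math.abs(a[i][j] - p[i] * q[j]);
            System.out.println("mismatched entries: " + errors); // 1
        }
    }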

Added 2008-05-20

Redundancy and coverage detection in sensor networks

Bogdan Carbunar, Ananth Grama, Jan Vitek, Octavian Carbunar

We study the problem of detecting and eliminating redundancy in a sensor network with a view to improving energy efficiency, while preserving the network’s coverage. We also examine the impact of redundancy elimination on the related problem of coverage-boundary detection. We reduce both problems to the computation of Voronoi diagrams, prove and achieve lower bounds on the solution of these problems, and present efficient distributed algorithms for computing and maintaining solutions in cases of sensor failures or insertion of new sensors. We prove the correctness and termination properties of our distributed algorithms, and analytically characterize the time complexity and traffic generated by our algorithms. Using detailed simulations, we also quantify the impact of system parameters such as sensor density, transmission range, and failure rates on network traffic.

Added 2008-05-20

Hierarchical petascale simulation framework for stress corrosion cracking

P. Vashishta, R.K. Kalia, A. Nakano, E. Kaxiras, A. Grama, G. Lu, S. Eidenbenz, A.F. Voter, R.Q. Hood, J.A. Moriarty, L.H. Yang

The goal of this SciDAC project is to develop a scalable parallel and distributed computational framework consisting of methods, algorithms, and integrated software tools for: 1) multi Tera-to-Petascale simulations with quantum-level accuracy; 2) multimillion-to-multibillion-to-trillion atom molecular dynamics (MD) simulations based on density functional theory (DFT) and temperature-dependent model generalized pseudopotential theory; 3) quasicontinuum (QC) method embedded with classical atomistic and quantum simulations based on DFT; and 4) accelerated molecular dynamics (AMD) coupled with hierarchical atomistic/QC simulations to reach macroscopic length and time scales relevant to stress corrosion cracking (SCC). Scalability is being achieved beyond 10^5 processors through linear-scaling algorithms and performance-optimization techniques. We are employing automated model transitioning to embed higher fidelity simulations concurrently inside coarser simulations only when and where they are required. Each scale and model has well-defined error bounds with controlled error propagation across the scales and models to estimate uncertainty in predictions.

Added 2008-05-19