Reports and Papers Archive - Reports & Papers

Factoring integers

S. S. Wagstaff, Jr.

A summary of the best methods for factoring integers know at the time of publication.

Added 2003-07-18

Proving Ownership Over Categorical Data

CERIAS TR 2003-19

Radu Sion

Download: PDF

This paper introduces a novel method of rights protection for categorical data through watermarking. We discover new watermark embedding channels for relational data with categorical types. We design novel watermark encoding algorithms and analyze important theoretical bounds including mark vulnerability. While fully preserving data quality requirements, our solution survives important attacks, such as subset selection and data re-sorting. Mark detection is fully “blind” in that it doesn’t require the original data, an important characteristic especially in the case of massive data. We propose various improvements and alternative encoding methods. We perform validation experiments by watermarking the outsourced Wal-Mart sales data available at our institute. We prove (experimentally and by analysis) our solution to be extremely resilient to both alteration and data loss attacks, for example tolerating up to 80% data loss with a watermark alteration of only 25%.

Added 2003-07-15

Parallel Algorithms for Maximum Matching in Complements of Interval Graphs and Related Problems

CERIAS TR 2003-16

M.G. Andrews, M.J. Atallah, D.Z. Chen, D.T. Lee

Download: PDF

Given a set of n intervals representing an interval graph, the problem of finding a maximum matching between pairs of disjoint (nonintersecting) intervals has been considered in the sequential model. In this paper we present parallel algorithms for computing maximum cardinality matchings among pairs of disjoint intervals in interval graphs in the EREW PRAM and hypercube models. For the general case of the problem, our algorithms compute a maximum matching in O(log3n) time using O(n/log2n) processors on the EREW PRAM and using n processor on the hypercubes. For the case of proper interval graphs, our algorithm runs in O(log n) time using O(n) processors if the input intervals are not given already sorted and using O(n/log n) processors otherwise, on the EREW PRAM. On n-processor hypercubes, our algorithm for the proper interval case takes O(log n log log n) time for unsorted input and O(log n) time for sorted input. Our parallel results also lead to optimal sequential algorithms for computing maximum matchings among disjoint intervals. In addition, we present an improved parallel algorithm for maximum matching between overlapping intervals in proper interval graphs.

Added 2003-07-02

On the Discovery of Weak Periodicities in Large Time Series

CERIAS TR 2003-17

Christos Berberidis, Ioannis Vlahavas, Walid G. Aref, Mikhail Atallah, Ahmed Elmagarmid

Download: PDF

The search for weak periodic signals in time series data is an active topic of research. Given the fact that rarely a real world dataset is perfectly periodic, this paper approaches this problem in terms of data mining, trying to discover approximate and partial periodicities, when no period length is known in advance. We utilize the autocorrelation function in order to extract partial periodicities from large time series. In existing time series mining algorithms, the period length is user-specified. We propose an algorithm that extracts a set of candidate periods featured in a time series that satisfy a minimum confidence threshold, by utilizing the autocorrelation function and FFT as a filter. We extend this technique for capturing approximate periodicities. We provide some mathematical background as well as experimental results.

Added 2003-07-02

Adaptive Data Structures for IP Lookups

CERIAS TR 2003-18

Ioannis Ioannidis, Ananth Grama, Mikhail Atallah

Download: PDF

The problem of efficient data structures for IP lookups has been well studied in literature. Techniques such as LC tries and Extensible Hashing are commonly used. In this paper, we address the problem of generalizing LC tries and Extensible Hashing, based on traces of past lookups, to provide performance guarantees for memory sub-optimal structures. As a specific example, if a memory-optimal (LC) trie takes 6MB and the total memory at the router is 8MB, how should the trie be modified to make best use of the 2 MB of excess memory? We present a greedy algorithm for this problem and prove that, if for the optimal data structure there are b fewer memory accesses on average for each lookup compared with the original trie, the solution produced by the greedy algorithm will have (9 x b)/22 fewer memory accesses on average (compared to the original trie). An efficient implementation of this algorithm presents significant additional challenges. We describe an implementation with a time complexity of O(ξ(d)n x log n) and a space complexity of O(n), where n is the number of nodes of the trie and d its depth. The depth of a trie is fixed for a given version of the Internet protocol and is typically O(log n). In this case, ξ(d) = O(log2n). We demonstrate experimentally the performance and scalability of the algorithm on actual routing data. We also show that our algorithm significantly outperforms Extensible Hashing for the same amount of memory.

Added 2003-07-02

Authentication of LZ-77 Compressed Data

CERIAS TR 2003-14

Mikhail Atallah, Stefano Lonardi

Download: PDF

The formidable dissemination capability allowed by the current network technology makes it increasingly important to devise new methods to ensure authenticity. Nowadays it is common practice to distribute documents in compressed form. In this paper, we propose a simple variation on the classic LZ-77 algorithm that allows one to hide, within the compressed document, enough information to warrant its authenticity. The design is based on the unpredictability of a certain class of pseudo-random generators, in such a way that the hidden data cannot be retrieved in a reasonable amount of time by an attacker (unless the secret bit-string key is known). Since it can still be decompressed by the original LZ-77 algorithm, the embedding is completely

Added 2003-07-01

Efficient Parallel Algorithms for Planar st-Graphs

CERIAS TR 2003-15

Mikhail Atallah, Danny Z. Chen, Ovidiu Daescu

Download: PDF

Planar st-graphs find applications in a number of areas. In this paper we present efficient parallel algorithms for solving fundamental problems on planar st-graphs. The problems we consider include all-pairs shortest paths in weighted planar st-graphs, single-source shortest paths in weighted planar layered digraphs (which can be reduced to single-source shortest paths in certain special planar st-graphs), and depth-first search in planar st-graphs. Our parallel shortest path techniques exploit the specific geometric and graphic structures of planar st-graphs, and involve schemes for partitioning planar st-graphs into sub-graphs in a way that ensures that the resulting path length matrices have a monotonicity property [1], [2]. The parallel algorithms we obtain are a considerable improvement over the previously best known solutions (when they are applied to these st-graph problems), and are in fact relatively simple. The parallel computational models we use are the CREW PRAM and EREW PRAM.

Added 2003-07-01

(Almost) Optimal Parallel Block Access for Range Queries

CERIAS TR 2003-08

Mikhail Atallah, Sunil K. Prabhakar

Download: PDF

Range queries are an important class of queries for several applications. can be achieved by tiling the multi-dimensional data and distributing it among multiple disks or nodes. It has been established that schemes that achieve optimal parallel block access exist only for a few special cases. Though several schemes for the allocation of tiles to disks have been developed, no scheme with non-trivial worst-case bound is known. We establish that any range query on a 2^q x 2^q-block grid of blocks can be performed using k = 2^t disks (t≤q), in at most [m/k] + O(log k) parallel block accesses. We achieve this result by judiciously distributing the blocks among the k nodes or disks. Experimental data show that the algorithm achieves very close to [m/k] performance (on average less than 0.5 away from [m/k], with a worst-case of 3). Although several declustering schemes for range queries have been developed, prior to our work no additive non-trivial performance bounds were known. Our scheme guarantees performance within a (small) additive deviation from [m/k]. Subsequent to this work, Bhatia et al. [4] have proved that such a performance bound is essentially optimal for this kind of scheme, and have also extended our results to the case where the number of disks is a product of the form k1*k2…*kt where kis need to all be 2.

Added 2003-06-30

Rights Protection for Relational Data

CERIAS TR 2003-11

Radu Sion, Mikhail Atallah, Sunil Prabhakar

Download: PDF

Protecting rights over relational data is of ever increasing interest, especially considering areas where sensitive, valuable content is to be outsourced. A good example is a data mining application, where data is sold in pieces to parties specialized in mining it. Different avenues for rights protection are available, each with its own advantages and drawbacks. Enforcement by legal means is usually ineffective in preventing theft of copyrighted works, unless augmented by a digital counter-part, for example watermarking. Recent research of the authors introduces the issue of digital watermarking for generic number sets. In the present paper we expand on this foundation and introduce a solution for relational database content rights protection through watermarking. Our solution adresses important attacks, such as data re-sorting, subset selection, linear data changes (applying a linear transformation on arbitrary subsets of the data). Our watermark also survives up to 50% and above data losses. Finally we present wmbd.*, a proof-of-concept implementation of out algorithm and its application to real life data, namely in watermarking the outsourced Wal-Mart sales data that we have available at our institute.

Added 2003-06-30

Optimal Parallel I/O for Range Queries through Replication

CERIAS TR 2003-10

Keith Frikken, Mikhail Atallah, Sunil Prabhakar, Rei Safavi-Naini

Download: PDF

In this paper we study the problem of declustering two-dimensional datasets with replication over parallel devices to improve range query performance. The related problem of declustering without replication has been well studied. It has been established that strictly optimal declustering schemes do no exist if data is not replicated. In addition to the usual problem of identifying a good allocation, the replicated version of the problem needs to address the issue of identifying a good retrieval schedule for a given query. We address both problems in this paper. An efficient algorithm for finding a lowest cost retrieval schedule is developed. This algorithm works for any query, not just range queries. Two replicated placement schemes are presented - one that results in a strictly optimal allocation, and another that guarantees a retrieval cost that is either optimal or 1 more than the optimal for any range query.

Added 2003-06-30

Privacy-Preserving Cooperative Statistical Analysis

CERIAS TR 2003-12

Wenliang Du, Mikhail Atallah

Download: PDF

The growth of the Internet opens up tremendous opportunities for cooperative computation, where the answer depends on the private inputs of separate entities. Sometimes these computations may occur between mutually untrusted entities. The problem is trivial if the context allows the conduct of these computations by a trusted entity that would know the inputs from all the participants; however if the context disallows this then the techniques of secure multi-party computation become very relevant and can provide useful solutions. Statistic analysis is a widely used computation in real life, but the known methods usually require one to know the whole data set; little work has been conducted to investigate how statistical analysis could be performed in a cooperative environment, where the participants want to conduct statistical analysis on the joint data set, but each participant is concerned about the confidentiality of its own data. In this paper we have developed protocols for conducting the statistic analysis in such kind of cooperative environment based on a data perturbation technique and cryptography primitives.

Added 2003-06-30

Secure Supply-Chain Protocols

CERIAS TR 2003-13

Mikhail Atallah, Hicham Elmongui, Vinayak Deshpande, Leroy Schwarz

Download: PDF

Supply chain interactions have huge economic importance, yet theses interactions are managed inefficiently. One of the major sources of inefficiency in supply-chain management is information asymmetry; i.e., information that is available to one or more organizations in the chain (e.g., manufacturer retailer) is not available to others. There are several causes of information asymmetry, among them fear that a powerful buyer or supplier will take advantage of private information, that information will leak to a competitor, etc. We propose Secure Supply-Chain Collaboration (SSCC) protocols that enable supply-chain partners to cooperatively achieve desired system-wide goals without revealing the private information of any of the parties. Secure supply-chain collaboration has the potential to improve supply-chain management practice, and, by removing one major inefficiency therein, improve productivity. We present specific SSCC protocols for two types of supply-chain interactions: Capacity allocation, and e-auctions. In the course of doing so, we design techniques that are of independent interest, and are likely to be useful in the design of future SSCC protocols.

Added 2003-06-30

Attacking Digital Watermarks

CERIAS TR 2003-07

Radu Sion and Mikhail Atallah

Download: PDF

This paper discusses inherent vulnerabilities of digital watermarking that affect its mainstream purpose of rights protection. We ask: how resistant is watermarking to un-informed attacks ?

There are a multitude of application scenarios for watermarking and, with the advent of modern content distribution networks and the associated rights assessment issues, it has recently become a topic of increasing interest. But how well is watermarking suited for this main purpose of rights protection ? Existing watermarking techniques are vulnerable to attacks threatening their overall viability. Most of these attacks have the final goal of removing the watermarking information while preserving the actual value of the watermarked Work.

In this paper we identify an inherent trade-off between two important properties of watermarking algorithms: being “convincing enough” in court} while at the same time surviving a set of attacks, for a broad class of watermarking algorithms. We show that there exist inherent limitations in protecting rights over digital Works. In the attempt to become as convincing as possible (e.g. in a court of law, low rate of false positives), watermarking applications become more fragile to attacks aimed at removing the watermark while preserving the value of the Work. They are thus necessarily characterized by a significant (e.g. in some cases 35%+) non-zero probability of being successfully attacked without any knowledge about their algorithmic details. We quantify this vulnerability for a class of algorithms and show how a minimizing “sweet spot” can be found. We then derive a set of recommendations for watermarking algorithm design.

Added 2003-06-25

Provable Access Accounting for Content Distribution Networks

CERIAS TR 2003-06

Radu Sion and Mikhail Atallah

Download: PDF

One common revenue model in Content Distribution Networks (CDN), requires the CDN operator to provide access statistics (e.g. hits, transferred bytes) to the content provider, who in turn is expected to deliver payment dependent on these reported values. An implicit assumption of self-regulated truthfulness of the CDN operator governs this process. The content provider has to trust that the content distributor provides accurate numbers and does not artificially “inflate” them. This type of one-sided accounting is not tolerated well in two-party business interactions. An independent accuracy proof is preferred.

Here we present a provable secure verification mechanism for access accounting in this framework. Our solution exploits one of the common enabling mechanisms of CDN location awareness, dynamic DNS. We discuss several variations and analyze associated attacks.

Added 2003-06-25

Watermarking with quadratic residues

CERIAS TR 2003-09

M.J. Atallah and S.S. Wagstaff, Jr.

Download: PDF

We describe novel methods of watermarking data using quadratic residues and random numbers. Our methods are fast generic and improve the security of the watermark in most known watermarking techniques.

Added 2003-06-22