Reports and Papers Archive - Reports & Papers

Virtual Orality: How eBay Controls Auctions without an Auctioneer's Voice

CERIAS TR 2001-99

Josh Boyd

Download: PDF

A lot of people are intimidated by auctions. They fear being recognized for bids they never intended to make; they are confused by the array of items; they are unclear about terms like choice or by the piece, two times the money; and they have difficulty making their minds up fast enough to get in on a sale before the bidding stops. They are afraid of getting stuck with an inferior item at a ridiculously high price. Ultimately, however, they put their trust in the person in charge: the auctioneer.

The auctioneer controls the sometimes frenzied proceedings orally. Cassady (1967, 165) notes, “The auctioneer’s appearance, voice, rhythm of patter, good nature, imperturbability, and storytelling ability may have an effect on bidding activity, thus enhancing prices.” Often amplified by a microphone, the auctioneer’s voice rises above the din of the crowd and assistants to maintain order and demand the attention of prospective bidders. If, for instance, a person is erroneously recognized for a bid, the auctioneer has the discretion and power to make things right. And, naturally, the auctioneer wants to do so, because auctioneers want people to feel comfortable and safe at auctions. Whatever the situation, the auctioneer’s voice organizes and controls the proceedings.

An audible voice is missing, however, from a new kind of auction that has appeared in the past six years. This auction still sells items to the highest bidder, it still can be confusing, and it still is a jumble of items and action and unfamiliar terms, but with one notable absence: there is no auctioneer. This auction is the on-line auction, and instead of a single person orally controlling the auction, there is only a Web site. Instead of a bidder able to observe competing bidders in the crowd, there are only the “buyer” and “seller” usernames. And yet these on-line auctions, led by industry behemoth eBay, have thriven. How is this so? Across cultures and times, auctions have taken place under the supervision of an auctioneer whose voice commands attention and maintains order. On-line auctions have to create hypertext messages that somehow compensate for the missing orality. This essay argues that eBay and other auction Web sites are actually not that different from live English auctions of livestock, antiques, fish, tobacco, broadcast licenses, household goods, or myriad other items (Cassady 1967). On the contrary, in a virtual space eBay has maintained order and interest by mimicking the auctioneer’s oral style and the rules of in-person English auctions.

Added 2008-02-06

Privacy-Preserving Distributed k-Anonymity

CERIAS TR 2005-134

Christopher Clifton

Download: PDF

k-anonymity provides a measure of privacy protection by preventing re-identification of data to fewer than a group of k data items. While algorithms exist for producing k-anonymous data, the model has been that of a single source wanting to publish data. This paper presents a k-anonymity protocol when the data is vertically partitioned between sites. A key contribution is a proof that the protocol preserves k-anonymity between the sites: While one site may have individually identifiable data, it learns nothing that violates k-anonymity with respect to the data at the other site. This is a fundamentally different distributed privacy definition than that of Secure Multiparty Computation, and it provides a better match with both ethical and legal views of privacy.
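As a rough illustration of the k-anonymity property the abstract refers to (not the distributed protocol from the paper), the sketch below checks whether a single, already-joined table is k-anonymous. The function name `is_k_anonymous`, the field names, and the sample records are all hypothetical, chosen only for this example.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values that
    appears in `records` is shared by at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


# Hypothetical, already-generalized records (illustrative data only).
records = [
    {"zip": "479**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "479**", "age": "20-29", "diagnosis": "cold"},
    {"zip": "479**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "479**", "age": "30-39", "diagnosis": "asthma"},
]

print(is_k_anonymous(records, ["zip", "age"], 2))  # True: each group has 2 rows
print(is_k_anonymous(records, ["zip", "age"], 3))  # False: no group has 3 rows
```

In the vertically partitioned setting the abstract describes, no single site would hold all of these columns, so a check like this could not simply be run on one machine; that gap is what the paper's protocol addresses.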

Added 2008-02-04

Dependable real-time data mining

CERIAS TR 2005-133

Christopher Clifton

Download: PDF

In this paper we discuss the need for real-time data mining for many applications in government and industry and describe resulting research issues. We also discuss dependability issues including incorporating security, integrity, timeliness and fault tolerance into data mining. Several different data mining outcomes are described with regard to their implementation in a real-time environment. These outcomes include clustering, association-rule mining, link analysis and anomaly detection. The paper describes how they would be used together in various parallel-processing architectures. Stream mining is discussed with respect to the challenges of performing data mining on stream data from sensors. The paper concludes with a summary and discussion of directions in this emerging area.

Added 2008-02-04

Knowledge discovery from transportation network data

CERIAS TR 2005-132

Christopher Clifton

Download: PDF

Transportation and logistics are a major sector of the economy; however, data analysis in this domain has remained largely in the province of optimization. The potential of data mining and knowledge discovery techniques is largely untapped. Transportation networks are naturally represented as graphs. This paper explores the problems in mining of transportation network graphs: we hope to find how current techniques both succeed and fail on this problem, and from the failures, we hope to present new challenges for data mining. Experimental results from applying both existing graph mining and conventional data mining techniques to real transportation network data are provided, including new approaches to making these techniques applicable to the problems. Reasons why these techniques are not appropriate are discussed. We also suggest several challenging problems to precipitate research and galvanize future work in this area.

Added 2008-02-04

Privately Computing a Distributed k-nn Classifier

CERIAS TR 2004-92

Christopher Clifton

Download: PDF

The ability of databases to organize and share data often raises privacy concerns. Data warehousing combined with data mining, bringing data from multiple sources under a single authority, increases the risk of privacy violations. Privacy preserving data mining provides a means of addressing this issue, particularly if data mining is done in a way that doesn’t disclose information beyond the result. This paper presents a method for privately computing k-nn classification from distributed sources without revealing any information about the sources or their data, other than that revealed by the final classification result.

Added 2008-02-04

Derived access control specification for XML

CERIAS TR 2003-48

Christopher Clifton

Download: PDF

The growth in interchange of business and other sensitive data has led to increasing interest in access control. While broad-based access control may be adequate for library-style document bases, new applications demand different access rights on different documents, or different parts of a document. Methods have been developed that enforce fine-grained access control in XML, but the administrative complexity of hard-coding rules is still a challenge. We present an XQuery-based approach for deriving access control rules from schema-level rules, document or database content, or rules on other documents. This approach provides a novel capability to exploit non-structural information in broadly-applicable rules, making it feasible to specify data- and context-dependent rules for large document sets.

Added 2008-02-04

Privacy-preserving k-means clustering over vertically partitioned data

CERIAS TR 2003-47

Christopher Clifton

Download: PDF

Privacy and security concerns can prevent sharing of data, derailing data mining projects. Distributed knowledge discovery, if done correctly, can alleviate this problem. The key is to obtain valid results, while providing guarantees on the (non)disclosure of data. We present a method for k-means clustering when different sites contain different attributes for a common set of entities. Each site learns the cluster of each entity, but learns nothing about the attributes at other sites.

Added 2008-02-04

Assuring privacy when big brother is watching

CERIAS TR 2003-46

Christopher Clifton

Download: PDF

Homeland security measures are increasing the amount of data collected, processed and mined. At the same time, owners of the data have raised legitimate concerns about their privacy and potential abuses of the data. Privacy-preserving data mining techniques enable learning models without violating privacy. This paper addresses a complementary problem: What if we want to apply a model without revealing it? This paper presents a method to apply classification rules without revealing either the data or the rules. In addition, the rules can be verified not to use “forbidden” criteria.

Added 2008-02-04

Coordinating Accessibility versus Restrictions in Distributed Object Systems

CERIAS TR 2001-95

Christopher Clifton

Download: PDF

This work aims to provide administrators with services for managing permissions in a distributed object system, by connecting business-level tasks to access controls on low level functions. Specifically, the techniques connect abilities (to complete externally-invoked functions) to the access controls on individual functions, across all servers. Our main results are the problem formalization, plus algorithms to synthesize “least privilege” permissions for a given set of desired abilities. Desirable extensions and numerous research issues are identified.

Added 2008-02-04

Directions for Web and e-commerce applications security

CERIAS TR 2001-94

Christopher Clifton

Download: PDF

This paper provides directions for Web and e-commerce application security. In particular, access control policies, workflow security, XML security and federated database security issues pertaining to the Web and e-commerce applications are discussed.

Added 2008-02-04

Real-time data mining of multimedia objects

CERIAS TR 2001-93

Christopher Clifton

Download: PDF

Whereas much of the previous work on data mining has focused on mining data in relational databases, we discuss mining objects. Object models are very popular for representing multimedia data, and therefore we need to mine object databases to extract useful information from the large quantities of multimedia data. We first describe the motivation for multimedia data mining with examples and then discuss object mining with focus on text, image, video and audio mining. We also address the need for real time data mining for multimedia applications.

Added 2008-02-04

Developing custom intrusion detection filters using data mining

CERIAS TR 2001-92

Christopher Clifton

Download: PDF

One aspect of constructing secure networks is identifying unauthorized use of those networks. Intrusion detection systems look for unusual or suspicious activity, such as patterns of network traffic that are likely indicators of unauthorized activity. However, normal operation often produces traffic that matches likely “attack signatures”, resulting in false alarms. We are using data mining techniques to identify sequences of alarms that likely result from normal behavior, enabling construction of filters to eliminate those alarms. This can be done at a low cost for specific environments, enabling the construction of customized intrusion detection filters. We present our approach, and preliminary results identifying common sequences in alarms from a particular environment.

Added 2008-02-04

TopCat: Data Mining for Topic Identification in a Text Corpus

CERIAS TR 2001-91

Christopher Clifton

Download: PDF

TopCat (Topic Categories) is a technique for identifying topics that recur in articles in a text corpus. Natural language processing techniques are used to identify key entities in individual articles, allowing us to represent an article as a set of items. This allows us to view the problem in a database/data mining context: Identifying related groups of items. This paper presents a novel method for identifying related items based on “traditional” data mining techniques. Frequent itemsets are generated from the groups of items, followed by clusters formed with a hypergraph partitioning scheme. We present an evaluation against a manually-categorized “ground truth” news corpus showing this technique is effective in identifying “topics” in collections of news articles.

Added 2008-02-04

Data mining on text

CERIAS TR 2001-90

Christopher Clifton

Download: PDF

Data mining technology is giving us the ability to extract meaningful patterns from large quantities of structured data. Information retrieval systems have made large quantities of textual data available. Extracting meaningful patterns from this data is difficult. Current tools for mining structured data are inappropriate for free text. We outline problems involved in Knowledge Discovery in Text, and present an architecture for extracting patterns that hold across multiple documents. The capabilities that such a system could provide are illustrated.

Added 2008-02-04

Query flocks: a generalization of association-rule mining

CERIAS TR 2001-89

Christopher Clifton

Download: PDF

Association-rule mining has proved a highly successful technique for extracting useful information from very large databases. This success is attributed not only to the appropriateness of the objectives, but to the fact that a number of new query-optimization ideas, such as the “a-priori” trick, make association-rule mining run much faster than might be expected. In this paper we see that the same tricks can be extended to a much more general context, allowing efficient mining of very large databases for many different kinds of patterns. The general idea, called “query flocks,” is a generate-and-test model for data-mining problems. We show how the idea can be used either in a general-purpose mining system or in a next generation of conventional query optimizers.
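For readers unfamiliar with the a-priori trick the abstract mentions, the following is a minimal sketch of the classic level-wise frequent-itemset search it comes from. It is not the query-flocks system itself; the function name `frequent_itemsets` and the sample baskets are hypothetical, and support is taken here to be an absolute transaction count.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise search. A candidate of size n is counted against the data
    only if all of its (n-1)-item subsets were already found frequent."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    # Level 1: single items that meet the support threshold.
    current = {frozenset([i]) for i in items
               if sum(i in t for t in transactions) >= min_support}
    frequent = {}
    size = 1
    while current:
        for s in current:
            frequent[s] = sum(s <= t for t in transactions)
        size += 1
        # Generate: join pairs of frequent sets into candidates one item larger.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == size}
        # Prune: discard any candidate with an infrequent subset.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current
                             for sub in combinations(c, size - 1))}
        # Test: count the surviving candidates against the transactions.
        current = {c for c in candidates
                   if sum(c <= t for t in transactions) >= min_support}
    return frequent


# Hypothetical market baskets (illustrative data only).
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
result = frequent_itemsets(baskets, min_support=3)
print(result[frozenset({"bread", "milk"})])  # 3
```

The generate, prune, and test steps in the loop are one concrete instance of the generate-and-test pattern that the abstract says query flocks generalize to other kinds of patterns.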

Added 2008-02-04