A lot of people are intimidated by auctions. They fear being recognized for bids they never intended to make; they are confused by the array of items; they are unclear about terms like “choice” or “by the piece, two times the money”; and they have difficulty making up their minds fast enough to get in on a sale before the bidding stops. They are afraid of getting stuck with an inferior item at a ridiculously high price. Ultimately, however, they put their trust in the person in charge: the auctioneer.
The auctioneer controls the sometimes frenzied proceedings orally. Cassady (1967, 165) notes, “The auctioneer’s appearance, voice, rhythm of patter, good nature, imperturbability, and storytelling ability may have an effect on bidding activity, thus enhancing prices.” Often amplified by a microphone, the auctioneer’s voice rises above the din of the crowd and assistants to maintain order and demand the attention of prospective bidders. If, for instance, a person is erroneously recognized for a bid, the auctioneer has the discretion and power to make things right. And, naturally, the auctioneer wants to do so, because auctioneers want people to feel comfortable and safe at auctions. Whatever the situation, the auctioneer’s voice organizes and controls the proceedings.
An audible voice is missing, however, from a new kind of auction that has appeared in the past six years. This auction still sells items to the highest bidder, it still can be confusing, and it still is a jumble of items and action and unfamiliar terms, but with one notable absence: there is no auctioneer. This auction is the on-line auction, and instead of a single person orally controlling the auction, there is only a Web site. Instead of a bidder able to observe competing bidders in the crowd, there are only the “buyer” and “seller” usernames. And yet these on-line auctions, led by industry behemoth eBay, have thrived. How is this so? Across cultures and times, auctions have taken place under the supervision of an auctioneer whose voice commands attention and maintains order. On-line auctions have to create hypertext messages that somehow compensate for the missing orality. This essay argues that eBay and other auction Web sites are actually not that different from live English auctions of livestock, antiques, fish, tobacco, broadcast licenses, household goods, or myriad other items (Cassady 1967). Rather, in a virtual space eBay has maintained order and interest by mimicking the auctioneer’s oral style and the rules of in-person English auctions.
k-anonymity provides a measure of privacy protection by ensuring that released data cannot be re-identified to a group of fewer than k data items. While algorithms exist for producing k-anonymous data, the model has been that of a single source wanting to publish data. This paper presents a k-anonymity protocol for data that is vertically partitioned between sites. A key contribution is a proof that the protocol preserves k-anonymity between the sites: while one site may have individually identifiable data, it learns nothing that violates k-anonymity with respect to the data at the other site. This is a fundamentally different distributed privacy definition than that of Secure Multiparty Computation, and it provides a better match with both ethical and legal views of privacy.
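As a point of reference, the core k-anonymity property is easy to state concretely: after the quasi-identifying attributes have been generalized, every combination of quasi-identifier values must be shared by at least k records. The sketch below checks that property on a toy table; the column names and data are invented, and none of the distributed, two-site protocol described above appears here.

```python
# A minimal sketch of the k-anonymity property itself (not the paper's
# distributed protocol). Column names and values below are hypothetical.
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination appears >= k times."""
    combos = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return all(count >= k for count in combos.values())

records = [
    {"zip": "479**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "479**", "age": "30-39", "diagnosis": "cold"},
    {"zip": "479**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "478**", "age": "40-49", "diagnosis": "asthma"},
    {"zip": "478**", "age": "40-49", "diagnosis": "flu"},
]

print(is_k_anonymous(records, ["zip", "age"], k=2))  # True
print(is_k_anonymous(records, ["zip", "age"], k=3))  # False: one group has only 2 records
```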
In this paper we discuss the need for real-time data mining for many applications in government and industry and describe resulting research issues. We also discuss dependability issues including incorporating security, integrity, timeliness and fault tolerance into data mining. Several different data mining outcomes are described with regard to their implementation in a real-time environment. These outcomes include clustering, association-rule mining, link analysis and anomaly detection. The paper describes how they would be used together in various parallel-processing architectures. Stream mining is discussed with respect to the challenges of performing data mining on stream data from sensors. The paper concludes with a summary and discussion of directions in this emerging area.
Transportation and logistics are a major sector of the economy; however, data analysis in this domain has remained largely in the province of optimization. The potential of data mining and knowledge discovery techniques is largely untapped. Transportation networks are naturally represented as graphs. This paper explores the problems in mining transportation network graphs: we hope to find how current techniques both succeed and fail on this problem, and from the failures, we hope to present new challenges for data mining. Experimental results from applying both existing graph mining and conventional data mining techniques to real transportation network data are provided, including new approaches to making these techniques applicable to the problems. Reasons why these techniques are not appropriate are discussed. We also suggest several challenging problems to precipitate research and galvanize future work in this area.
The ability of databases to organize and share data often raises privacy concerns. Data warehousing combined with data mining, bringing data from multiple sources under a single authority, increases the risk of privacy violations. Privacy preserving data mining provides a means of addressing this issue, particularly if data mining is done in a way that doesn’t disclose information beyond the result. This paper presents a method for privately computing k-nn classification from distributed sources without revealing any information about the sources or their data, other than that revealed by the final classification result.
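For readers unfamiliar with the underlying classifier, here is a plain, non-private k-nn sketch that shows what the protocol ultimately computes: a majority vote among the k nearest labeled points. The points, labels, and their split across "sites" are invented, and the cryptographic machinery that keeps each source's data hidden is deliberately absent.

```python
# A plain (non-private) k-nn classifier, shown only to make concrete the result
# the privacy-preserving protocol produces. All data below is hypothetical.
import math
from collections import Counter

def knn_classify(query, labeled_points, k=3):
    """Classify `query` by majority vote among its k nearest labeled points."""
    dists = sorted((math.dist(query, x), label) for x, label in labeled_points)
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Imagine these labeled points are held by two different sources.
site_a = [((1.0, 2.0), "benign"), ((1.2, 1.9), "benign")]
site_b = [((5.0, 5.5), "fraud"), ((5.2, 5.1), "fraud"), ((1.1, 2.2), "benign")]

print(knn_classify((1.0, 2.1), site_a + site_b, k=3))  # "benign"
```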
The growth in interchange of business and other sensitive data has led to increasing interest in access control. While broad-based access control may be adequate for library-style document bases, new applications demand different access rights on different documents, or different parts of a document. Methods have been developed that enforce fine-grained access control in XML, but the administrative complexity of hard-coding rules is still a challenge. We present an XQuery-based approach for deriving access control rules from schema-level rules, document or database content, or rules on other documents. This approach provides a novel capability to exploit non-structural information in broadly-applicable rules, making it feasible to specify data- and context-dependent rules for large document sets.
Privacy and security concerns can prevent sharing of data, derailing data mining projects. Distributed knowledge discovery, if done correctly, can alleviate this problem. The key is to obtain valid results, while providing guarantees on the (non)disclosure of data. We present a method for k-means clustering when different sites contain different attributes for a common set of entities. Each site learns the cluster of each entity, but learns nothing about the attributes at other sites.
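A rough sketch of the observation that makes this setting workable: with vertically partitioned data, the squared distance from an entity to a candidate centroid decomposes into a sum of per-site terms, so each site can compute its share over its own attributes. The example below (with invented attributes and centroids) shows only that local computation; the secure step that compares the combined distances without revealing them is not shown.

```python
# Sketch of the decomposition behind k-means over vertically partitioned data.
# Each site computes its contribution to the squared distance locally; the
# secure combination/comparison protocol is NOT shown. Data are hypothetical.

def partial_sq_dist(entity_attrs, centroid_attrs):
    """One site's contribution to the squared distance, over its attributes only."""
    return sum((e - c) ** 2 for e, c in zip(entity_attrs, centroid_attrs))

# One entity's attributes, split between two sites.
site_a_attrs = [2.0, 3.5]        # e.g., two financial attributes held by site A
site_b_attrs = [0.1, 4.2, 7.0]   # e.g., three clinical attributes held by site B

# Each site's slice of a candidate cluster centroid.
centroid_a = [2.1, 3.0]
centroid_b = [0.0, 4.0, 6.5]

share_a = partial_sq_dist(site_a_attrs, centroid_a)
share_b = partial_sq_dist(site_b_attrs, centroid_b)

# In the clear, the full squared distance is just the sum of the shares.
print(share_a + share_b)
```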
Homeland security measures are increasing the amount of data collected, processed and mined. At the same time, owners of the data have raised legitimate concerns about their privacy and potential abuses of the data. Privacy-preserving data mining techniques enable learning models without violating privacy. This paper addresses a complementary problem: What if we want to apply a model without revealing it? This paper presents a method to apply classification rules without revealing either the data or the rules. In addition, the rules can be verified not to use “forbidden” criteria.
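To make the goal concrete, the toy sketch below shows the computation in the clear: applying a small set of classification rules to a record and checking that no rule tests a forbidden attribute. The rules, attributes, and record are invented, and the cryptographic layer that keeps both the rules and the data hidden, which is the actual subject of the paper, is omitted.

```python
# What the protocol computes, shown without any privacy protection.
# Rules, attribute names, and the record are hypothetical.

rules = [
    ({"visa_type": "student", "ticket_paid_cash": True}, "flag"),
    ({}, "clear"),  # default rule: empty condition always matches
]
forbidden_attributes = {"religion", "ethnicity"}

def rules_use_forbidden(rules, forbidden):
    """True if any rule conditions on a forbidden attribute."""
    return any(attr in forbidden for conditions, _ in rules for attr in conditions)

def classify(record, rules):
    """Return the outcome of the first rule whose conditions all match."""
    for conditions, outcome in rules:
        if all(record.get(a) == v for a, v in conditions.items()):
            return outcome
    return None

record = {"visa_type": "student", "ticket_paid_cash": True, "age": 27}
assert not rules_use_forbidden(rules, forbidden_attributes)
print(classify(record, rules))  # "flag"
```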
This work aims to provide administrators with services for managing permissions in a distributed object system, by connecting business-level tasks to access controls on low level functions. Specifically, the techniques connect abilities (to complete externally-invoked functions) to the access controls on individual functions, across all servers. Our main results are the problem formalization, plus algorithms to synthesize “least privilege” permissions for a given set of desired abilities. Desirable extensions and numerous research issues are identified.
This paper provides directions for Web and e-commerce application security. In particular, access control policies, workflow security, XML security and federated database security issues pertaining to the Web and e-commerce applications are discussed.
Whereas much of the previous work on data mining has focused on mining data in relational databases, we discuss mining objects. Object models are very popular for representing multimedia data, and therefore we need to mine object databases to extract useful information from the large quantities of multimedia data. We first describe the motivation for multimedia data mining with examples and then discuss object mining with focus on text, image, video and audio mining. We also address the need for real time data mining for multimedia applications.
One aspect of constructing secure networks is identifying unauthorized use of those networks. Intrusion detection systems look for unusual or suspicious activity, such as patterns of network traffic that are likely indicators of unauthorized activity. However, normal operation often produces traffic that matches likely “attack signatures”, resulting in false alarms. We are using data mining techniques to identify sequences of alarms that likely result from normal behavior, enabling construction of filters to eliminate those alarms. This can be done at a low cost for specific environments, enabling the construction of customized intrusion detection filters. We present our approach, and preliminary results identifying common sequences in alarms from a particular environment.
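One way to picture the idea, independent of any particular intrusion detection product: mine a log of alarms generated during presumed-normal operation for short alarm sequences that recur frequently, then treat those sequences as candidates for filtering. The alarm names, sequence length, and threshold below are invented for illustration and are not taken from the work described above.

```python
# Minimal sketch: find alarm subsequences that recur in presumed-normal
# traffic and could therefore be filtered. Names and thresholds are hypothetical.
from collections import Counter

normal_alarm_log = [
    "ICMP_FLOOD", "PORT_SWEEP", "ICMP_FLOOD", "PORT_SWEEP",
    "ICMP_FLOOD", "PORT_SWEEP", "SYN_SCAN",
]

def frequent_sequences(log, length=2, min_count=2):
    """Return alarm subsequences of the given length seen at least min_count times."""
    grams = Counter(tuple(log[i:i + length]) for i in range(len(log) - length + 1))
    return {g for g, c in grams.items() if c >= min_count}

benign = frequent_sequences(normal_alarm_log)
print(benign)  # e.g. {('ICMP_FLOOD', 'PORT_SWEEP'), ('PORT_SWEEP', 'ICMP_FLOOD')}

# A later alarm pair matching a benign sequence could be suppressed by a filter.
print(("ICMP_FLOOD", "PORT_SWEEP") in benign)  # True
```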
TopCat (Topic Categories) is a technique for identifying topics that recur in articles in a text corpus. Natural language processing techniques are used to identify key entities in individual articles, allowing us to represent an article as a set of items. This allows us to view the problem in a database/data mining context: identifying related groups of items. This paper presents a novel method for identifying related items based on “traditional” data mining techniques. Frequent itemsets are generated from these sets of items, followed by clusters formed with a hypergraph partitioning scheme. We present an evaluation against a manually categorized “ground truth” news corpus showing this technique is effective in identifying “topics” in collections of news articles.
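A toy version of the first stage: treat each article as a set of named entities and count which entity combinations co-occur in enough articles to form frequent itemsets. The articles and entities below are invented, and the hypergraph-partitioning step that merges itemsets into topics is not shown.

```python
# Toy illustration of frequent entity itemsets as topic cues; the clustering
# stage is omitted. Articles and entities are invented.
from itertools import combinations
from collections import Counter

articles = [
    {"NATO", "Kosovo", "Milosevic"},
    {"NATO", "Kosovo", "refugees"},
    {"NATO", "Kosovo", "Milosevic", "airstrikes"},
    {"Yankees", "World Series"},
]

def frequent_pairs(docs, min_support=2):
    """Entity pairs appearing together in at least min_support articles."""
    counts = Counter(pair for d in docs for pair in combinations(sorted(d), 2))
    return [(pair, c) for pair, c in counts.items() if c >= min_support]

print(frequent_pairs(articles))
# e.g. (('Kosovo', 'NATO'), 3), (('Kosovo', 'Milosevic'), 2), (('Milosevic', 'NATO'), 2)
```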
Data mining technology is giving us the ability to extract meaningful patterns from large quantities of structured data. Information retrieval systems have made large quantities of textual data available. Extracting meaningful patterns from this data is difficult. Current tools for mining structured data are inappropriate for free text. We outline problems involved in Knowledge Discovery in Text, and present an architecture for extracting patterns that hold across multiple documents. The capabilities that such a system could provide are illustrated.
Association-rule mining has proved a highly successful technique for extracting useful information from very large databases. This success is attributed not only to the appropriateness of the objectives, but to the fact that a number of new query-optimization ideas, such as the “a-priori” trick, make association-rule mining run much faster than might be expected. In this paper we see that the same tricks can be extended to a much more general context, allowing efficient mining of very large databases for many different kinds of patterns. The general idea, called “query flocks,” is a generate-and-test model for data-mining problems. We show how the idea can be used either in a general-purpose mining system or in a next generation of conventional query optimizers.
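The “a-priori” trick the abstract alludes to can be stated in a few lines: a candidate itemset can be frequent only if all of its subsets are frequent, so larger candidates are generated from smaller frequent sets rather than enumerated exhaustively. The sketch below applies that pruning to invented market-basket transactions; it illustrates the classic association-rule case, not the more general query-flocks framework.

```python
# Bare-bones a-priori pruning on invented transactions.
from itertools import combinations
from collections import Counter

transactions = [
    {"bread", "milk"},
    {"bread", "milk", "beer"},
    {"milk", "beer"},
    {"bread", "milk", "beer"},
]
min_support = 2

# Pass 1: count single items and keep the frequent ones.
item_counts = Counter(item for t in transactions for item in t)
frequent_items = sorted(i for i, c in item_counts.items() if c >= min_support)

# Pass 2: the a-priori prune -- only pairs of frequent items become candidates.
candidates = list(combinations(frequent_items, 2))
pair_counts = Counter()
for t in transactions:
    for pair in candidates:
        if set(pair) <= t:
            pair_counts[pair] += 1

frequent_pairs = {p: c for p, c in pair_counts.items() if c >= min_support}
print(frequent_pairs)
# e.g. {('beer', 'bread'): 2, ('beer', 'milk'): 3, ('bread', 'milk'): 3}
```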