The Center for Education and Research in Information Assurance and Security (CERIAS)

Reports and Papers Archive


Assuring privacy when big brother is watching

CERIAS TR 2003-46
Christopher Clifton
Download: PDF

Homeland security measures are increasing the amount of data collected, processed and mined. At the same time, owners of the data have raised legitimate concerns about their privacy and potential abuses of the data. Privacy-preserving data mining techniques enable learning models without violating privacy. This paper addresses a complementary problem: what if we want to apply a model without revealing it? This paper presents a method to apply classification rules without revealing either the data or the rules. In addition, the rules can be verified not to use “forbidden” criteria.
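The abstract does not spell out the protocol. One standard building block for comparing values without revealing them is commutative encryption (Pohlig-Hellman-style exponentiation modulo a prime), where encrypting with key a then b gives the same result as b then a. The sketch below is a swapped-in illustration, not the report's method: it checks whether a record satisfies a rule's antecedent, with both parties simulated in one process, a toy modulus, and hypothetical condition strings.

```python
import hashlib
import math
import secrets

# Toy modulus (the Mersenne prime 2^127 - 1). Illustrative only: not a vetted
# group, and no protection against a party that deviates from the protocol.
P = 2**127 - 1


def keygen() -> int:
    """Secret exponent coprime to P - 1, so x -> x^e mod P is a bijection."""
    while True:
        e = secrets.randbelow(P - 3) + 2
        if math.gcd(e, P - 1) == 1:
            return e


def to_group(value: str) -> int:
    """Hash an attribute value into the range [1, P - 1]."""
    digest = int.from_bytes(hashlib.sha256(value.encode()).digest(), "big")
    return digest % (P - 1) + 1


def enc(x: int, key: int) -> int:
    """Commutative encryption: enc(enc(x, a), b) == enc(enc(x, b), a)."""
    return pow(x, key, P)


def rule_matches(record: set[str], antecedent: set[str]) -> bool:
    """Simulate both parties in one process: does the record satisfy the rule?"""
    a = keygen()  # data holder's secret key
    b = keygen()  # rule holder's secret key
    # Each party encrypts its own values; the other party then re-encrypts them.
    record_ab = {enc(enc(to_group(v), a), b) for v in record}
    rule_ba = {enc(enc(to_group(v), b), a) for v in antecedent}
    # Equal plaintexts collide after double encryption, so a subset test on
    # ciphertexts answers "are all rule conditions present in the record?"
    return rule_ba <= record_ab
```

For example, rule_matches({"age>40", "smoker", "zip=47907"}, {"age>40", "smoker"}) returns True. This simple variant still reveals how many conditions matched, and it does not address the report's check for “forbidden” criteria.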

Added 2008-02-04

Coordinating Accessibility versus Restrictions in Distributed Object Systems

CERIAS TR 2001-95
Christopher Clifton
Download: PDF

This work aims to provide administrators with services for managing permissions in a distributed object system, by connecting business-level tasks to access controls on low-level functions. Specifically, the techniques connect abilities (to complete externally invoked functions) to the access controls on individual functions, across all servers. Our main results are the problem formalization, plus algorithms to synthesize “least privilege” permissions for a given set of desired abilities. Desirable extensions and numerous research issues are identified.
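To make the “least privilege” synthesis concrete, here is a minimal brute-force sketch under assumptions not taken from the report: each ability can be realized by one of several alternative sets of (server, function) calls, and “least privilege” is read as granting the fewest such permissions. The data model and all names are hypothetical.

```python
from __future__ import annotations

from itertools import product

Call = tuple[str, str]  # hypothetical: (server, function) that an ability invokes


def least_privilege(abilities: dict[str, list[frozenset[Call]]],
                    desired: list[str]) -> set[Call]:
    """Choose one realization per desired ability so that the union of
    required (server, function) permissions is as small as possible.

    Brute force over all combinations: exponential, fine only for tiny inputs.
    """
    best: set[Call] | None = None
    for choice in product(*(abilities[name] for name in desired)):
        needed: set[Call] = set().union(*choice)
        if best is None or len(needed) < len(best):
            best = needed
    return best if best is not None else set()
```

Enumerating every combination keeps the sketch obviously correct; the report's own algorithms are not described in the abstract and are presumably more efficient.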

Added 2008-02-04

Directions for Web and e-commerce applications security

CERIAS TR 2001-94
Christopher Clifton
Download: PDF

This paper provides directions for Web and e-commerce application security. In particular, access control policies, workflow security, XML security and federated database security issues pertaining to the Web and e-commerce applications are discussed.

Added 2008-02-04

Real-time data mining of multimedia objects

CERIAS TR 2001-93
Christopher Clifton
Download: PDF

Whereas much of the previous work on data mining has focused on mining data in relational databases, we discuss mining objects. Object models are very popular for representing multimedia data, and therefore we need to mine object databases to extract useful information from the large quantities of multimedia data. We first describe the motivation for multimedia data mining with examples and then discuss object mining with focus on text, image, video and audio mining. We also address the need for real-time data mining for multimedia applications.

Added 2008-02-04

Developing custom intrusion detection filters using data mining

CERIAS TR 2001-92
Christopher Clifton
Download: PDF

One aspect of constructing secure networks is identifying unauthorized use of those networks. Intrusion detection systems look for unusual or suspicious activity, such as patterns of network traffic that are likely indicators of unauthorized activity. However, normal operation often produces traffic that matches likely “attack signatures”, resulting in false alarms. We are using data mining techniques to identify sequences of alarms that likely result from normal behavior, enabling construction of filters to eliminate those alarms. This can be done at a low cost for specific environments, enabling the construction of customized intrusion detection filters. We present our approach, and preliminary results identifying common sequences in alarms from a particular environment.
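As a rough illustration of the idea, the sketch below counts runs of n consecutive alarm types in a log, treats runs that recur at least min_support times as likely benign, and suppresses alarms covered by those runs. The counting scheme, thresholds and alarm names are hypothetical; the abstract does not specify which sequence-mining technique the authors use.

```python
from collections import Counter


def frequent_sequences(alarms: list[str], n: int, min_support: int) -> set[tuple[str, ...]]:
    """Count every run of n consecutive alarm types and keep the common ones."""
    counts = Counter(tuple(alarms[i:i + n]) for i in range(len(alarms) - n + 1))
    return {seq for seq, c in counts.items() if c >= min_support}


def apply_filter(alarms: list[str], benign: set[tuple[str, ...]], n: int) -> list[str]:
    """Drop alarms that fall inside any occurrence of a benign sequence."""
    suppressed = [False] * len(alarms)
    for i in range(len(alarms) - n + 1):
        if tuple(alarms[i:i + n]) in benign:
            for j in range(i, i + n):
                suppressed[j] = True
    return [a for a, s in zip(alarms, suppressed) if not s]
```

Frequency alone cannot tell a routine pattern from a repeated attack, so in practice the mined sequences would need review before being turned into a filter.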

Added 2008-02-04

TopCat: Data Mining for Topic Identification in a Text Corpus

CERIAS TR 2001-91
Christopher Clifton
Download: PDF

TopCat (Topic Categories) is a technique for identifying topics that recur in articles in a text corpus. Natural language processing techniques are used to identify key entities in individual articles, allowing us to represent an article as a set of items. This allows us to view the problem in a database/data mining context: identifying related groups of items. This paper presents a novel method for identifying related items based on “traditional” data mining techniques. Frequent itemsets are generated from the groups of items, followed by clusters formed with a hypergraph partitioning scheme. We present an evaluation against a manually categorized “ground truth” news corpus showing this technique is effective in identifying “topics” in collections of news articles.
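A simplified sketch of the pipeline, assuming each article has already been reduced to a set of entity strings: it keeps only frequent entity pairs (rather than general itemsets) and then groups them with connected components via union-find. The grouping step is a deliberately crude stand-in for TopCat's hypergraph partitioning, and the example entities are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(articles: list[set[str]], min_support: int) -> set[frozenset[str]]:
    """Entity pairs that co-occur in at least min_support articles."""
    counts = Counter(frozenset(p) for a in articles for p in combinations(sorted(a), 2))
    return {p for p, c in counts.items() if c >= min_support}


def topics(pairs: set[frozenset[str]]) -> list[set[str]]:
    """Union-find over frequent pairs (stand-in for hypergraph partitioning)."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for p in pairs:
        a, b = tuple(p)
        parent[find(a)] = find(b)
    groups: dict[str, set[str]] = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return sorted(groups.values(), key=lambda g: sorted(g))
```

Connected components will merge two topics as soon as a single frequent pair links them, which is exactly the weakness a partitioning scheme is meant to avoid.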

Added 2008-02-04

Data mining on text

CERIAS TR 2001-90
Christopher Clifton
Download: PDF

Data mining technology is giving us the ability to extract meaningful patterns from large quantities of structured data. Information retrieval systems have made large quantities of textual data available. Extracting meaningful patterns from this data is difficult. Current tools for mining structured data are inappropriate for free text. We outline problems involved in Knowledge Discovery in Text, and present an architecture for extracting patterns that hold across multiple documents. The capabilities that such a system could provide are illustrated.

Added 2008-02-04

Query flocks: a generalization of association-rule mining

CERIAS TR 2001-89
Christopher Clifton
Download: PDF

Association-rule mining has proved a highly successful technique for extracting useful information from very large databases. This success is attributed not only to the appropriateness of the objectives, but also to the fact that a number of new query-optimization ideas, such as the “a-priori” trick, make association-rule mining run much faster than might be expected. In this paper we see that the same tricks can be extended to a much more general context, allowing efficient mining of very large databases for many different kinds of patterns. The general idea, called “query flocks,” is a generate-and-test model for data-mining problems. We show how the idea can be used either in a general-purpose mining system or in a next generation of conventional query optimizers.
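The “a-priori” trick is the observation that a set of items can be frequent only if all of its subsets are frequent. The minimal generate-and-test sketch below applies it to plain market-basket itemsets; it is not the query-flock formalism itself, which generalizes this pruning to parameterized queries, and the basket contents are hypothetical.

```python
from itertools import combinations


def apriori(baskets: list[frozenset[str]], min_support: int) -> dict[frozenset[str], int]:
    """Level-wise frequent-itemset mining with a-priori pruning."""
    def support(itemset: frozenset[str]) -> int:
        return sum(1 for b in baskets if itemset <= b)

    items = {i for b in baskets for i in b}
    level = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    result = {s: support(s) for s in level}
    k = 2
    while level:
        # Generate: join frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune (a-priori): every (k-1)-subset must already be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        # Test: count support only for the surviving candidates.
        level = set()
        for c in candidates:
            s = support(c)
            if s >= min_support:
                level.add(c)
                result[c] = s
        k += 1
    return result
```

The pruning step is what avoids counting support for candidates that cannot possibly qualify, which is where most of the savings come from on large databases.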

Added 2008-02-04

Dynamic Integration and Query Processing with Ranked Role Sets

CERIAS TR 2001-152
Christopher Clifton

The role-set approach is a new conceptual framework for data integration in multidatabase systems that maintains the materialization autonomy of local database systems and provides users with more accurate information. The role-set approach presents the answer to a query as a set of relations representing the distinct intersections between the relations corresponding to the various roles played by an entity. In this paper we show how the basic role-based approach can be extended in the absence of information about the multidatabase keys (global IDs). We propose a strategy based on ranked role-sets that makes use of a semantic integration procedure based on neural networks to determine candidate global IDs. The data integration and query processing steps then produce a number of role-sets, ranked by the similarity of the candidate IDs.

Added 2008-02-04

Security and Privacy Implications of Data Mining

CERIAS TR 2001-88
Christopher Clifton
Download: PDF

Data mining enables us to discover information we do not expect to find in databases. This can be a security/privacy issue: If we make information available, are we perhaps giving out more than we bargained for? This position paper discusses possible problems and solutions, and outlines ideas for further research in this area.

Added 2008-02-04

Classifying software components using design characteristics

CERIAS TR 2001-87
Christopher Clifton
Download: PDF

Classifying software modules in a component library is a major problem in software reuse. Indexing criteria must adequately reflect the semantics of the components. This must be done without undue effort in either classifying the software, or developing “queries” to find candidates for reuse. We present an architecture for automatically classifying and querying software based on design information. We present a method for determining if indexing criteria are effective, and show results using a set of criteria automatically extracted from an existing collection of programs.

Added 2008-02-04

Semantic Integration in Heterogeneous Databases using neural networks

CERIAS TR 2001-86
Christopher Clifton
Download: PDF

One important step in integrating heterogeneous databases is matching equivalent attributes: determining which fields in two databases refer to the same data. The meaning of information may be embodied within a database model, a conceptual schema, application programs, or data contents. Integration involves extracting semantics, expressing them as metadata, and matching semantically equivalent data elements. We present a procedure using a classifier to categorize attributes according to their field specifications and data values, then train a neural network to recognize similar attributes. In our technique, the knowledge of how to match equivalent data elements is “discovered” from metadata, not “pre-programmed”.

Added 2008-02-04

Using Field Specifications to Determine Attribute Equivalence in Heterogeneous Databases

CERIAS TR 2001-85
Christopher Clifton
Download: PDF

One step in integrating heterogeneous database systems is matching equivalent attributes: determining which fields in the two databases refer to the same data. The authors see three (complementary) techniques to automate this process: synonym dictionaries that compare field names, design criteria that compare field specifications, and comparison of data values. They present a technique for using field specifications to compare attributes, and evaluate this technique on a variety of databases.
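A minimal sketch of comparing attributes by field specification, assuming a hypothetical feature set (data-type category, normalized length, nullability, key flag) and plain Euclidean distance with nearest-neighbour matching. The abstract does not give the report's actual features or comparison method, so every name and choice here is illustrative.

```python
import math
from dataclasses import dataclass

TYPES = ("char", "int", "float", "date")  # hypothetical type categories


@dataclass
class FieldSpec:
    name: str
    data_type: str  # one of TYPES
    length: int
    nullable: bool
    is_key: bool


def features(f: FieldSpec, max_len: int = 255) -> list[float]:
    """Turn a field specification into a numeric vector."""
    onehot = [1.0 if f.data_type == t else 0.0 for t in TYPES]
    return onehot + [min(f.length, max_len) / max_len, float(f.nullable), float(f.is_key)]


def best_matches(db1: list[FieldSpec], db2: list[FieldSpec]) -> dict[str, str]:
    """For each attribute in db1, the db2 attribute with the closest spec vector."""
    return {a.name: min(db2, key=lambda b: math.dist(features(a), features(b))).name
            for a in db1}
```

Note that field names play no role here: the match rests only on the specifications, which is what lets this technique complement the synonym-dictionary and data-value approaches.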

Added 2008-02-04

The Gold Mailer

CERIAS TR 2001-84
Christopher Clifton
Download: PDF

The Gold Mailer, a system that provides users with an integrated way to send and receive messages using different media, efficiently store and retrieve these messages, and access a variety of sources of other useful information, is described. The mailer solves the problems of information overload, organization of messages and multiple interfaces. By providing good storage and retrieval facilities, it can be used as a powerful information processing engine covering a range of useful office information. The Gold Mailer’s query language, indexing engine, file organization, data structures, and support of mail message data and multimedia documents are discussed.

Added 2008-02-04

Distributed processing of filtering queries in HyperFile

CERIAS TR 2001-82
Christopher Clifton
Download: PDF

A query language has been developed that serves as an extension of the browsing model of hypertext systems. The query language and data model fit naturally into a distributed environment. A simple and efficient method is discussed for processing distributed queries in this language. Results of experiments run on a distributed data server using this algorithm are presented.

Added 2008-02-04