The Center for Education and Research in Information Assurance and Security (CERIAS)

Reports and Papers Archive


ViWiD: Visible Watermarking-Based Defense Against Phishing

CERIAS TR 2005-130
Atallah
Download: PDF
Added 2008-02-01

Privacy-preserving distributed mining of association rules on horizontally partitioned data

CERIAS TR 2004-91
Christopher Clifton
Download: PDF

Data mining can extract important knowledge from large data collections, but sometimes these collections are split among various parties. Privacy concerns may prevent the parties from directly sharing the data and some types of information about the data. We address secure mining of association rules over horizontally partitioned data. The methods incorporate cryptographic techniques to minimize the information shared, while adding little overhead to the mining task.
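
To make "horizontally partitioned" concrete: each party holds different transactions over the same items, so the global support of an itemset is the sum of the parties' local counts divided by the total number of transactions. The sketch below simulates, in a single process, a masked running sum that hides individual counts from the other parties. It is an illustrative assumption, not the protocol from the report; the function names, the modulus, and the choice to reveal the total database size are all hypothetical.

```python
import secrets
from typing import List


def masked_sum(local_values: List[int], modulus: int = 2**32) -> int:
    """Simulate a ring-style masked sum (assumed technique, not the report's protocol).

    The first party starts from a random mask, every party adds its own value
    modulo `modulus`, and the first party removes the mask at the end.
    Intermediate running totals look random to the parties that see them.
    The true total must be smaller than `modulus`.
    """
    mask = secrets.randbelow(modulus)
    running = mask
    for value in local_values:
        running = (running + value) % modulus
    return (running - mask) % modulus


def is_globally_frequent(local_counts: List[int],
                         local_sizes: List[int],
                         min_support: float) -> bool:
    """Check global support using only the two masked totals."""
    total_count = masked_sum(local_counts)
    total_size = masked_sum(local_sizes)
    return total_count >= min_support * total_size


# Three hypothetical parties: local counts of one itemset and local database sizes.
print(is_globally_frequent([5, 0, 7], [10, 10, 10], min_support=0.3))
```

A real deployment would run each addition on a separate machine and would need to consider collusion between neighbouring parties, which this single-process simulation does not model.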

Added 2008-01-31

TopCat: data mining for topic identification in a text corpus

CERIAS TR 2004-90
Christopher Clifton
Download: PDF

TopCat (topic categories) is a technique for identifying topics that recur in articles in a text corpus. Natural language processing techniques are used to identify key entities in individual articles, allowing us to represent an article as a set of items. This allows us to view the problem in a database/data mining context: Identifying related groups of items. We present a novel method for identifying related items based on traditional data mining techniques. Frequent itemsets are generated from the groups of items, followed by clusters formed with a hypergraph partitioning scheme. We present an evaluation against a manually categorized ground truth news corpus; it shows this technique is effective in identifying topics in collections of news articles.
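
As a rough illustration of the frequent-itemset step described above, the sketch below treats each article as a set of entity names and counts how often small groups of entities co-occur. It is a brute-force count, not the report's implementation, and it omits the hypergraph partitioning stage entirely; the function name, the `max_size` parameter, and the sample entities are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def frequent_itemsets(articles: List[Set[str]],
                      min_support: int,
                      max_size: int = 2) -> Dict[FrozenSet[str], int]:
    """Return every itemset of up to `max_size` entities that appears
    in at least `min_support` articles, with its article count."""
    counts: Counter = Counter()
    for entities in articles:
        for size in range(1, max_size + 1):
            # sorted() only makes the enumeration order deterministic
            for combo in combinations(sorted(entities), size):
                counts[frozenset(combo)] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Hypothetical articles, each already reduced to its key entities.
articles = [
    {"Alpha Corp", "Beta City", "J. Doe"},
    {"Alpha Corp", "Beta City"},
    {"J. Doe", "Gamma Agency"},
    {"Alpha Corp", "Beta City", "Gamma Agency"},
]
for itemset, n in frequent_itemsets(articles, min_support=3).items():
    print(sorted(itemset), n)
```

Enumerating all combinations is exponential in `max_size`; a practical system would prune candidates as Apriori-style algorithms do.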

Added 2008-01-31

Change Detection in Overhead Imagery Using Neural Networks

CERIAS TR 2003-45
Christopher Clifton
Download: PDF

Identifying interesting changes from a sequence of overhead imagery—as opposed to clutter, lighting/seasonal changes, etc.—has been a problem for some time. Recent advances in data mining have greatly increased the size of datasets that can be attacked with pattern discovery methods. This paper presents a technique for using predictive modeling to identify unusual changes in images. Neural networks are trained to predict “before” and “after” pixel values for a sequence of images. These networks are then used to predict expected values for the same images used in training. Substantial differences between the expected and actual values represent an unusual change. Results are presented on both multispectral and panchromatic imagery.

Added 2008-01-31

Emerging standards for data mining

CERIAS TR 2001-80
Christopher Clifton
Download: PDF

This paper presents an overview of data mining, then discusses standards (both existing and proposed) that are relevant to data mining. This includes standards that affect several stages of a data mining project. Summaries of several emerging standards are given, as well as proposals that have the potential to change the way data mining tools are built.

Added 2008-01-31

Using sample size to limit exposure to data mining

CERIAS TR 2001-79
Christopher Clifton
Download: PDF

Data mining introduces new problems in database security. The basic problem of using non-sensitive data to infer sensitive data is made more difficult by the “probabilistic” inferences possible with data mining. This paper shows how lower bounds from pattern recognition theory can be used to determine sample sizes where data mining tools cannot obtain reliable results.

Added 2008-01-31

SEMINT: A tool for identifying attribute correspondences in heterogeneous databases using neural networks

CERIAS TR 2001-78
Christopher Clifton
Download: PDF

One step in interoperating among heterogeneous databases is semantic integration: Identifying relationships between attributes or classes in different database schemas. SEMantic INTegrator (SEMINT) is a tool based on neural networks to assist in identifying attribute correspondences in heterogeneous databases. SEMINT supports access to a variety of database systems and utilizes both schema information and data contents to produce rules for matching corresponding attributes automatically. This paper provides theoretical background and implementation details of SEMINT. Experimental results from large and complex real databases are presented. We discuss the effectiveness of SEMINT and our experiences with attribute correspondence identification in various environments.

Added 2008-01-31

Database Integration Using Neural Networks: Implementation and Experiences

CERIAS TR 2001-77
Christopher Clifton
Download: PDF

Applications in a wide variety of industries require access to multiple heterogeneous distributed databases. One step in heterogeneous database integration is semantic integration: identifying corresponding attributes in different databases that represent the same real world concept. The rules of semantic integration cannot be ‘pre-programmed’ since the information to be accessed is heterogeneous and attribute correspondences could be fuzzy. Manually comparing all possible pairs of attributes is an unreasonably large task. We have applied artificial neural networks (ANNs) to this problem. Metadata describing attributes is automatically extracted from a database to represent their ‘signatures’. The metadata is used to train neural networks to find similar patterns of metadata describing corresponding attributes from other databases. In our system, the rules to determine corresponding attributes are discovered through machine learning. This paper describes how we applied neural network techniques to a database integration problem and how we represent an attribute with its metadata as discriminators. This paper focuses on our experiments on the effectiveness of neural networks and of each discriminator. We also discuss the difficulties of using neural networks for this problem and our wish list for the Machine Learning community.

Added 2008-01-31

HyperFile: A Data and Query Model for Documents

CERIAS TR 2001-76
Christopher Clifton
Download: PDF

Non-quantitative information such as documents and pictures poses interesting new problems in the database world. Traditional data models and query languages do not provide appropriate support for this information. Such data are typically stored in file systems, which do not provide the security, integrity, or query features of database management systems. The hypertext model has emerged as a good interface to this information; however, finding information using hypertext browsing does not scale well. We develop a query interface that serves as an extension of the browsing model of hypertext systems. These queries minimize the repeated user interactions required to locate data in a standard hypertext system. HyperFile is a prototype data server interface. In this article, we describe HyperFile, including a number of issues such as query generation, query processing, and indexing.

Added 2008-01-31

Identifying Rare Classes with Sparse Training Data

CERIAS TR 2007-97
Christopher Clifton
Download: PDF

Building models and learning patterns from a collection of data are essential tasks for decision making and dissemination of knowledge. One of the common tools to extract knowledge is to build a classifier. However, when the training dataset is sparse, it is difficult to build an accurate classifier. This is especially true in biological science, as biological data are hard to produce and error-prone. Through empirical results, this paper shows challenges in building an accurate classifier with a sparse biological training dataset. Our findings indicate the inadequacies in well known classification techniques. Although certain clustering techniques, such as seeded k-Means, show some promise, there is still room for further improvement. In addition, we propose a novel idea that could be used to produce a more balanced classifier when training data samples are very limited.

Added 2008-01-31

Private Combinatorial Group Testing

CERIAS TR 2008-3
Mikhail J. Atallah
Download: PDF

Combinatorial group testing, given a set C of individuals (“customers”), consists of applying group tests on subsets of C for the purpose of identifying which members of C are infected (or, more generally, defective in some way). The outcome of a group test reveals only the presence or absence of infection(s) in that group, but a number of group tests exactly identifies all infected members.
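
The basic (non-private) mechanics of group testing can be sketched as follows. Each test reports only whether a group contains at least one infected member, and anyone who appears in a negative group is cleared. The bit-based group design and all function names here are assumptions chosen for illustration; this design is only guaranteed to pinpoint the infected member exactly when at most one member is infected, and it does not model the privacy aspects the report addresses.

```python
from typing import Iterable, List, Set


def group_test(group: Iterable[int], infected: Set[int]) -> bool:
    """One group test: True if the group contains any infected member."""
    return any(member in infected for member in group)


def bit_groups(num_customers: int) -> List[List[int]]:
    """For each bit position, one group with that bit set and one with it clear."""
    num_bits = max(1, (num_customers - 1).bit_length())
    groups: List[List[int]] = []
    for bit in range(num_bits):
        groups.append([c for c in range(num_customers) if (c >> bit) & 1])
        groups.append([c for c in range(num_customers) if not (c >> bit) & 1])
    return groups


def decode(num_customers: int, groups: List[List[int]],
           outcomes: List[bool]) -> Set[int]:
    """Clear everyone who appears in a negative group; return the remaining candidates."""
    cleared: Set[int] = set()
    for group, positive in zip(groups, outcomes):
        if not positive:
            cleared.update(group)
    return set(range(num_customers)) - cleared


def identify(num_customers: int, infected: Set[int]) -> Set[int]:
    """Run all group tests against a hidden infected set and decode the outcomes."""
    groups = bit_groups(num_customers)
    outcomes = [group_test(g, infected) for g in groups]
    return decode(num_customers, groups, outcomes)


# Eight customers, one hidden infection: six group tests instead of eight individual ones.
print(identify(8, {5}))
```

With more than one infected member this simple design can leave extra candidates uncleared, so it returns a superset of the infected set rather than an exact answer.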

Added 2008-01-30

Information Privacy in Organizations: Empowering Creative and Extra-role Performance

CERIAS TR 2006-59
Bradley Alge
Download: PDF

This article examines the relationship of employee perceptions of information privacy in their work organizations and important psychological and behavioral outcomes. A model is presented in which information privacy predicts psychological empowerment, which in turn predicts discretionary behaviors on the job, including creative performance and organizational citizenship behavior. Results from two studies (Study 1 single organization, N = 310; Study 2 multiple organizations, N = 303) confirm that information privacy entails judgments of information gathering control, information handling control, and legitimacy. Moreover, a model linking information privacy to empowerment, and empowerment to creative performance and OCBs was supported. Findings are discussed in light of organizational attempts to control employees through the gathering and handling of their personal information.

Added 2008-01-29

Remote Control: Predictors of Electronic Monitoring Intensity and Secrecy

CERIAS TR 2004-89
Bradley Alge
Download: PDF

Electronic monitoring research has focused predominantly on the reactions of monitored employees and less attention has been paid to the processes that trigger managers’ decisions to electronically monitor subordinates. Employing a distributed virtual team simulation, this study examined the effects of dependence, future performance expectations, and propensity to trust on team leaders’ decisions to electronically monitor their subordinates. Results indicate that team leaders electronically monitor subordinates more intensely when dependence on subordinates is high or future performance expectations are low. Moreover, team leaders are more likely to monitor in secret when dependence is high or propensity to trust is low. Although team leaders increased their level of electronic monitoring over time, this tendency was stronger when the leader had consistently low performance expectations. Reprinted by permission of the publisher.

Added 2008-01-29

When Does the Medium Matter? Knowledge-Building Experiences and Opportunities in Decision Teams.

CERIAS TR 2003-44
Bradley Alge
Download: PDF

The purpose of this investigation was to examine whether temporal scope—the extent to which teams have a past or expect to have a future together—affects face-to-face and computer-mediated teams’ ability to communicate effectively and make high quality decisions. Results indicated that media differences existed for teams lacking a history, with face-to-face teams exhibiting higher openness/trust and information sharing than computer-mediated teams. However, computer-mediated teams with a history were able to eliminate these differences. These findings did not extend to team-member exchange (TMX). Although face-to-face teams exhibited higher TMX compared to computer-mediated teams, the interaction of temporal scope and communication media was not significant. In addition, openness/trust and TMX were positively associated with decision-making effectiveness when task interdependence was high, but were unrelated to decision-making effectiveness when task interdependence was low.

Added 2008-01-29

Measuring Customer Service Orientation Using a Measure of Interpersonal Skills

CERIAS TR 2002-50
Bradley Alge
Download: PDF

Organizations are placing increased emphasis on identifying individuals with customer service orientation. In the present investigation we test whether interpersonal skills, as measured through Holland and Baird’s (1968) Interpersonal Competence Scale, provides a narrow, yet valid, measure of customer service orientation. Data were collected from a sample of bus transit operators. Interpersonal skills was positively related to operator self-reported performance, but was not related to supervisor ratings or objective measures of performance. Implications for the study and use of broad versus narrowly defined personality constructs in organizational settings are discussed.

Added 2008-01-29