The Center for Education and Research in Information Assurance and Security (CERIAS)

The Center for Education and Research in
Information Assurance and Security (CERIAS)

PROXIMUS: a framework for analyzing very high dimensional discrete-attributed datasets

Author

Mehmet Koyuturk, Ananth Grama

Entry type

article

Abstract

This paper presents an efficient framework for error-bounded compression of high-dimensional discrete attributed datasets. Such datasets, which frequently arise in a wide variety of applications, pose some of the most significant challenges in data analysis. Subsampling and compression are two key technologies for analyzing these datasets. PROXIMUS provides a technique for reducing large datasets into a much smaller set of representative patterns, on which traditional (expensive) analysis algorithms can be applied with minimal loss of accuracy. We show desirable properties of PROXIMUS in terms of runtime, scalability to large datasets, and performance in terms of capability to represent data in a compact form. We also demonstrate applications of PROXIMUS in association rule mining. In doing so, we establish PROXIMUS as a tool for preprocessing data before applying computationally expensive algorithms or as a tool for directly extracting correlated patterns. Our experimental results show that use of the compressed data for association rule mining provides excellent precision and recall values (near 100%) across a range of support thresholds while reducing the time required for association rule mining drastically.

Date

2003 – 1 – 1

Key alpha

Grama

Publication Date

2003-01-01

BibTex-formatted data

To refer to this entry, you may select and copy the text below and paste it into your BibTex document. Note that the text may not contain all macros that BibTex supports.