Algebraic Techniques for Analysis of Large Discrete-Valued Datasets

Author

Mehmet Koyuturk, Ananth Grama, Naren Ramakrishnan

Entry type

article

Abstract

With the availability of large scale computing platforms and instrumentation for data gathering, increased emphasis is being placed on efficient techniques for analyzing large and extremely high-dimensional datasets. In this paper, we present a novel algebraic technique based on a variant of semi-discrete matrix decomposition (SDD), which is capable of compressing large discrete-valued datasets in an error bounded fashion. We show that this process of compression can be thought of as identifying dominant patterns in underlying data. We derive efficient algorithms for computing dominant patterns, quantify their performance analytically as well as experimentally, and identify applications of these algorithms in problems ranging from clustering to vector quantization. We demonstrate the superior characteristics of our algorithm in terms of (i) scalability to extremely high dimensions; (ii) bounded error; and (iii) hierarchical nature, which enables multiresolution analysis. Detailed experimental results are provided to support these claims.

Date

2002

URL

http://www.springerlink.com/content/cgmr45mabgn9xck8/

Booktitle

Principles of Data Mining and Knowledge Discovery

Key alpha

Grama

Pages

345-360

Publisher

Springer Berlin / Heidelberg

Series

Lecture Notes in Computer Science

Volume

2431

Affiliation

Purdue University

Publication Date

2002

Copyright

2002

BibTeX-formatted data

To cite this entry, select and copy the text below and paste it into your BibTeX file. Note that the text may not contain all macros that BibTeX supports.

@Article{Grama,
	title = "Algebraic Techniques for Analysis of Large Discrete-Valued Datasets",
	author = "Mehmet Koyuturk and Ananth Grama and Naren Ramakrishnan",
	year = "2002",
	booktitle = "Principles of Data Mining and Knowledge Discovery",
	pages = "345--360",
	publisher = "Springer Berlin / Heidelberg",
	series = "Lecture Notes in Computer Science",
	volume = "2431",
	abstract = "With the availability of large scale computing platforms and instrumentation for data gathering, increased emphasis is being placed on efficient techniques for analyzing large and extremely high-dimensional datasets. In this paper, we present a novel algebraic technique based on a variant of semi-discrete matrix decomposition (SDD), which is capable of compressing large discrete-valued datasets in an error bounded fashion. We show that this process of compression can be thought of as identifying dominant patterns in underlying data. We derive efficient algorithms for computing dominant patterns, quantify their performance analytically as well as experimentally, and identify applications of these algorithms in problems ranging from clustering to vector quantization. We demonstrate the superior characteristics of our algorithm in terms of (i) scalability to extremely high dimensions; (ii) bounded error; and (iii) hierarchical nature, which enables multiresolution analysis. Detailed experimental results are provided to support these claims.",
	affiliation = "Purdue University",
	copyright = "2002",
	url = "http://www.springerlink.com/content/cgmr45mabgn9xck8/",
}