The Center for Education and Research in Information Assurance and Security (CERIAS)

Automating the approximate record-matching process

Author

Vassilios S. Verykios, Ahmed K. Elmagarmid, Elias N. Houstis

Entry type

article

Abstract

Data quality has many dimensions, one of which is accuracy. Accuracy is usually compromised by errors accidentally or intentionally introduced in a database system. These errors result in inconsistent, incomplete, or erroneous data elements. For example, a small variation in the representation of a data object produces a unique instantiation of the object being represented. In order to improve the accuracy of the data stored in a database system, we need to compare the data either with their real-world counterparts or with other data stored in the same or a different system. In this paper, we address the problem of matching records that refer to the same entity by computing their similarity. Exact record matching has limited applicability in this context, since even simple errors such as character transpositions cannot be captured in the record-linking process. Our methodology deploys advanced data-mining techniques to deal with the high computational and inferential complexity of approximate record matching.
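
The abstract's point about character transpositions can be illustrated with a minimal sketch. This is not the method described in the paper: it swaps in a standard optimal string alignment (restricted Damerau-Levenshtein) distance, and the function names, the sample records, and the 0.8 threshold are all assumptions chosen for illustration.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance between a and b.

    Allowed edits: insertion, deletion, substitution, and the swap of
    two adjacent characters (each costs 1).
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "mi" versus "im".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def similarity(a: str, b: str) -> float:
    """Map the distance to a score in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - osa_distance(a, b) / longest


def records_match(r1: dict, r2: dict, threshold: float = 0.8) -> bool:
    """Hypothetical decision rule: average field similarity over the shared
    fields, compared against an assumed threshold."""
    fields = r1.keys() & r2.keys()
    if not fields:
        return False
    total = sum(similarity(str(r1[f]).lower(), str(r2[f]).lower()) for f in fields)
    return total / len(fields) >= threshold


# Made-up sample records that differ by one transposition in the name.
rec_a = {"name": "John Smith", "city": "Lafayette"}
rec_b = {"name": "John Simth", "city": "Lafayette"}
exact_match = rec_a == rec_b                # exact comparison rejects the pair
approx_match = records_match(rec_a, rec_b)  # similarity comparison accepts it
```

An exact comparison treats the two sample records as different entities, while the similarity score absorbs the single transposition. A fixed threshold over averaged field scores is only the simplest possible decision rule and is used here to keep the sketch short.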

Date

2000-07

Journal

Information Sciences

Key alpha

Elmagarmid

Pages

83-98

Publisher

Elsevier Science

Volume

126

Affiliation

Purdue University

Publication Date

2000-07-00

Copyright

2000

BibTeX-formatted data

To cite this entry, select and copy the text below and paste it into your BibTeX document. Note that the text may not include every macro that BibTeX supports.