The Center for Education and Research in Information Assurance and Security (CERIAS)

Provably Private Data Anonymization: Or, k-Anonymity Meets Differential Privacy

Download

PDF

Author

Ninghui Li, Wahbeh Qardaji, Dong Su

Tech report number

CERIAS TR 2010-24

Entry type

techreport

Abstract

Privacy-preserving microdata publishing currently lacks a solid theoretical foundation. Most existing techniques are developed to satisfy syntactic privacy notions such as $k$-anonymity, which fail to provide strong privacy guarantees. The recently proposed notion of differential privacy has been widely accepted as a sound privacy foundation for statistical query answering. However, no general practical microdata publishing techniques are known to satisfy differential privacy. In this paper, we begin to bridge this gap. We first analyze $k$-anonymization methods and show how they can fail to provide sufficient protection against re-identification, the very attack that $k$-anonymity was designed to prevent. We then prove that $k$-anonymization methods, when done "safely" and when preceded by a random sampling step, can satisfy $(\epsilon,\delta)$-differential privacy with reasonable parameters. This result is, to our knowledge, the first to link $k$-anonymity with differential privacy, and it illustrates that "hiding in a crowd of $k$" indeed offers privacy guarantees. It naturally leads to future research on designing "safe" and practical $k$-anonymization methods. We observe that our result gives an alternative to output perturbation for satisfying differential privacy: adding a random sampling step at the beginning and pruning results that are too sensitive to the change of a single tuple. This approach may be applicable to settings other than microdata publishing. We also show that adding a random sampling step can greatly amplify the privacy guarantee provided by a differentially private algorithm. This result makes it much easier to provide strong privacy guarantees when one wishes to publish a portion of the raw data. Finally, we show that current definitions of $(\epsilon,\delta)$-differential privacy require $\delta$ to be very small to provide sufficient privacy protection when publishing microdata, making the notion impractical in some scenarios. To address this problem, we introduce a notion called $f$-smooth $(\epsilon,\delta)$-differential privacy.
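
For context, the $(\epsilon,\delta)$ notion referenced in the abstract has the following standard definition (supplied here for the reader, not quoted from the report): a randomized algorithm $\mathcal{A}$ satisfies $(\epsilon,\delta)$-differential privacy if, for every pair of datasets $D$ and $D'$ differing in a single tuple and every set $S$ of possible outputs,

\[
\Pr[\mathcal{A}(D) \in S] \;\le\; e^{\epsilon} \, \Pr[\mathcal{A}(D') \in S] + \delta .
\]

The sampling-amplification claim is likewise of the standard form: if each tuple is included in the sample independently with probability $\beta$, then an $\epsilon$-differentially-private algorithm run on the sample satisfies roughly $\ln\big(1 + \beta(e^{\epsilon} - 1)\big)$-differential privacy. The report's exact parameters and bounds may differ from this common statement of the lemma.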

Key alpha

Li

Affiliation

Purdue University

Publication Date

2010-12-15

BibTeX-formatted data

To refer to this entry, you may select and copy the text below and paste it into your BibTeX document. Note that the text may not contain all macros that BibTeX supports.
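
The entry below is reconstructed from the metadata on this page; the citation key is illustrative and may be changed to suit your bibliography.

@techreport{cerias-tr-2010-24,
  author      = {Ninghui Li and Wahbeh Qardaji and Dong Su},
  title       = {Provably Private Data Anonymization: Or, $k$-Anonymity Meets Differential Privacy},
  institution = {CERIAS, Purdue University},
  number      = {CERIAS TR 2010-24},
  year        = {2010},
  month       = dec
}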