Abstract
Maximizing data utility and minimizing privacy risk are two conflicting goals. Before releasing data, organizations typically hide the owners' identities and then apply a set of transformations to the data. While determining the best set of transformations has been the focus of extensive work in the database community, most of this work suffers from one or both of two major problems: poor scalability and the lack of a privacy guarantee. To the best of our knowledge, none of the proposed scalable anonymization techniques provides privacy guarantees supported by a well-formulated theoretical foundation. Differential privacy provides such a theoretical formulation: it ensures that the system behaves essentially the same way regardless of whether any individual, or small group of individuals, is included in the database.
In this paper, we address both the scalability and the privacy risk of data anonymization. We propose a scalable algorithm that satisfies differential privacy when combined with a specific random-sampling step. The contribution of the paper is three-fold: (1) we prove that determining the optimal set of transformations is an NP-hard problem and propose a heuristic approximation based on genetic algorithms; (2) we propose a personalized anonymization technique based on a Lagrangian formulation and prove that it can be solved in polynomial time; and (3) we prove that a variant of the proposed Lagrangian technique, with a specific sampling scheme, satisfies differential privacy.
Through experimental studies, we compare our proposed algorithm with other anonymization schemes in terms of both time and privacy risk, and we show that the proposed algorithm is scalable. Moreover, we compare the performance of the proposed approximate algorithm with that of the optimal algorithm and show that the sacrifice in risk is outweighed by the gain in efficiency.