Abstract
In this information age, data and the knowledge extracted from it by data mining techniques represent a key asset driving research, innovation, and policy-making activities. Many agencies and organizations have recognized the need to accelerate such trends and are therefore willing to release the data they have collected to other parties, for purposes such as research and the formulation of public policies. However, publishing data remains very difficult today. Data often contains personally identifiable information, and releasing such data may therefore result in privacy breaches; this is the case for microdata, e.g., census data and medical data.
This thesis studies how microdata can be published and shared in a privacy-preserving manner. We present an extensive study of this problem along three dimensions: (1) designing a simple, intuitive, and robust privacy model; (2) designing an effective anonymization technique that works on sparse and high-dimensional data; and (3) developing a methodology for evaluating the tradeoff between privacy and utility.