Public Population Information in Differential Privacy
Project Members
Christine Task, Prof. Chris Clifton
Abstract
Privatized queries that satisfy the strict requirements of differential privacy incorporate randomized noise calibrated to mask the impact any arbitrary individual could have on the query results. An attacker viewing these results can learn only aggregate information about the data-set, because the noise prevents any single individual from having a detectable effect on them. Not all queries can be effectively privatized in this fashion: if an arbitrary individual could have a very large effect on the results of a query, the requisite noise may be so large that it eliminates the utility of the results. However, adding enough noise to obscure a truly 'arbitrary' individual is unnecessary if the data-set is sampled from a limited population publicly known to satisfy certain constraints. This is a simple but powerful insight: many queries that are too sensitive to be privatizable in general can be successfully privatized over constrained populations. We prove that reducing noise based on public population information does not weaken privacy, and we describe a selection of sensitive queries that become usefully privatizable within constrained populations.
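To make the calibration idea concrete, the following is a minimal sketch (not the project's actual mechanisms) using the standard Laplace mechanism in Python. It assumes a hypothetical scenario where the population is publicly known to take values in a bounded range, so a sum query's sensitivity is that public bound rather than an unbounded worst case; the function names, bound, and epsilon value are illustrative assumptions.

import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    # Add Laplace noise with scale sensitivity / epsilon to the true answer.
    rng = rng or np.random.default_rng()
    return true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

def private_sum(values, epsilon, public_upper_bound):
    # If the population is publicly known to lie in [0, public_upper_bound],
    # adding or removing one individual changes the sum by at most that bound,
    # so the noise can be calibrated to the bound instead of an arbitrary value.
    sensitivity = public_upper_bound
    return laplace_mechanism(sum(values), sensitivity, epsilon)

# Illustrative use: ages drawn from a population publicly known to lie in [0, 120].
ages = [34, 29, 57, 71, 18]
print(private_sum(ages, epsilon=0.5, public_upper_bound=120))

With no public bound on the attribute, the sensitivity of the sum is unbounded and no finite noise scale suffices; the publicly known constraint is what makes the query usefully privatizable in this sketch.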