Principal Investigator: Jordan Awan
Overview
As more personal data is collected, analyzed, and published by tech companies, academic researchers, and government agencies, the concern for privacy protection increases. To address these concerns, formal privacy protection methods, such as differential privacy (DP), are becoming widely employed by tech companies as well as federal statistical agencies. To protect privacy, DP methods require the introduction of additional randomness into the analyses, which "covers up" what any individual has contributed to the database. However, this extra noise introduces additional bias and variance. As a result, researchers and policymakers who depend on these products have found that traditional statistical tools give misleading and unreliable results. For example, the 2020 US Decennial Census employed differential privacy, and social scientists, demographers, and economists have raised alarms over the difficulty of obtaining valid inferences from the released data.
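To make the "additional randomness" concrete, the sketch below shows one standard DP building block, the Laplace mechanism, which releases a bounded query after adding noise scaled to the query's sensitivity. This is a minimal illustration of the general idea, not the proposal's own mechanism; the function names, bounds, and privacy parameter are illustrative assumptions.

```python
import random

def laplace_noise(scale, rng):
    """Laplace(0, scale) noise, drawn as the difference of two Exponential(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def private_mean(data, lower, upper, epsilon, rng):
    """epsilon-DP release of the mean of data clamped to [lower, upper]."""
    n = len(data)
    clamped = [min(max(x, lower), upper) for x in data]
    # Changing one record moves the clamped mean by at most (upper - lower) / n,
    # so noise with scale sensitivity / epsilon yields epsilon-DP.
    sensitivity = (upper - lower) / n
    return sum(clamped) / n + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(0)
data = [rng.uniform(0.0, 1.0) for _ in range(1000)]
true_mean = sum(data) / len(data)
release = private_mean(data, lower=0.0, upper=1.0, epsilon=0.5, rng=rng)
```

The released value is unbiased for the clamped mean, but its extra variance (and the bias from clamping) is exactly what standard statistical tools ignore, which is the source of the invalid inferences described above.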
This raises the question: how can we enable researchers to obtain valid statistical inference on privatized data for a variety of models and privacy mechanisms? While prior work has begun to develop statistical tools for DP, most existing solutions are tailored to narrow problems and often lack concrete statistical guarantees. We propose to develop general-purpose statistical inference methods, applicable to a wide variety of analyses on privatized data and based on the leading methods of likelihood-free inference. We will also develop new privacy-protecting procedures optimized to enable reliable statistical inferences, including unbiased estimators, valid confidence intervals, calibrated hypothesis tests, and posterior inference.
Intellectual Merit
The proposed interdisciplinary research will deliver both theoretical and practical tools for the advancement of statistical approaches in complex settings such as those entailed by the added noise of DP mechanisms. The likelihood-free methods studied and implemented in this research will provide computationally efficient solutions to complex problems in both the private and non-private data settings. These will be among the first general tools for valid statistical inference on DP outputs that are not tailored to very specific statistical tests with narrow applications, and they can be broadly applied to many statistical procedures commonly used in the social sciences and other fields of research. In addition, studying these approaches will open future avenues of research for efficiently obtaining statistical outputs with adequate finite-sample and asymptotic properties (unbiasedness, consistency, efficiency) in settings for which solutions do not currently exist, such as reliable inference procedures for the common cases of missing, censored, or truncated data. More specifically, the study of how certain statistics can deliver better results within a simulation-based framework can greatly contribute to the development of techniques such as co-sufficient sampling and indirect inference, whose theoretical and practical advantages have not been fully exploited.
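To illustrate the simulation-based (likelihood-free) idea in the DP setting, the sketch below runs a rejection-style approximate Bayesian computation on a privatized mean: candidate parameters are accepted when data simulated and then privatized under the same mechanism reproduce the released statistic. This is one generic instance of the likelihood-free family the summary describes, not the proposal's specific method; the prior, tolerance, and model are illustrative assumptions.

```python
import random
import statistics

def laplace_noise(scale, rng):
    """Laplace(0, scale) noise, drawn as the difference of two Exponential(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def privatized_mean(data, lower, upper, epsilon, rng):
    """epsilon-DP release of the mean of data clamped to [lower, upper]."""
    n = len(data)
    clamped = [min(max(x, lower), upper) for x in data]
    return sum(clamped) / n + laplace_noise((upper - lower) / (n * epsilon), rng)

rng = random.Random(1)
n, epsilon, true_theta = 200, 1.0, 2.0

# "Observed" DP release: the mean of Normal(true_theta, 1) data, clamped to [0, 4].
observed = privatized_mean([rng.gauss(true_theta, 1.0) for _ in range(n)],
                           0.0, 4.0, epsilon, rng)

# Rejection ABC: propose theta from a Uniform(0, 4) prior, simulate a fresh
# dataset, privatize it with the SAME mechanism, and keep theta whenever the
# simulated release lands close to the observed one.  Because the privacy
# noise is re-simulated each time, it is accounted for automatically.
accepted = []
for _ in range(2000):
    theta = rng.uniform(0.0, 4.0)
    simulated = privatized_mean([rng.gauss(theta, 1.0) for _ in range(n)],
                                0.0, 4.0, epsilon, rng)
    if abs(simulated - observed) < 0.2:
        accepted.append(theta)

posterior_mean = statistics.mean(accepted)
```

The key design point, which this sketch shares with the methods to be developed, is that only the ability to simulate from the model and the privacy mechanism is needed; the intractable likelihood of the noisy release is never evaluated.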
Other PIs: Robert Molinari, Assistant Professor, Department of Statistics, Auburn University
Other Faculty: Nianqiao Ju, Assistant Professor, Department of Statistics, Purdue University; Vinayak Rao, Associate Professor, Department of Statistics, Purdue University; Ruobin Gong, Assistant Professor, Department of Statistics, Rutgers University; Andres Felipe Barrientos, Assistant Professor of Statistics, Florida State University
Students: Zhanyu Wang (now graduated), Yu Wei Chen, Xinlong Du, Aidan Davis, Samuel Forfang
Keywords: confidence interval, differential privacy, hypothesis test, simulation-based inference