2016 Symposium Posters

Data Classification Using Anatomized Training Data


Primary Investigator:
Chris Clifton

Project Members
Koray Mancuhan, Chris Clifton
Abstract
In this poster, we introduce the anatomized learning problem: data classification using anatomized training data. Anatomized training data preserves all data values but introduces uncertainty in the mapping between identifying and sensitive values. We first present two classification approaches, a decision tree and a nearest neighbor classifier. We then present empirical results on the Adult dataset, which is commonly used in the privacy community. Theoretical results are also provided for the nearest neighbor approach. The experimental and theoretical results show that the proposed approaches come close to the limits of learning from unprotected data, although they require larger training sets. Finally, we outline ongoing work, which is supported by the Northrop Grumman Cybersecurity Research Consortium (NGCRC).
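
To make the notion of anatomized data concrete, the following is a minimal sketch of how a table might be split so that every value is kept while the link between identifying and sensitive values is only known at the group level. It is an illustration under stated assumptions, not the poster's method: the function name `anatomize`, the random grouping, the `group_size` parameter, and the toy records (which are made up and not taken from the Adult dataset) are all hypothetical, and the sketch does not enforce any diversity requirement on the groups.

```python
import random
from collections import Counter


def anatomize(records, sensitive_key, group_size, seed=0):
    """Split records into two tables linked only by a group ID.

    Hypothetical helper for illustration. Grouping is random; a real
    anatomization scheme would also constrain which sensitive values
    may share a group.
    """
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)

    identifier_table = []  # identifying attributes + group ID
    sensitive_table = []   # group ID + sensitive value + count

    for start in range(0, len(shuffled), group_size):
        group_id = start // group_size
        group = shuffled[start:start + group_size]

        # Keep every identifying value, but drop the sensitive one.
        for rec in group:
            row = {k: v for k, v in rec.items() if k != sensitive_key}
            row["group_id"] = group_id
            identifier_table.append(row)

        # Keep every sensitive value, but only as per-group counts.
        counts = Counter(rec[sensitive_key] for rec in group)
        for value, count in sorted(counts.items()):
            sensitive_table.append(
                {"group_id": group_id, sensitive_key: value, "count": count}
            )

    return identifier_table, sensitive_table


# Toy records, invented for this example.
records = [
    {"age": 39, "zipcode": "47906", "disease": "flu"},
    {"age": 50, "zipcode": "47907", "disease": "asthma"},
    {"age": 38, "zipcode": "47906", "disease": "flu"},
    {"age": 53, "zipcode": "47905", "disease": "diabetes"},
]

identifier_table, sensitive_table = anatomize(records, "disease", group_size=2)
```

A classifier trained on such data sees each record's identifying attributes and its group, but can only infer the sensitive value up to the distribution within that group, which is the uncertainty the abstract refers to.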