Abstract
Data mining can extract important knowledge from large data collections, but these collections are sometimes split among multiple parties. Data warehousing, which brings data from multiple sources under a single authority, increases the risk of privacy violations. Furthermore, privacy concerns may prevent the parties from directly sharing even some metadata.
Distributed data mining and processing provide a means to address this issue, particularly if queries are processed in a way that avoids disclosing any information beyond the final result. This thesis presents methods to mine horizontally partitioned data without violating privacy and shows how to use the data mining results in a privacy-preserving way. The methods incorporate cryptographic techniques to minimize the information shared, while adding as little overhead as possible to the mining and processing task.
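To make "disclosing nothing beyond the final result" concrete for horizontally partitioned data, the sketch below simulates the textbook ring-based secure sum protocol in a single process. It is a generic illustration, not necessarily the construction used in this thesis. Assumptions: the parties are semi-honest and do not collude, and they agree on a public `modulus` larger than any possible total. The function name and the per-site counts are hypothetical. A typical use would be combining per-site support counts for a candidate itemset.

```python
import secrets
from typing import List


def secure_sum(local_values: List[int], modulus: int) -> int:
    """Simulate the ring-based secure sum (hypothetical helper).

    local_values: each site's private non-negative integer, in ring order.
    modulus: public bound agreed on in advance; must exceed the true total.
    """
    if not local_values:
        raise ValueError("need at least one site")
    # Site 1 masks its value with a uniformly random R in [0, modulus).
    r = secrets.randbelow(modulus)
    running = (r + local_values[0]) % modulus
    # Every later site receives only `running`, which is uniformly
    # distributed whatever the earlier inputs were. It adds its own value
    # and forwards the result to the next site in the ring.
    for v in local_values[1:]:
        running = (running + v) % modulus
    # The value returns to site 1, which removes the mask. Only the total
    # is revealed, provided the total is smaller than `modulus`.
    return (running - r) % modulus


if __name__ == "__main__":
    counts = [120, 45, 310]  # hypothetical per-site support counts
    print(secure_sum(counts, 2**32))  # 475
```

Masking with a single random offset keeps the overhead to one modular addition per site. Its weakness is collusion: the two neighbors of a site can combine what they see to recover that site's value.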
Contents
LIST OF TABLES
LIST OF FIGURES
ABBREVIATIONS
ABSTRACT
1 Introduction
2 Privacy-preserving Data Mining: State-of-the-art and Related Issues
3 General Secure Multi-party Computation and Cryptographic Tools
4 Privacy-preserving Distributed Association Rule Mining
5 Privacy-preserving Distributed k-Nearest Neighbor Classification
6 Privacy-preserving Distributed Naive Bayes Classifier
7 When do Data Mining Results Violate Privacy?
8 Using Decision Rules for Private Classification
9 Summary
LIST OF REFERENCES
VITA