Abstract
Privacy and security concerns can prevent sharing of data, derailing data mining projects. Distributed knowledge discovery, if done correctly, can alleviate this problem. The key is to obtain valid results, while providing guarantees on the (non)disclosure of data. We present a method for k-means clustering when different sites contain different attributes for a common set of entities. Each site learns the cluster of each entity, but learns nothing about the attributes at other sites.
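The setting in the abstract (each site holds different attributes for the same entities) can be illustrated with a small sketch. It shows only why a k-means assignment step can be split across sites: squared Euclidean distance is a sum over attributes, so each site can compute its share from its own columns. This is not the privacy-preserving protocol itself. The partial distances are added in the clear here, which a secure method would have to avoid. The function names, the two-site split, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def partial_sq_distances(local_points, local_centers):
    """Squared distance contribution from one site's attributes.

    local_points:  (n_entities, d_site) array held by this site only.
    local_centers: (k, d_site) array, this site's slice of each cluster center.
    Returns an (n_entities, k) array of partial squared distances.
    """
    diff = local_points[:, None, :] - local_centers[None, :, :]
    return (diff ** 2).sum(axis=2)

def assign_clusters(site_points, site_centers):
    """Assign each entity to its nearest center by summing per-site shares.

    NOTE: the shares are summed in the clear purely for illustration;
    a privacy-preserving method must not reveal them like this.
    """
    total = sum(partial_sq_distances(p, c)
                for p, c in zip(site_points, site_centers))
    return total.argmin(axis=1)

# Hypothetical toy data: 4 common entities, site A holds 1 attribute,
# site B holds 2 different attributes for the same entities.
site_a_points = np.array([[0.0], [0.1], [5.0], [5.2]])
site_b_points = np.array([[1.0, 1.0], [0.9, 1.1], [8.0, 9.0], [8.1, 9.2]])

# Each site holds only its own slice of the k = 2 cluster centers.
site_a_centers = np.array([[0.0], [5.0]])
site_b_centers = np.array([[1.0, 1.0], [8.0, 9.0]])

labels = assign_clusters([site_a_points, site_b_points],
                         [site_a_centers, site_b_centers])
print(labels)  # [0 0 1 1]
```

The assignment matches what a single party would compute on the joined table, which is the sense in which results remain valid. The part the sketch leaves out is how the per-site shares are combined without disclosing them.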
Note
The Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 24-27, 2003 in Washington, D.C.
Honorable Mention, Best Paper Competition