A privacy-sensitive approach to distributed clustering

Authors :
Merugu, Srujana
Ghosh, Joydeep
Source :
Pattern Recognition Letters. Mar 2005, Vol. 26 Issue 4, p399-410. 12p.
Publication Year :
2005

Abstract

While data mining algorithms are often designed to operate on centralized data, in practice data is often acquired and stored in a distributed manner. Centralization of such data before analysis may not be desirable, and often not possible due to a variety of real-life constraints such as security, privacy and communication costs. This paper presents a general framework for distributed clustering that takes into account privacy requirements. It is based on building probabilistic models of the data at each local site, whose parameters are then transmitted to a central location. We mathematically show that the best representative of all the local models is a certain “mean” model, and empirically show that this model can be approximated quite well by generating artificial samples from the local models using sampling techniques, and then fitting a global model of a chosen parametric form to these samples. We also propose a new measure that quantifies privacy based on information theoretic concepts, and show that decreasing privacy improves the quality of the global model and vice versa. Empirical results are provided on different kinds of data to highlight the generality of our framework. The results show that high quality global clusters can be achieved with little loss of privacy. [Copyright © Elsevier]
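
The pipeline outlined in the abstract (fit a local model per site, transmit only its parameters, draw artificial samples centrally, fit a global model to those samples) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes one-dimensional data, a single Gaussian as each site's local model, and k-means centroids as the "chosen parametric form" of the global model. All function names (`fit_local_model`, `sample_from_models`, `fit_global_kmeans`) and the two synthetic sites are hypothetical, and the paper's "mean" model derivation and privacy measure are not reproduced here.

```python
import random
import statistics

def fit_local_model(points):
    """Fit a 1-D Gaussian at a local site (assumed model form).
    Only these parameters, not the raw points, leave the site."""
    return {"mean": statistics.fmean(points),
            "std": statistics.pstdev(points),
            "n": len(points)}

def sample_from_models(models, total, rng):
    """Central site: draw artificial samples from the local models,
    picking each model with probability proportional to its site size."""
    weights = [m["n"] for m in models]
    samples = []
    for _ in range(total):
        m = rng.choices(models, weights=weights, k=1)[0]
        samples.append(rng.gauss(m["mean"], m["std"]))
    return samples

def fit_global_kmeans(samples, k, iters=50):
    """Fit a global model to the artificial samples.
    Here the global form is k centroids found by Lloyd's algorithm."""
    ordered = sorted(samples)
    # Deterministic initialisation at evenly spaced quantiles.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in samples:
            nearest = min(range(k), key=lambda c: abs(x - centers[c]))
            groups[nearest].append(x)
        # Keep the old centre if a group happens to be empty.
        centers = [statistics.fmean(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return sorted(centers)

rng = random.Random(0)
# Two hypothetical sites whose raw data never leave their location.
site_a = [rng.gauss(0.0, 1.0) for _ in range(500)]
site_b = [rng.gauss(10.0, 1.0) for _ in range(500)]

local_models = [fit_local_model(site_a), fit_local_model(site_b)]
artificial = sample_from_models(local_models, 2000, rng)
global_centers = fit_global_kmeans(artificial, k=2)
print(global_centers)
```

In this sketch the central site sees only three numbers per site (mean, standard deviation, count). Richer local models would describe the data more closely and so reveal more about it, which is the privacy and quality trade-off the abstract refers to.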

Details

Language :
English
ISSN :
0167-8655
Volume :
26
Issue :
4
Database :
Academic Search Index
Journal :
Pattern Recognition Letters
Publication Type :
Academic Journal
Accession number :
17410573
Full Text :
https://doi.org/10.1016/j.patrec.2004.08.003