Efficient Distributed Data Condensation for Nearest Neighbor Classification

Authors :
Angiulli Fabrizio
Folino Gianluigi
Source :
Lecture Notes in Computer Science, vol. 4641 (2007), pp. 330–339.
Publication Year :
2007
Publisher :
Springer, Berlin, Germany, 2007.

Abstract

This work presents PFCNN, a distributed method for computing a consistent subset of very large data sets for the nearest neighbor decision rule. To cope with the communication overhead typical of distributed environments and to reduce memory requirements, different variants of the basic PFCNN method are introduced. Experiments on a class of synthetic datasets show that these methods can be profitably applied to enormous collections of data: they scale up well, are efficient in memory consumption, and achieve noticeable data reduction and good classification accuracy. To the best of our knowledge, this is the first distributed algorithm for computing a training-set-consistent subset for the nearest neighbor rule.
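
To illustrate what "training-set-consistent subset" means, the following is a minimal single-machine sketch. It is not the PFCNN method described in the abstract, whose details are not given in this record. It assumes Euclidean distance and uses a simple Hart-style condensation loop as a stand-in. The function names (nearest_label, is_consistent, condense) are hypothetical and chosen only for this example.

```python
import math

def nearest_label(point, subset):
    """Return the label of the subset element closest to point (Euclidean)."""
    best = min(subset, key=lambda item: math.dist(point, item[0]))
    return best[1]

def is_consistent(subset, training_set):
    """True if the 1-NN rule over subset classifies every training point correctly."""
    return all(nearest_label(x, subset) == y for x, y in training_set)

def condense(training_set):
    """Hart-style condensation (illustrative stand-in, not PFCNN).

    Starts from the first training point and repeatedly adds any point that
    the current subset misclassifies, until a full pass adds nothing.
    Assumes no two identical points carry different labels; in that case the
    loop still stops, but the result cannot be consistent.
    """
    subset = [training_set[0]]
    changed = True
    while changed:
        changed = False
        for x, y in training_set:
            if nearest_label(x, subset) != y and (x, y) not in subset:
                subset.append((x, y))
                changed = True
    return subset
```

On a toy set with two well-separated classes, condense would be expected to keep only a few points while is_consistent still holds on the full set, which is the data reduction property the abstract refers to.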

Details

Language :
English
Database :
OpenAIRE
Journal :
Lecture Notes in Computer Science, vol. 4641 (2007), pp. 330–339.
Accession number :
edsair.cnr...........f3419e08cd627814fc20e4bef81dcdc7