Clustering categorical data based on the relational analysis approach and MapReduce

Authors :
Yasmine Lamari
Said Chah Slaoui
Source :
Journal of Big Data, Vol 4, Iss 1, Pp 1-16 (2017)
Publication Year :
2017
Publisher :
SpringerOpen, 2017.

Abstract

Traditional clustering methods cannot cope with the exploding volume of data the world currently faces. In response, research has intensified on parallel clustering methods. Although a variety of parallel programming models exist, the MapReduce paradigm is considered the most prominent for large-scale data processing problems, clustering among them. This paper introduces a new parallel design, based on the MapReduce programming model, of a recently proposed heuristic for hard clustering. The heuristic clusters large categorical data sets by partitioning them efficiently according to the relational analysis approach. The proposed design, called PMR-Transitive, is a single-scan, parameter-free heuristic that determines the number of clusters automatically. Experimental results on real-life and synthetic data sets demonstrate that PMR-Transitive produces good quality results.

Details

Language :
English
ISSN :
2196-1115
Volume :
4
Issue :
1
Database :
OpenAIRE
Journal :
Journal of Big Data
Accession number :
edsair.doi.dedup.....73a0685f39a7b2ccc69e209cabea5ab0
Full Text :
https://doi.org/10.1186/s40537-017-0090-7