Clustering categorical data based on the relational analysis approach and MapReduce
- Source :
- Journal of Big Data, Vol 4, Iss 1, Pp 1-16 (2017)
- Publication Year :
- 2017
- Publisher :
- SpringerOpen, 2017.
Abstract
- Traditional clustering methods cannot cope with the exploding volume of data that the world currently faces. To address this problem, research has intensified on parallel clustering methods. Although a variety of parallel programming models exist, the MapReduce paradigm is considered the most prominent model for large-scale data processing problems, including clustering. This paper introduces a new parallel design, based on the MapReduce programming model, of a recently proposed heuristic for hard clustering. In this heuristic, clustering is performed by efficiently partitioning large categorical data sets according to the relational analysis approach. The proposed design, called PMR-Transitive, is a single-scan, parameter-free heuristic that determines the number of clusters automatically. Experimental results on real-life and synthetic data sets show that PMR-Transitive produces good-quality results.
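The abstract names the relational analysis approach but does not spell out its mechanics. As a minimal sketch, assuming the classic Condorcet-criterion formulation of relational analysis (two records "agree" on an attribute when they share its value, and a record joins an existing cluster only if its total agreements with that cluster's members outweigh its disagreements), a single-scan, parameter-free assignment could look like the following. The function name `condorcet_single_scan` is hypothetical; this is an illustration of the general idea, not the authors' PMR-Transitive code, and it does not reproduce their MapReduce decomposition.

```python
from collections import Counter
from typing import List, Sequence


def condorcet_single_scan(records: List[Sequence[str]]) -> List[int]:
    """Hypothetical single-pass clustering of categorical records.

    Assumption: uses the Condorcet-style gain 2 * agreements - m * |cluster|,
    where m is the number of attributes. A record goes to the cluster with the
    largest positive gain; if no gain is positive, it starts a new cluster, so
    the number of clusters is not fixed in advance.
    """
    if not records:
        return []
    m = len(records[0])
    # Each cluster is summarized by its size and one value Counter per attribute.
    sizes: List[int] = []
    counts: List[List[Counter]] = []
    labels: List[int] = []
    for rec in records:
        best_k, best_gain = -1, 0
        for k in range(len(sizes)):
            # Total attribute agreements between rec and all members of cluster k.
            agreements = sum(counts[k][a][rec[a]] for a in range(m))
            gain = 2 * agreements - m * sizes[k]
            if gain > best_gain:
                best_k, best_gain = k, gain
        if best_k == -1:
            # No cluster gains from this record: open a new one.
            sizes.append(0)
            counts.append([Counter() for _ in range(m)])
            best_k = len(sizes) - 1
        sizes[best_k] += 1
        for a in range(m):
            counts[best_k][a][rec[a]] += 1
        labels.append(best_k)
    return labels


if __name__ == "__main__":
    data = [
        ("red", "small", "round"),
        ("red", "small", "square"),
        ("blue", "large", "square"),
        ("blue", "large", "round"),
    ]
    print(condorcet_single_scan(data))  # [0, 0, 1, 1]
```

Because each cluster is summarized only by additive per-attribute value counts, partial counts computed on separate data splits could in principle be summed, which is the kind of statistic a reduce step handles well; this sketch makes no claim about how PMR-Transitive actually distributes the work.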
- Subjects :
- Clustering high-dimensional data
Information Systems and Management
lcsh:Computer engineering. Computer hardware
Computer Networks and Communications
Computer science
Conceptual clustering
lcsh:TK7885-7895
02 engineering and technology
computer.software_genre
Clustering
lcsh:QA75.5-76.95
Consensus clustering
0202 electrical engineering, electronic engineering, information engineering
Relational analysis approach
MapReduce
Cluster analysis
Categorical variable
Categorical data
Brown clustering
lcsh:T58.5-58.64
lcsh:Information technology
05 social sciences
Data stream clustering
Hardware and Architecture
Canopy clustering algorithm
020201 artificial intelligence & image processing
Data mining
lcsh:Electronic computers. Computer science
0509 other social sciences
050904 information & library sciences
computer
Information Systems
Details
- Language :
- English
- ISSN :
- 21961115
- Volume :
- 4
- Issue :
- 1
- Database :
- OpenAIRE
- Journal :
- Journal of Big Data
- Accession number :
- edsair.doi.dedup.....73a0685f39a7b2ccc69e209cabea5ab0
- Full Text :
- https://doi.org/10.1186/s40537-017-0090-7