
Relational Analysis for Clustering Consensus

Authors :: Hamid Benhadda, Mustapha Lebbah, Younès Bennani, Nistor Grozavu
Source :: Machine Learning
Publication Year :: 2021
Publisher :: IntechOpen, 2021.
Abstract: Clustering is one of the most widely used techniques in data mining. Its aim is to synthesize and summarize large amounts of data by splitting them into small, homogeneous clusters, such that the data (observations) within a cluster are more similar to each other than to the observations in other clusters. This definition assumes that there exists a well-defined clustering quality measure that quantifies how homogeneous the obtained clusters are. The aim of this chapter is to present an original approach for merging different partitions of the same data set, obtained either by applying different clustering techniques or by applying the same clustering technique with different parameters.

Fusing partitions has been widely studied and has been given several names depending on the scientific field, such as machine learning or bioinformatics (Dudoit & Fridlyand, 2003; Kim & Lee, 2007; Monti et al., 2003). These names include consensus clustering, clustering aggregation, clustering combination and clustering fusion. Several studies (Frossyniotis et al., 2002; Minaei-Bidgoli et al., 2004; Strehl & Ghosh, 2002; Topchy et al., 2004; 2005) have pioneered clustering ensembles, i.e. the combination of several clusterings of the same data set, as a new branch of conventional clustering methodology. Topchy et al. (2004) propose a probabilistic formulation of clustering consensus using a finite mixture of multinomial distributions in the space of clusterings. The approach proposed by Frossyniotis et al. (2002) is designed for combining runs of clustering algorithms that all use the same number of clusters. Strehl & Ghosh (2002) proposed combiners based on a hypergraph model to solve the cluster fusion problem.
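The abstract does not spell out the chapter's merging algorithm, so the following is only a minimal sketch in the spirit of relational analysis, assuming a Condorcet-style majority criterion: each input partition is read as a pairwise relation ("i and j are together or not"), and the consensus tries to group pairs that most partitions put together. The function names `agreement_counts`, `condorcet_score` and `greedy_consensus`, and the single-pass greedy heuristic itself, are hypothetical illustrations, not the authors' method.

```python
from itertools import combinations
from typing import List, Sequence


def agreement_counts(partitions: Sequence[Sequence[int]]) -> List[List[int]]:
    """a[i][j] = number of input partitions that put objects i and j in the same cluster."""
    n = len(partitions[0])
    a = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                a[i][j] += 1
                a[j][i] += 1
    return a


def condorcet_score(labels: Sequence[int], a: List[List[int]], m: int) -> int:
    """Sum of (agreements - disagreements) = 2*a[i][j] - m over the pairs grouped together.

    With m partitions this equals the Condorcet majority criterion up to an additive constant.
    """
    return sum(2 * a[i][j] - m
               for i, j in combinations(range(len(labels)), 2)
               if labels[i] == labels[j])


def greedy_consensus(partitions: Sequence[Sequence[int]]) -> List[int]:
    """Hypothetical one-pass heuristic: add each object to the cluster with the largest
    positive gain in the criterion, or open a new cluster when no gain is positive."""
    m, n = len(partitions), len(partitions[0])
    a = agreement_counts(partitions)
    clusters: List[List[int]] = []
    for i in range(n):
        best, best_gain = None, 0
        for c, members in enumerate(clusters):
            gain = sum(2 * a[i][j] - m for j in members)
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:
            clusters.append([i])
        else:
            clusters[best].append(i)
    labels = [0] * n
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


# Three partitions of five objects; the third one uses swapped label names.
partitions = [
    [0, 0, 1, 1, 2],
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 2],
]
consensus = greedy_consensus(partitions)
print(consensus)  # [0, 0, 1, 1, 2]
print(condorcet_score(consensus, agreement_counts(partitions), len(partitions)))  # 6
```

Working on pairwise relations rather than on raw labels sidesteps the label-correspondence problem: the third partition calls its clusters by different numbers, yet it contributes exactly the same pairwise votes. A greedy single pass depends on object order and is not guaranteed to reach the optimum of the criterion.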
These authors discuss two settings for consensus clustering: (1) Feature-Distributed Clustering (FDC), in which a set of clusterings is obtained from partial views of the variables using all observations; and (2) Object-Distributed Clustering (ODC), in which each clustering in the ensemble is limited to a subset of the observations but has access to all variables. The 3
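The two settings differ only in how the data are sliced before each base clustering run. The sketch below assumes that each view is drawn by random sampling without replacement, which is an illustrative choice rather than part of the FDC/ODC definitions; the helpers `fdc_views`, `odc_views` and `lift_partial_labels` are hypothetical, and the base clustering algorithm itself is left out.

```python
import random
from typing import List, Optional, Sequence, Tuple

Row = Sequence[float]


def fdc_views(data: Sequence[Row], n_views: int, n_features: int,
              seed: int = 0) -> List[List[List[float]]]:
    """Feature-distributed: every view keeps ALL observations but only a subset of variables."""
    rng = random.Random(seed)
    d = len(data[0])
    views = []
    for _ in range(n_views):
        cols = sorted(rng.sample(range(d), n_features))
        views.append([[row[c] for c in cols] for row in data])
    return views


def odc_views(data: Sequence[Row], n_views: int, n_objects: int,
              seed: int = 0) -> List[Tuple[List[int], List[List[float]]]]:
    """Object-distributed: every view keeps ALL variables but only a subset of observations.

    Returns (row_indices, rows) so partial labels can be mapped back to the full data set.
    """
    rng = random.Random(seed)
    n = len(data)
    views = []
    for _ in range(n_views):
        idx = sorted(rng.sample(range(n), n_objects))
        views.append((idx, [list(data[i]) for i in idx]))
    return views


def lift_partial_labels(n: int, idx: Sequence[int],
                        labels: Sequence[int]) -> List[Optional[int]]:
    """Map labels computed on a row subset back to all n objects; unseen objects get None."""
    full: List[Optional[int]] = [None] * n
    for i, lab in zip(idx, labels):
        full[i] = lab
    return full


data = [[1.0, 2.0, 3.0], [1.1, 2.1, 2.9], [8.0, 9.0, 7.5], [8.2, 8.8, 7.7]]
fdc = fdc_views(data, n_views=3, n_features=2)
odc = odc_views(data, n_views=3, n_objects=2)
print([len(v) for v in fdc], [len(v[0]) for v in fdc])  # [4, 4, 4] [2, 2, 2]
print(lift_partial_labels(len(data), [0, 2], [1, 0]))  # [1, None, 0, None]
```

In the FDC case every base partition labels every observation, so the partitions can be compared directly. In the ODC case each partition is only partial, so whatever consensus function is used afterwards has to tolerate the missing (None) entries.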