ck-means, a novel unsupervised learning method that combines fuzzy and crispy clustering methods to extract intersecting data

Authors :: Dessureault, Jean-Sébastien
Massicotte, Daniel
Publication Year :: 2022
Abstract: Clustering data is a popular feature in the field of unsupervised machine learning. Most algorithms aim to find the best method to extract consistent clusters of data, but very few of them intend to cluster data that share the same intersections between two features or more. This paper proposes a method to do so. The main idea of this novel method is to generate fuzzy clusters of data using a Fuzzy C-Means (FCM) algorithm. The second part involves applying a filter that selects a range of minimum and maximum membership values, emphasizing the border data. A {\mu} parameter defines the amplitude of this range. It finally applies a k-means algorithm using the membership values generated by the FCM. Naturally, the data having similar membership values will regroup in a new crispy cluster. The algorithm is also able to find the optimal number of clusters for the FCM and the k-means algorithm, according to the consistency of the clusters given by the Silhouette Index (SI). The result is a list of data and clusters that regroup data sharing the same intersection, intersecting two features or more. ck-means allows extracting the very similar data that does not naturally fall in the same cluster but at the intersection of two clusters or more. The algorithm also always finds itself the optimal number of clusters.

Tools