Low Dimensional Representation of Space Structure and Clustering of Categorical Data

Authors :
Qibin Zheng
Jianjun Cao
Nianfeng Weng
Xingchun Diao
Source :
ISPA/IUCC/BDCloud/SocialCom/SustainCom
Publication Year :
2018
Publisher :
IEEE, 2018.

Abstract

Dissimilarity measurement plays a key role in clustering analysis. Because categorical values have no order relation, clustering categorical data is harder than clustering numerical data. To improve the clustering quality of categorical data, the SBC (space structure based clustering) algorithm proposed a novel representation scheme for the space structure of such data. The scheme improved the discriminability of categorical data but also introduced two problems: low efficiency and high dimensionality. In this work, we prove that categorical data can be represented with the space structure more efficiently while maintaining the same clustering performance. To achieve this, a fraction of representative objects is selected as the reference set, from which a low-dimensional space structure matrix is built. Since the reference set directly affects the dissimilarity measure, a cluster-based method is proposed to obtain a better reference set. Theoretical analysis and experiments show that, compared with the SBC method, the proposed methods are more efficient and extensible while maintaining approximately the same clustering performance.
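
The reference-set idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state which dissimilarity measure or selection rule the paper uses, so the sketch assumes a simple-matching (Hamming) dissimilarity and a uniformly random reference set. The function names `hamming`, `space_structure`, and `random_reference`, and the toy data, are hypothetical. The cluster-based selection method mentioned in the abstract is not reproduced here.

```python
import random
from typing import List, Sequence

Row = Sequence[str]


def hamming(a: Row, b: Row) -> float:
    """Fraction of attributes on which two categorical objects differ.

    Assumption: simple matching stands in for whatever dissimilarity
    measure the paper actually uses.
    """
    return sum(x != y for x, y in zip(a, b)) / len(a)


def space_structure(data: List[Row], reference: List[Row]) -> List[List[float]]:
    """Represent each object by its dissimilarity to every reference object.

    With reference == data this gives an n x n matrix; with a smaller
    reference set of size m it gives an n x m matrix.
    """
    return [[hamming(obj, ref) for ref in reference] for obj in data]


def random_reference(data: List[Row], m: int, seed: int = 0) -> List[Row]:
    """Pick m objects uniformly at random as the reference set (assumption)."""
    rng = random.Random(seed)
    return rng.sample(data, m)


# Toy categorical data set (hypothetical).
data = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("blue", "large", "square"),
    ("blue", "large", "round"),
    ("green", "medium", "oval"),
]

full = space_structure(data, data)    # every object is a reference: 5 x 5
reference = random_reference(data, 2)
low = space_structure(data, reference)  # reduced representation: 5 x 2
```

Under these assumptions, the rows of `low` are numerical vectors that could be passed to any standard numerical clustering routine, and the cost of building the matrix grows with the size of the reference set rather than with the size of the whole data set.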

Details

Database :
OpenAIRE
Journal :
2018 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom)
Accession number :
edsair.doi...........8606111e6f9ccfd5111560880f84dbee
Full Text :
https://doi.org/10.1109/bdcloud.2018.00161