Low Dimensional Representation of Space Structure and Clustering of Categorical Data
- Source :
- ISPA/IUCC/BDCloud/SocialCom/SustainCom
- Publication Year :
- 2018
- Publisher :
- IEEE, 2018.
- Abstract :
- Dissimilarity measurement plays a key role in clustering analysis. Because categorical values have no order relation, clustering categorical data is harder than clustering numerical data. To improve the clustering quality of categorical data, the SBC (space structure based clustering) algorithm proposed a novel representation scheme for their space structure. This scheme improved the discriminability of categorical data, but it also introduced two problems: low efficiency and high dimensionality. In this work, we prove that categorical data can be represented with the space structure more efficiently while maintaining the same clustering performance. To achieve this, a fraction of representative objects is selected as the reference set, from which a low-dimensional space structure matrix is built. Since the reference set directly affects the dissimilarity measure, a cluster-based method is proposed to obtain a better reference set. Theoretical analysis and experiments show that, compared with the SBC method, the proposed methods are more efficient and more extensible while maintaining approximately the same clustering performance.
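The idea of a reference-set representation described in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not state which similarity measure SBC uses or how the cluster-based selection works, so the simple matching similarity, the function names, and the random selection rule below are all assumptions made for illustration. Each categorical object is mapped to a numerical row whose length equals the size of the reference set rather than the size of the whole data set.

```python
import random


def simple_matching(a, b):
    """Assumed similarity: fraction of attributes on which two objects agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def space_structure_matrix(data, reference):
    """Row i holds the similarity of data[i] to every reference object."""
    return [[simple_matching(obj, ref) for ref in reference] for obj in data]


def pick_reference(data, m, seed=0):
    """Hypothetical selection rule: a uniform random sample of m objects.

    The paper proposes a cluster-based selection instead; its details are
    not given in the abstract, so it is not reproduced here.
    """
    return random.Random(seed).sample(data, m)


# Toy categorical data set with three attributes per object.
data = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("blue", "large", "round"),
    ("blue", "large", "square"),
]

# Two reference objects give a 4 x 2 matrix instead of a 4 x 4 one.
reference = [data[0], data[3]]
matrix = space_structure_matrix(data, reference)
```

The resulting rows are ordinary numerical vectors, so any numerical clustering algorithm could in principle be applied to them; which algorithm the paper uses is not stated in this record.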
- Subjects :
- 0301 basic medicine
Structure (mathematical logic)
Relation (database)
business.industry
Computer science
Dimensionality reduction
Pattern recognition
02 engineering and technology
Measure (mathematics)
Set (abstract data type)
03 medical and health sciences
ComputingMethodologies_PATTERNRECOGNITION
030104 developmental biology
0202 electrical engineering, electronic engineering, information engineering
020201 artificial intelligence & image processing
Artificial intelligence
business
Representation (mathematics)
Cluster analysis
Categorical variable
- Database :
- OpenAIRE
- Journal :
- 2018 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom)
- Accession number :
- edsair.doi...........8606111e6f9ccfd5111560880f84dbee
- Full Text :
- https://doi.org/10.1109/bdcloud.2018.00161