Hierarchical conceptual clustering based on quantile method for identifying microscopic details in distributional data
- Source :
- Advances in Data Analysis and Classification. 15:407-436
- Publication Year :
- 2020
- Publisher :
- Springer Science and Business Media LLC, 2020.
Abstract
- Symbolic data are aggregated from larger traditional datasets in order to hide entry-specific details and to make it possible to analyse large amounts of data, such as big data, that could not otherwise be analysed. Symbolic data may take many different and complex forms, such as intervals and histograms. Identifying patterns and finding similarities between objects is one of the most fundamental tasks of data mining, but conventional methods are not sufficient to cluster these sophisticated data types accurately. Over the years, different approaches have been proposed, but they mainly concentrate on the “macroscopic” similarities between objects. Because distributional data, such as symbolic data, are aggregated from large datasets, even the smallest microscopic differences and similarities become extremely important. In this paper, a method is proposed for clustering distributional data based on these microscopic similarities by using quantile values. Having multiple points of comparison makes it possible to identify similarities in small sections of a distribution while producing more adequate hierarchical concepts. The proposed algorithm, called microscopic hierarchical conceptual clustering, has a monotone property and was found in experiments to produce more adequate conceptual clusters. Furthermore, because it uses quantiles, the algorithm can easily compare different types of symbolic data without any additional complexity.
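The abstract's core idea, describing every object by the same set of quantile values so that intervals and histograms become directly comparable and then building a hierarchy over them, can be illustrated with a small Python sketch. It is not the authors' microscopic hierarchical conceptual clustering: the quantile levels in `PROBS`, the assumption that values are uniform within an interval or histogram bin, the mean-absolute-difference distance, and the use of standard complete-linkage agglomeration are all assumptions made for illustration, and every function name is hypothetical. Complete linkage is used here because its merge heights never decrease, which is one common sense in which a hierarchy is called monotone.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Tuple

# Quantile levels used to summarise every object (assumed; the record does not state the paper's choice).
PROBS = [0.0, 0.25, 0.5, 0.75, 1.0]


def interval_quantiles(a: float, b: float, probs: Sequence[float] = PROBS) -> List[float]:
    """Quantiles of an interval [a, b], assuming values are spread uniformly over it."""
    return [a + p * (b - a) for p in probs]


def histogram_quantiles(edges: Sequence[float], weights: Sequence[float],
                        probs: Sequence[float] = PROBS) -> List[float]:
    """Quantiles of a histogram, assuming values are uniform within each bin."""
    total = float(sum(weights))
    bins = []  # (lower edge, upper edge, CDF at lower edge, CDF at upper edge) of non-empty bins
    cum = 0.0
    for lo, hi, w in zip(edges[:-1], edges[1:], weights):
        if w > 0:
            bins.append((lo, hi, cum, cum + w / total))
        cum += w / total
    out = []
    for p in probs:
        # First non-empty bin whose upper CDF reaches p; the last bin absorbs rounding error.
        k = next((i for i, b in enumerate(bins) if p <= b[3]), len(bins) - 1)
        lo, hi, c0, c1 = bins[k]
        frac = min(1.0, max(0.0, (p - c0) / (c1 - c0)))
        out.append(lo + frac * (hi - lo))
    return out


def quantile_distance(q1: Sequence[float], q2: Sequence[float]) -> float:
    """Mean absolute difference between corresponding quantile values."""
    return sum(abs(x - y) for x, y in zip(q1, q2)) / len(q1)


Merge = Tuple[FrozenSet[str], FrozenSet[str], float]


def complete_linkage(objects: Dict[str, List[float]]) -> List[Merge]:
    """Agglomerative clustering with complete linkage; returns the merges in order."""

    def dist(c1: FrozenSet[str], c2: FrozenSet[str]) -> float:
        # Complete linkage: the largest pairwise quantile distance between members.
        return max(quantile_distance(objects[a], objects[b]) for a in c1 for b in c2)

    clusters = [frozenset([name]) for name in objects]
    merges: List[Merge] = []
    while len(clusters) > 1:
        # Merge the closest pair of current clusters.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j], dist(clusters[i], clusters[j])))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical toy data: two interval-valued and two histogram-valued objects.
objects = {
    "A": interval_quantiles(0, 10),
    "B": interval_quantiles(1, 11),
    "C": histogram_quantiles([0, 10, 20, 30], [2, 5, 3]),
    "D": histogram_quantiles([20, 30, 40], [1, 1]),
}
merges = complete_linkage(objects)
for left, right, height in merges:
    print(sorted(left), sorted(right), round(height, 3))
```

In this toy data the two overlapping intervals merge first, the histogram spread over [0, 30] joins them next, and the histogram on [20, 40] joins last, with non-decreasing merge heights. Because each object is reduced to one fixed-length quantile vector, mixing interval-valued and histogram-valued objects needs no special-case code, which mirrors the advantage the abstract attributes to quantiles.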
- Subjects :
- Statistics and Probability
Property (programming)
Computer science
business.industry
Applied Mathematics
Big data
Conceptual clustering
02 engineering and technology
computer.software_genre
01 natural sciences
Data type
Computer Science Applications
010104 statistics & probability
Monotone polygon
Histogram
0202 electrical engineering, electronic engineering, information engineering
020201 artificial intelligence & image processing
Data mining
0101 mathematics
Cluster analysis
business
computer
Quantile
Details
- ISSN :
- 1862-5355 and 1862-5347
- Volume :
- 15
- Database :
- OpenAIRE
- Journal :
- Advances in Data Analysis and Classification
- Accession number :
- edsair.doi...........5cb64d5bd0361f8cc4f42fe115442747
- Full Text :
- https://doi.org/10.1007/s11634-020-00411-w