Minkowski Distances and Standardisation for Clustering and Classification on High-Dimensional Data
- Source :
- Advanced Studies in Behaviormetrics and Data Science ISBN: 9789811526992
- Publication Year :
- 2020
- Publisher :
- Imaizumi, Tadashi; Nakayama, Atsuho; Yokoyama, Satoru, 2020.
Abstract
- There are many distance-based methods for classification and clustering. For data with a high number of dimensions and a lower number of observations, processing distances is computationally advantageous compared with processing the raw data matrix. Euclidean distances are used as a default for continuous multivariate data, but there are alternatives. Here the so-called Minkowski distances, namely the \(L_1\) (city block), \(L_2\) (Euclidean), \(L_3\), \(L_4\) and maximum distances, are combined with different schemes for standardising the variables before aggregating them. A new transformation method for a single variable, the boxplot transformation, is proposed; it standardises the majority of observations but brings outliers closer to the main bulk of the data. The distances are compared in simulations on data with a low number of observations but high dimensionality, for clustering by partitioning around medoids, complete linkage and average linkage, and for classification by nearest neighbours. The \(L_1\)-distance and the boxplot transformation show good results.
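As a rough illustration of the distances named in the abstract, the following sketch computes the \(L_q\) distance \(\left(\sum_j |x_j - y_j|^q\right)^{1/q}\) for \(q = 1, 2, 3, 4\) and the maximum distance on standardised variables. It is not taken from the chapter: the function names (`standardise`, `minkowski`, `distance_matrix`), the toy data and the choice of unit-variance scaling are assumptions made only for this example. The boxplot transformation is not shown because the abstract does not define it.

```python
import math
from statistics import mean, stdev


def standardise(rows):
    """Centre each variable (column) and scale it to unit sample variance.

    Assumption: unit-variance scaling is only one possible scheme, chosen
    here for illustration.
    """
    cols = list(zip(*rows))
    centres = [mean(c) for c in cols]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    scales = [stdev(c) or 1.0 for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, centres, scales)]
            for row in rows]


def minkowski(x, y, q):
    """L_q distance between two observations; q = math.inf gives the maximum distance."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(q):
        return max(diffs)
    return sum(d ** q for d in diffs) ** (1.0 / q)


def distance_matrix(rows, q):
    """Pairwise L_q distances between all observations (rows)."""
    n = len(rows)
    return [[minkowski(rows[i], rows[j], q) for j in range(n)]
            for i in range(n)]


# Hypothetical toy data: 3 observations, 3 variables on very different scales.
data = [[1.0, 200.0, 0.5],
        [2.0, 180.0, 0.7],
        [8.0, 210.0, 0.1]]

z = standardise(data)
for q in (1, 2, 3, 4, math.inf):
    d = distance_matrix(z, q)
    print(f"q={q}: d(obs0, obs1) = {d[0][1]:.3f}")
```

A distance matrix built this way could then be passed to any distance-based clustering or nearest-neighbour routine; for a fixed pair of observations the \(L_q\) distance does not increase as \(q\) grows, so the maximum distance is the smallest of the five.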
- Subjects :
- Clustering high-dimensional data
City block
02 engineering and technology
01 natural sciences
Medoid
High-dimensional classification and clustering, Minkowski distances, standardisation, boxplot transformation
Combinatorics
010104 statistics & probability
Matrix (mathematics)
Transformation (function)
Outlier
Minkowski space
0202 electrical engineering, electronic engineering, information engineering
020201 artificial intelligence & image processing
0101 mathematics
Cluster analysis
Mathematics
Details
- Language :
- English
- Database :
- OpenAIRE
- Accession number :
- edsair.doi.dedup.....82e132ab695586d7dd1b76e4101e117a