1. A distributed unsupervised learning algorithm and its suitability to physical based observation.
- Author
- Hes, Radek and Gioroli, Giacomo
- Subjects
MACHINE learning, K-means clustering, RANDOM noise theory, PRIOR learning, POINT set theory
- Abstract
Large datasets pose a difficult challenge for clustering algorithms due to memory limitations and execution speed. Clustering is typically addressed with current popular techniques: K-Means and DBScan, which are inherently tightly coupled to all points in the data set. K-Means clustering is based on cluster centres and requires prior knowledge of the number of classes present in the dataset. DBScan relaxes this constraint but retains the need for a complete dataset during computation. In this paper, a novel 'self'-learning primitive unsupervised technique is presented that addresses the tight coupling and is readily distributable. The technique follows the comparison to class averages similar to K-Means yet relaxes the constraint of prior knowledge of the number of classes, similar to DBScan. The algorithm competes well with the standardised K-Means and DBScan variants in the context of physically based observations where Gaussian noise can be presumed. An application of usage of the unsupervised technique is presented; the classification of unknown whale species in the Cook Strait of New Zealand is shown to perform well. [ABSTRACT FROM AUTHOR]
- Published
- 2022