Exact memory–constrained UPGMA for large scale speaker clustering.
- Source :
- Pattern Recognition. Nov 2019, Vol. 95, p235-246. 12p.
- Publication Year :
- 2019
Abstract
- • We focus on exact hierarchical clustering of large sets of utterances.
• Hierarchical clustering is challenging due to memory constraints.
• We propose an efficient, exact and parallel implementation of UPGMA clustering.
• We extend the Clustering Features concept to speaker recognition scoring functions.
• We assess the efficiency of our method on datasets including 4 million utterances.
This work focuses on clustering large sets of utterances collected from an unknown number of speakers. Since the number of speakers is unknown, we focus on exact hierarchical agglomerative clustering, followed by automatic selection of the number of clusters. Exact hierarchical clustering of a large number of vectors, however, is a challenging task due to memory constraints, which make it ineffective or unfeasible for large datasets. We propose an exact memory–constrained and parallel implementation of average linkage clustering for large scale datasets, showing that its computational complexity is approximately O(N^2), but is much faster (up to 40 times in our experiments), than the Reciprocal Nearest Neighbor chain algorithm, which has O(N^2) complexity. We also propose a very fast silhouette computation procedure that, in linear time, determines the set of clusters. The computational efficiency of our approach is demonstrated on datasets including up to 4 million speaker vectors. [ABSTRACT FROM AUTHOR]
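For readers unfamiliar with the average linkage (UPGMA) clustering named in the abstract, the following is a minimal textbook sketch in Python. It is not the authors' memory-constrained or parallel implementation: it keeps all pairwise distances in memory and rescans them at every merge, which is exactly the cost the paper sets out to avoid. The function name `upgma` and the tuple layout of the returned merges are assumptions made for illustration only.

```python
def upgma(dist):
    """Naive average-linkage (UPGMA) clustering on a full distance matrix.

    dist: symmetric list of lists with pairwise distances between N items.
    Returns one tuple (id_a, id_b, distance, new_size) per merge; the
    cluster created by the m-th merge (0-based) gets the id N + m.
    """
    n = len(dist)
    size = {i: 1 for i in range(n)}  # active cluster id -> item count
    # Distances between active clusters, keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    merges = []
    next_id = n
    while len(size) > 1:
        # Pick the closest pair of active clusters (linear scan, not optimized).
        (a, b), dab = min(d.items(), key=lambda kv: kv[1])
        na, nb = size[a], size[b]
        # The average distance from the merged cluster to any other cluster k
        # is the size-weighted mean of the two old average distances.
        for k in size:
            if k == a or k == b:
                continue
            dak = d.pop((min(a, k), max(a, k)))
            dbk = d.pop((min(b, k), max(b, k)))
            d[(k, next_id)] = (na * dak + nb * dbk) / (na + nb)
        del d[(a, b)]
        del size[a], size[b]
        size[next_id] = na + nb
        merges.append((a, b, dab, na + nb))
        next_id += 1
    return merges
```

Calling `upgma` on the distance matrix of four points placed at 0, 1, 5 and 6 on a line first merges the two close pairs and then joins the resulting clusters at their average distance. The sketch stores on the order of N^2 distances, so it is only practical for small N.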
- Subjects :
- *COMPUTATIONAL complexity
*SILHOUETTES
*MEMORY
Details
- Language :
- English
- ISSN :
- 00313203
- Volume :
- 95
- Database :
- Academic Search Index
- Journal :
- Pattern Recognition
- Publication Type :
- Academic Journal
- Accession number :
- 137561181
- Full Text :
- https://doi.org/10.1016/j.patcog.2019.06.018