
Exact memory–constrained UPGMA for large scale speaker clustering.

Authors :
Cumani, Sandro
Laface, Pietro
Source :
Pattern Recognition. Nov 2019, Vol. 95, p235-246. 12p.
Publication Year :
2019

Abstract

• We focus on exact hierarchical clustering of large sets of utterances.
• Hierarchical clustering is challenging due to memory constraints.
• We propose an efficient, exact and parallel implementation of UPGMA clustering.
• We extend the Clustering Features concept to speaker recognition scoring functions.
• We assess the efficiency of our method on datasets including 4 million utterances.

This work focuses on clustering large sets of utterances collected from an unknown number of speakers. Since the number of speakers is unknown, we focus on exact hierarchical agglomerative clustering, followed by automatic selection of the number of clusters. Exact hierarchical clustering of a large number of vectors, however, is a challenging task due to memory constraints, which make it ineffective or infeasible for large datasets. We propose an exact memory–constrained and parallel implementation of average linkage clustering for large scale datasets, showing that, although its computational complexity is approximately O(N^2), it is much faster (up to 40 times in our experiments) than the Reciprocal Nearest Neighbor chain algorithm, which also has O(N^2) complexity. We also propose a very fast silhouette computation procedure that determines the set of clusters in linear time. The computational efficiency of our approach is demonstrated on datasets including up to 4 million speaker vectors. [ABSTRACT FROM AUTHOR]
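The abstract's central idea is that, for suitable scoring functions, average linkage (UPGMA) between two clusters can be computed from a compact per-cluster summary instead of a stored N x N score matrix. The sketch below illustrates this under a simplifying assumption: the score is a plain dot product s(x, y) = x . y (on length-normalized vectors this is cosine similarity). Because the dot product is bilinear, the average pairwise score between clusters A and B equals S_A . S_B / (n_A n_B), where S is a cluster's vector sum and n its size, so each cluster only needs (S, n). This is a hypothetical illustration, not the authors' implementation: the function name `upgma_dot_product` is made up, the best pair is found by a naive scan (O(k^2 d) time per merge for k active clusters), and there is no memory blocking, parallelism, Reciprocal Nearest Neighbor chain, or silhouette step.

```python
import numpy as np


def upgma_dot_product(X):
    """Greedy UPGMA (average linkage) for the score s(x, y) = x . y.

    Hypothetical sketch: assumes a dot-product score. Each active cluster is
    summarized only by its vector sum and size, so memory stays O(N * d)
    instead of O(N^2). Returns merges as tuples
    (id_a, id_b, average_score, new_id); new clusters get ids N, N+1, ...
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    sums = {i: X[i].copy() for i in range(n)}   # cluster id -> sum of member vectors
    sizes = {i: 1 for i in range(n)}            # cluster id -> number of members
    merges = []
    next_id = n
    while len(sums) > 1:
        ids = list(sums)
        S = np.stack([sums[i] for i in ids])                 # k x d summaries
        c = np.array([sizes[i] for i in ids], dtype=float)   # k cluster sizes
        best_score, best_a, best_b = -np.inf, -1, -1
        # Scan one row at a time so no k x k matrix is ever materialized.
        for a in range(len(ids) - 1):
            # Average linkage of cluster a with every later cluster:
            # mean over x in A, y in B of x . y == S_A . S_B / (n_A * n_B)
            row = (S[a + 1:] @ S[a]) / (c[a] * c[a + 1:])
            b = int(np.argmax(row))
            if row[b] > best_score:
                best_score, best_a, best_b = float(row[b]), a, a + 1 + b
        ia, ib = ids[best_a], ids[best_b]
        merges.append((ia, ib, best_score, next_id))
        # Merging two clusters only requires adding their sums and sizes.
        sums[next_id] = sums.pop(ia) + sums.pop(ib)
        sizes[next_id] = sizes.pop(ia) + sizes.pop(ib)
        next_id += 1
    return merges


# Usage example: two near-horizontal and two near-vertical vectors.
toy = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
print(upgma_dot_product(toy))
# → [(0, 1, 1.0, 4), (2, 3, 1.0, 5), (4, 5, 0.1, 6)]
```

The design choice to keep only (sum, size) per cluster is what removes the quadratic score matrix; the price here is recomputing scores at every merge, which a practical implementation would have to organize far more carefully to stay fast on millions of vectors.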

Details

Language :
English
ISSN :
00313203
Volume :
95
Database :
Academic Search Index
Journal :
Pattern Recognition
Publication Type :
Academic Journal
Accession number :
137561181
Full Text :
https://doi.org/10.1016/j.patcog.2019.06.018