1. Speaker Verification Model Based on Graph Neural Networks and Multi-Feature Fusion.
- Author
曹嘉玲 and 陈 宁
- Abstract
Recent research shows that features extracted from models pre-trained on large amounts of unlabeled speech perform very well in speaker verification (SV) tasks. However, existing models do not exploit the topological structure among frame-level features to optimize and aggregate them effectively, and their high network complexity hinders real-time use. They also fail to make full use of the complementarity between multiple input features to further improve performance. To address this, this paper first introduces graph neural networks to optimize frame-level features using the topological structure between them. Second, it constructs a multi-feature fusion mechanism based on multiple losses that makes full use of the complementarity between different features to further improve model performance. Experimental results on VoxCeleb show that the proposed model, GACNPF, achieves lower error rates and lower time complexity than existing models. More importantly, the model is flexible: it can fuse any kind of features, and it can be applied to other classification tasks based on pre-trained feature extraction. [ABSTRACT FROM AUTHOR]
- Published
- 2023