
Combining Deep Embeddings of Acoustic and Articulatory Features for Speaker Identification

Authors :
Qian-Bei Hong
Chung-Hsien Wu
Chien-Lin Huang
Hsin-Min Wang
Source :
ICASSP
Publication Year :
2020
Publisher :
IEEE, 2020.

Abstract

In this study, deep embeddings of acoustic and articulatory features are combined for speaker identification. First, a convolutional neural network (CNN)-based universal background model (UBM) is constructed to generate acoustic feature (AC) embeddings. In addition, because articulatory features (AFs) represent important phonological properties of speech production, a multilayer perceptron (MLP)-based model is also constructed for AF embedding extraction. The extracted AC and AF embeddings are concatenated into a combined feature vector for speaker identification using a fully-connected neural network. The proposed system was evaluated on three corpora: King-ASR, LibriSpeech, and SITW, with experiments designed around the properties of each dataset. All three corpora were used to evaluate the effect of AF embedding, and the results showed that adding AF embedding to the input feature vector improved speaker identification performance. The LibriSpeech corpus was used to evaluate the effect of the number of enrolled speakers; the proposed system achieved an EER of 7.80%, outperforming the x-vector with PLDA baseline (8.25%). The effect of signal mismatch was further evaluated on the SITW corpus, where the proposed system achieved an EER of 25.19%, outperforming the other baseline methods.
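The core combination step described above — concatenating the AC and AF embeddings into one feature vector and scoring speakers with a fully-connected network — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the embedding dimensions, the single-layer classifier, and the random "embeddings" standing in for trained CNN/MLP extractor outputs are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen for illustration (not from the paper).
AC_DIM, AF_DIM, N_SPEAKERS = 512, 128, 10

def identify_speaker(ac_embedding, af_embedding, W, b):
    """Concatenate the acoustic (AC) and articulatory-feature (AF)
    embeddings and score speakers with one fully-connected softmax
    layer (a stand-in for the paper's identification network)."""
    x = np.concatenate([ac_embedding, af_embedding])  # combined feature vector
    logits = W @ x + b
    probs = np.exp(logits - logits.max())             # numerically stable softmax
    return probs / probs.sum()

# Toy embeddings standing in for the CNN-UBM and MLP extractor outputs.
ac = rng.standard_normal(AC_DIM)
af = rng.standard_normal(AF_DIM)
W = rng.standard_normal((N_SPEAKERS, AC_DIM + AF_DIM)) * 0.01
b = np.zeros(N_SPEAKERS)

probs = identify_speaker(ac, af, W, b)
predicted = int(np.argmax(probs))  # index of the most likely speaker
```

In the paper, the two embedding extractors are trained separately before their outputs are fused, so the classifier operates on a fixed-length vector of dimension AC_DIM + AF_DIM regardless of utterance length.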

Details

Database :
OpenAIRE
Journal :
ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Accession number :
edsair.doi...........d30375d2531548332ca9115eefabb3d7
Full Text :
https://doi.org/10.1109/icassp40776.2020.9053640