
Audio-visual Speaker Recognition with a Cross-modal Discriminative Network

Authors: Rohan Kumar Das, Haizhou Li, Ruijie Tao
Source: INTERSPEECH
Publication Year: 2020

Abstract

Audio-visual speaker recognition is one of the tasks in the recent 2019 NIST speaker recognition evaluation (SRE). Studies in both neuroscience and computer science point to the fact that visual and auditory neural signals interact in the cognitive process. This motivated us to study a cross-modal network, namely the voice-face discriminative network (VFNet), which establishes a general relation between human voice and face. Experiments show that VFNet provides additional speaker-discriminative information. With VFNet, we achieve a 16.54% relative reduction in equal error rate over the score-level fusion audio-visual baseline on the evaluation set of the 2019 NIST SRE.
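To make the idea of a cross-modal discriminative network and score-level fusion concrete, the following is a minimal sketch in PyTorch. It is not the authors' implementation: the embedding dimensions, layer sizes, class name, and fusion weights are illustrative assumptions, and only the overall pattern (scoring a voice-face pair and fusing that score with baseline audio and visual scores) reflects the abstract.

import torch
import torch.nn as nn

class CrossModalDiscriminator(nn.Module):
    """Hypothetical VFNet-style scorer: does this voice match this face?"""
    def __init__(self, voice_dim=512, face_dim=512, hidden_dim=256):
        super().__init__()
        # Project each modality into a shared space, then score the pair.
        self.voice_proj = nn.Linear(voice_dim, hidden_dim)
        self.face_proj = nn.Linear(face_dim, hidden_dim)
        self.scorer = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),  # logit: same person vs. different
        )

    def forward(self, voice_emb, face_emb):
        v = torch.relu(self.voice_proj(voice_emb))
        f = torch.relu(self.face_proj(face_emb))
        return self.scorer(torch.cat([v, f], dim=-1)).squeeze(-1)

def fuse_scores(audio_score, visual_score, cross_modal_score,
                weights=(0.4, 0.4, 0.2)):
    # Weighted score-level fusion; the weights here are assumptions,
    # not values from the paper.
    wa, wv, wc = weights
    return wa * audio_score + wv * visual_score + wc * cross_modal_score

if __name__ == "__main__":
    model = CrossModalDiscriminator()
    voice = torch.randn(8, 512)   # batch of voice embeddings
    face = torch.randn(8, 512)    # batch of face embeddings
    cm_score = model(voice, face)
    fused = fuse_scores(torch.randn(8), torch.randn(8), cm_score)
    print(fused.shape)  # torch.Size([8])

The point of the sketch is the third, cross-modal score: it is produced from a voice-face pair jointly rather than from either modality alone, which is how such a network can add discriminative information on top of a score-level audio-visual fusion baseline.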

Details

Language: English
Database: OpenAIRE
Journal: INTERSPEECH
Accession number: edsair.doi.dedup.....9ab0976f663c7a921fc9846c06126ea4