Multi-level Fusion of Audio and Visual Features for Speaker Identification.

Authors :
Zhang, David
Jain, Anil K.
Wu, Zhiyong
Cai, Lianhong
Meng, Helen
Source :
Advances in Biometrics; 2005, p493-499, 7p
Publication Year :
2005

Abstract

This paper explores the fusion of audio and visual evidence through a multi-level hybrid fusion architecture based on a dynamic Bayesian network (DBN), which combines model-level and decision-level fusion to achieve higher performance. For model-level fusion, a new DBN-based audio-visual correlative model (AVCM) is proposed, which describes both the inter-correlations and the loose timing synchronicity between the audio and video streams. Experiments on the CMU database and on our own database both demonstrate that the methods improve the accuracy of audio-visual bimodal speaker identification at all acoustic signal-to-noise ratios (SNR) from 0 dB to 30 dB under varying acoustic conditions. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISBNs :
9783540311119
Database :
Supplemental Index
Journal :
Advances in Biometrics
Publication Type :
Book
Accession number :
32901382
Full Text :
https://doi.org/10.1007/11608288_66