Audio-visual graphical models for speech processing
- Source :
- ICASSP (5)
- Publication Year :
- 2004
- Publisher :
- IEEE, 2004.
Abstract
- Perceiving sounds in a noisy environment is a challenging problem. Visual lip-reading can provide relevant information, but it is also challenging because the lips move and a tracker must cope with a variety of conditions. Typically, audio-visual systems have been assembled from individually engineered modules. We propose to fuse audio and video in a probabilistic generative model that implements cross-modal self-supervised learning, enabling adaptation to audio-visual data. The video model features a Gaussian mixture model embedded in a linear subspace of a sprite that translates in the video. The system can learn to detect and enhance speech in noise given only a short (30-second) sequence of audio-visual data. We show some results for speech detection and enhancement, and discuss extensions to the model that are under investigation.
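The abstract describes the video model in a single sentence. The Python sketch below shows one simplified reading of that sentence. It is not the authors' implementation. The hypothetical function `sample_frame` picks a mixture component and draws coordinates in a linear appearance subspace from that component's Gaussian. It then forms the sprite as a mean image plus a basis times those coordinates and pastes the sprite at a uniformly chosen position in a noisy frame. All of the following are assumptions made for illustration: the function and parameter names, the diagonal component covariances, the uniform prior over translations, the zero background, and the additive Gaussian pixel noise.

```python
import random

def sample_frame(rng, mean, basis, weights, comp_means, comp_stds,
                 sprite_hw, frame_hw, noise_std):
    """Draw one frame from a toy translated-sprite generative model.

    ASSUMPTION: a simplified illustration, not the paper's exact model.
    mean:       h*w floats, the mean sprite flattened row-major
    basis:      h*w rows of k floats; its columns span the appearance subspace
    weights:    mixture weights over the k-dim subspace coordinates
    comp_means: per-component lists of k means
    comp_stds:  per-component lists of k standard deviations (diagonal)
    """
    h, w = sprite_hw
    H, W = frame_hw
    k = len(comp_means[0])
    # 1. Choose a mixture component s with probability weights[s].
    s = rng.choices(range(len(weights)), weights=weights)[0]
    # 2. Draw subspace coordinates z ~ N(m_s, diag(sd_s^2)).
    z = [rng.gauss(comp_means[s][j], comp_stds[s][j]) for j in range(k)]
    # 3. Map z into pixel space: sprite = mean + basis @ z.
    sprite = [mean[i] + sum(basis[i][j] * z[j] for j in range(k))
              for i in range(h * w)]
    # 4. Choose a translation uniformly among positions where the sprite fits.
    r = rng.randrange(H - h + 1)
    c = rng.randrange(W - w + 1)
    # 5. Start from a zero background plus Gaussian pixel noise, then add the sprite.
    frame = [[rng.gauss(0.0, noise_std) for _ in range(W)] for _ in range(H)]
    for i in range(h):
        for j in range(w):
            frame[r + i][c + j] += sprite[i * w + j]
    return frame, s, z, (r, c)
```

Pure-Python lists keep the sketch dependency-free. A practical system would use array operations, model the background explicitly, and, as the abstract indicates, couple this video model to an audio model and perform inference rather than only sampling.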
- Subjects :
- Sprite (computer graphics)
Voice activity detection
business.industry
Computer science
Speech recognition
Feature extraction
Speech processing
Mixture model
Adaptive filter
Speech enhancement
Noise
Acoustical engineering
Video tracking
Computer vision
Artificial intelligence
Graphical model
business
Details
- Database :
- OpenAIRE
- Journal :
- 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing
- Accession number :
- edsair.doi...........929ed6515aaa2646e6625807b2a2589c