
Speech recognition using visual cues with a two stage detector network for visemes classification and sentence detection.

Authors :
Sangeetha, R.
Malathi, D.
Source :
AIP Conference Proceedings. 2024, Vol. 3075 Issue 1, p1-7. 7p.
Publication Year :
2024

Abstract

In recent times, automated lip reading has gained increasing research attention, and considerable advances have been made in the area using various deep learning algorithms. Automated lip reading can be performed with or without audio; when lip movements are identified without the sound of speech, it is often referred to as visual speech recognition. One of the drawbacks of visual speech recognition is detecting words that have similar lip movements. Visemes are often referred to as speech sounds that look the same on the lips, so different words with similar lip movements are hard to distinguish. In this paper, an Inception encoder-decoder network is proposed for visual lip reading. The model is developed to detect lip-read sentences from a varied vocabulary, including sentences not seen during model training. In the proposed method, visemes are classified and used as the basis for detecting lip-read sentences; the detected visemes are then converted to sentences using perplexity analysis. The proposed method is lexicon-free and based entirely on visual cues. The model has been evaluated on the Lip Reading Sentences 3 (LRS-3) TED benchmark dataset, which contains a variety of challenging videos from TED and TEDx talks. In addition to the LRS-3 dataset, further experiments were performed with videos of varying illumination. The proposed model achieved state-of-the-art results compared with current lip-reading models. Experimental results show that the developed model performs significantly better, with a 13% lower error rate and greater robustness to varying illumination. [ABSTRACT FROM AUTHOR]
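The second stage described in the abstract, converting an ambiguous viseme sequence into a sentence via perplexity analysis, can be illustrated with a toy sketch. Note that the paper describes its method as lexicon-free and gives no implementation details here; the viseme-to-word table, corpus, and add-one-smoothed bigram model below are purely illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumed, not the paper's code): rank candidate sentences
# for an ambiguous viseme sequence by bigram perplexity.
import itertools
import math
from collections import defaultdict

# Hypothetical mapping: each viseme class covers several visually similar words.
VISEME_TO_WORDS = {
    "V_bilabial": ["bat", "mat", "pat"],
    "V_open": ["car", "far"],
}

# Toy corpus for the illustrative bigram language model.
CORPUS = [
    "the bat sat on the mat",
    "the car went far",
    "pat drove the car far",
]

def train_bigram(corpus):
    # Count unigrams and bigrams over the toy corpus.
    unigrams, bigrams = defaultdict(int), defaultdict(int)
    for sent in corpus:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        for w in tokens:
            unigrams[w] += 1
        for a, b in zip(tokens, tokens[1:]):
            bigrams[(a, b)] += 1
    return unigrams, bigrams

def perplexity(tokens, unigrams, bigrams, vocab_size):
    # Add-one smoothed bigram perplexity of a candidate word sequence.
    tokens = ["<s>"] + tokens + ["</s>"]
    log_prob = 0.0
    for a, b in zip(tokens, tokens[1:]):
        p = (bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / (len(tokens) - 1))

def rank_candidates(viseme_sequence, unigrams, bigrams):
    # Expand every word sequence consistent with the viseme sequence,
    # then sort candidates so the lowest-perplexity sentence comes first.
    vocab_size = len(unigrams)
    candidates = itertools.product(*(VISEME_TO_WORDS[v] for v in viseme_sequence))
    scored = [(perplexity(list(c), unigrams, bigrams, vocab_size), c) for c in candidates]
    return sorted(scored)

if __name__ == "__main__":
    uni, bi = train_bigram(CORPUS)
    for ppl, sentence in rank_candidates(["V_bilabial", "V_open"], uni, bi):
        print(f"{ppl:8.2f}  {' '.join(sentence)}")
```

In this sketch the candidate whose words form the most probable sequence under the toy language model wins; the paper instead reports a lexicon-free approach, so this should be read only as a generic illustration of perplexity-based disambiguation of visually identical words.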

Details

Language :
English
ISSN :
0094-243X
Volume :
3075
Issue :
1
Database :
Academic Search Index
Journal :
AIP Conference Proceedings
Publication Type :
Conference
Accession number :
178685931
Full Text :
https://doi.org/10.1063/5.0217205