
Neural network-based method for visual recognition of driver’s voice commands using attention mechanism

Authors :
Alexandr A. Axyonov
Elena V. Ryumina
Dmitry A. Ryumin
Denis V. Ivanko
Alexey A. Karpov
Source :
Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki, Vol 23, Iss 4, Pp 767-775 (2023)
Publication Year :
2023
Publisher :
Saint Petersburg National Research University of Information Technologies, Mechanics and Optics (ITMO University), 2023.

Abstract

Visual speech recognition, or automated lip-reading, is actively applied to speech-to-text translation. Video data proves useful in multimodal speech recognition systems, particularly when acoustic data is degraded or unavailable. The main purpose of this study is to improve driver command recognition by analyzing visual information, thereby reducing touch interaction with various vehicle systems (multimedia and navigation systems, phone calls, etc.) while driving. We propose a method for automated lip-reading of the driver's speech while driving, based on a deep neural network with the 3DResNet18 architecture. Combining this architecture with a bidirectional LSTM model and an attention mechanism achieves higher recognition accuracy at a slight cost in performance. Two variants of neural network architectures for visual speech recognition are proposed and investigated. The first architecture recognized the driver's voice commands with an accuracy of 77.68 %, which is 5.78 % lower than the second architecture's accuracy of 83.46 %. System performance, measured by the real-time factor (RTF), is 0.076 for the first architecture and 0.183 for the second, more than twice as high. The proposed method was tested on the in-car recordings of the multimodal RUSAVIC corpus. The results of the study can be used in audio-visual speech recognition systems, which are recommended in high-noise conditions, for example, when driving a vehicle. In addition, the analysis performed allows choosing the optimal visual speech recognition model for subsequent incorporation into an assistive system based on a mobile device.
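The real-time factor (RTF) quoted in the abstract is the standard ratio of processing time to the duration of the processed media; a system with RTF < 1 runs faster than real time. A minimal sketch of the computation (the function name and the 10-second utterance duration are illustrative assumptions, not taken from the paper):

```python
def real_time_factor(processing_seconds: float, media_seconds: float) -> float:
    """Real-time factor: processing time divided by media duration.

    RTF < 1 means the system processes input faster than real time;
    RTF > 1 means it cannot keep up with a live stream.
    """
    if media_seconds <= 0:
        raise ValueError("media duration must be positive")
    return processing_seconds / media_seconds


# Hypothetical example: if a 10-second utterance were decoded in 0.76 s,
# the RTF would match the first architecture's reported value of 0.076;
# decoding the same utterance in 1.83 s would give the second's 0.183.
rtf_first = real_time_factor(0.76, 10.0)
rtf_second = real_time_factor(1.83, 10.0)
```

Under this definition, the second architecture's RTF of 0.183 is still well below 1, so both variants remain viable for real-time use; the trade-off reported above is between its higher accuracy and the first architecture's lower latency.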

Details

Language :
English, Russian
ISSN :
2226-1494 and 2500-0373
Volume :
23
Issue :
4
Database :
Directory of Open Access Journals
Journal :
Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki
Publication Type :
Academic Journal
Accession number :
edsdoj.66375603e608485f93cbe94a01929e68
Document Type :
article
Full Text :
https://doi.org/10.17586/2226-1494-2023-23-4-767-775