1. Doppler Radar-Based Human Speech Recognition Using Mobile Vision Transformer.
- Author
Li, Wei; Geng, Yongfu; Gao, Yang; Ding, Qining; Li, Dandan; Liu, Nanqi; and Chen, Jinheng
- Subjects
SPEECH perception, IMAGE recognition (Computer vision), DOPPLER radar, CONVOLUTIONAL neural networks, VOCAL cords, HUMAN-computer interaction
- Abstract
As one of the body's important vital features, the speech signal plays an important role in human–computer interaction. In this study, speech is captured and recognized using Doppler radar. When a person speaks, the vocal cords vibrate, and this vibration is transmitted to the skin of the neck. The radar echo from the vibrating neck carries a distinctive micro-Doppler signature for words with different pronunciations. These signals are converted into micro-Doppler feature maps, which are then classified to recognize the spoken words. The recognition method used in this paper is based on neural networks. Convolutional neural networks (CNNs) generalize poorly and lose accuracy when training samples are scarce or the sample extraction is biased, and the trained models are not well suited to mobile terminals. MobileViT is a lightweight transformer-based model for image classification. It uses a lightweight attention mechanism to extract features, giving faster inference and a smaller model size while maintaining high accuracy. The proposed method does not require large-scale data collection, which benefits different users. In addition, the model learns relatively quickly and achieves an accuracy of 99.5%. [ABSTRACT FROM AUTHOR]
- Published
- 2023
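The record does not say how the radar echoes become micro-Doppler feature maps. As a minimal sketch only, assuming an ideal continuous-wave (CW) radar, a noise-free sinusoidal neck-surface vibration, and a short-time Fourier transform (STFT) spectrogram as the feature map, that step could look like the Python below. The 24 GHz carrier, 2 kHz sampling rate, 0.1 mm vibration amplitude, 125 Hz vibration frequency, window sizes, and the function names `simulate_cw_echo`, `stft_power`, and `to_feature_map` are hypothetical illustration choices, not the authors' settings. The MobileViT classifier that would consume the resulting image is not reproduced here.

```python
import numpy as np

# Hypothetical sketch: every parameter below is illustrative, not from the paper.
SPEED_OF_LIGHT = 3e8  # m/s (rounded)


def simulate_cw_echo(duration_s=1.0, fs=2000.0, carrier_hz=24e9,
                     vib_amp_m=1e-4, vib_hz=125.0):
    """Complex baseband echo of an ideal CW radar aimed at a vibrating surface.

    Assumptions: displacement d(t) = A*sin(2*pi*f_v*t), perfect I/Q
    demodulation, no noise. The round-trip path change 2*d(t) gives an
    echo phase of 4*pi*d(t)/lambda.
    """
    wavelength = SPEED_OF_LIGHT / carrier_hz
    t = np.arange(int(duration_s * fs)) / fs
    displacement = vib_amp_m * np.sin(2 * np.pi * vib_hz * t)
    return np.exp(1j * 4 * np.pi * displacement / wavelength)


def stft_power(x, fs, win_len=128, hop=32):
    """Hann-windowed STFT power; rows are Doppler bins, columns are frames."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop:i * hop + win_len] * window
                       for i in range(n_frames)])
    # Two-sided spectrum: a complex echo separates approaching and receding motion.
    spectrum = np.fft.fftshift(np.fft.fft(frames, axis=1), axes=1)
    freqs = np.fft.fftshift(np.fft.fftfreq(win_len, d=1.0 / fs))
    return freqs, (np.abs(spectrum) ** 2).T


def to_feature_map(power, floor_db=-60.0):
    """Scale a power spectrogram to a [0, 1] image (1.0 = strongest cell)."""
    db = 10 * np.log10(power / power.max() + 1e-12)
    db = np.clip(db, floor_db, 0.0)
    return (db - floor_db) / -floor_db


if __name__ == "__main__":
    fs = 2000.0
    freqs, p_vib = stft_power(simulate_cw_echo(fs=fs), fs)
    _, p_static = stft_power(simulate_cw_echo(fs=fs, vib_amp_m=0.0), fs)
    k_side = int(np.argmin(np.abs(freqs - 125.0)))  # row for +125 Hz
    ratio = p_vib[k_side].mean() / p_static[k_side].mean()
    print("feature map shape:", to_feature_map(p_vib).shape)
    print("sideband power, vibrating vs. stationary:", ratio)
```

Because the assumed vibration amplitude is far smaller than the wavelength (about 12.5 mm at 24 GHz), the phase swing 4πA/λ is only about 0.1 rad. The vibration therefore shows up mainly as faint sidebands at ±f_v around the strong 0 Hz line rather than as a wide sinusoidal trace, which is why the usage example compares the +125 Hz row against a stationary target. The dB scaling with a fixed floor is one common way to turn such a spectrogram into an image-classifier input; the paper may normalize differently.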