Doppler Radar-Based Human Speech Recognition Using Mobile Vision Transformer.

Authors :: Li, Wei
Geng, Yongfu
Gao, Yang
Ding, Qining
Li, Dandan
Liu, Nanqi
Chen, Jinheng
Source :: Electronics (2079-9292); Jul2023, Vol. 12 Issue 13, p2874, 11p
Publication Year :: 2023
Abstract: The speech signal is one of the important vital features of the human body, and its acquisition plays an important role in human–computer interaction. In this study, speech is captured and recognized using Doppler radar. When a person speaks, the vocal cords vibrate, which in turn causes the skin on the neck to vibrate. The vibration signal received by the radar produces a distinctive micro-Doppler signature for each word, depending on its pronunciation. These signals are converted into micro-Doppler feature maps, which are then classified to identify the spoken word. The speech recognition method used in this paper is based on neural networks. Convolutional neural networks (CNNs) generalize poorly and lose accuracy when training samples are insufficient or biased, and the trained models are not well suited to mobile terminals. MobileViT is a lightweight transformer-based model for image classification tasks. It uses a lightweight attention mechanism to extract features, giving faster inference and a smaller model size while maintaining high accuracy. The proposed method does not require large-scale data collection, which benefits different users. In addition, the model learns relatively quickly and reaches an accuracy of 99.5%. [ABSTRACT FROM AUTHOR]
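The abstract does not say how the micro-Doppler feature maps are computed. A common approach, assumed here, is a short-time Fourier transform (STFT) of the complex baseband radar echo: a vibrating surface at distance offset d(t) adds a two-way phase of 4πd(t)/λ, and the STFT shows how the resulting Doppler content changes over time. The sketch below simulates a sinusoidally vibrating target and turns its echo into a normalized 2-D map of the kind an image classifier such as MobileViT could take as input. Every function name and parameter value (a 12.5 mm wavelength, i.e. a 24 GHz carrier; a 1 mm vibration amplitude, exaggerated so the sidebands are clearly visible; a 64-sample Hann window with a hop of 16) is a hypothetical illustration, not the authors' setup.

```python
import cmath
import math

# Hypothetical sketch: STFT-based micro-Doppler map. Not the paper's pipeline or settings.


def simulate_vibration_echo(fs, duration, wavelength, amplitude, vib_freq):
    """Complex baseband echo from a surface vibrating as d(t) = amplitude * sin(2*pi*vib_freq*t).

    The two-way path change 2*d(t) produces a phase of 4*pi*d(t)/wavelength.
    """
    n = int(fs * duration)
    echo = []
    for i in range(n):
        t = i / fs
        d = amplitude * math.sin(2 * math.pi * vib_freq * t)
        echo.append(cmath.exp(1j * 4 * math.pi * d / wavelength))
    return echo


def stft_magnitude(x, win_len, hop):
    """Magnitude STFT with a periodic Hann window, using a direct DFT per frame.

    Returns one row per time frame; each row holds win_len Doppler bins in DFT order.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * k / win_len) for k in range(win_len)]
    # Precompute the DFT twiddle factors exp(-2*pi*j*r/N) for r = 0..N-1.
    twiddle = [cmath.exp(-2j * math.pi * r / win_len) for r in range(win_len)]
    frames = []
    for start in range(0, len(x) - win_len + 1, hop):
        seg = [x[start + k] * window[k] for k in range(win_len)]
        row = []
        for m in range(win_len):
            acc = sum(seg[k] * twiddle[(m * k) % win_len] for k in range(win_len))
            row.append(abs(acc))
        frames.append(row)
    return frames


def to_feature_map(frames, floor=1e-12):
    """Centre zero Doppler (fftshift), convert to dB, and scale to [0, 1] for an image model."""
    half = len(frames[0]) // 2
    db = [[20 * math.log10(v + floor) for v in row[half:] + row[:half]] for row in frames]
    lo = min(min(r) for r in db)
    hi = max(max(r) for r in db)
    span = (hi - lo) or 1.0
    return [[(v - lo) / span for v in r] for r in db]


if __name__ == "__main__":
    fs = 1000.0  # assumed sampling rate in Hz
    echo = simulate_vibration_echo(fs, 1.0, 0.0125, 1e-3, 100.0)
    fmap = to_feature_map(stft_magnitude(echo, 64, 16))
    print(len(fmap), len(fmap[0]))  # time frames x Doppler bins
```

Running the script prints `59 64`: 59 time frames, each with 64 Doppler bins. A stationary target (amplitude 0) puts all of its energy at zero Doppler, while the vibrating target spreads energy into sidebands at about ±100 Hz. Before classification, a real pipeline would typically resize such a map to the model's input resolution; that step is omitted here.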