An Innovative Approach Utilizing Binary-View Transformer for Speech Recognition Task.

Authors :
Kamal, Muhammad Babar
Khan, Arfat Ahmad
Khan, Faizan Ahmed
Shahid, Malik Muhammad Ali
Chitapong Wechtaisong
Kamal, Muhammad Daud
Ali, Muhammad Junaid
Peerapong Uthansaku
Source :
Computers, Materials & Continua; 2022, Vol. 72 Issue 3, p5547-5562, 16p
Publication Year :
2022

Abstract

Advances in deep learning have greatly improved the performance of speech recognition systems, and most recent systems are based on the Recurrent Neural Network (RNN). The RNN works well on short sequences but suffers from the vanishing gradient problem on long sequences. Transformer networks have addressed this issue and have shown state-of-the-art results on sequential and speech-related data. In speech recognition, the input audio is generally converted into an image using a Mel-spectrogram to represent frequencies and intensities. A machine learning model then classifies the image to generate a transcript. However, the audio frequencies in the image have low resolution, which causes inaccurate predictions. This paper presents a novel end-to-end binary-view transformer-based architecture for speech recognition to cope with the frequency resolution problem. First, the input audio signal is transformed into a 2D image using a Mel-spectrogram. Second, modified universal transformers use multi-head attention to derive contextual information and different speech-related features. A feed-forward neural network is then deployed for classification. The proposed system produced robust results on Google's speech command dataset, with an accuracy of 95.16% and minimal loss. The binary-view transformer counters the over-fitting problem by deploying a multi-view mechanism to diversify the input data, and multi-head attention captures multiple contexts from the data's feature map. [ABSTRACT FROM AUTHOR]
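
The frequency resolution problem mentioned in the abstract can be illustrated with the mel scale itself. The following is a minimal sketch, not the authors' implementation: it assumes the common HTK-style mel formula, a hypothetical 40-band filterbank, and 16 kHz audio (none of these parameters are stated in the abstract). It shows that bands equally spaced on the mel scale become much wider in Hz at high frequencies, so high-frequency detail is represented more coarsely in the spectrogram image.

```python
import math

def hz_to_mel(f_hz):
    """HTK-style mel scale (an assumption): m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_band_edges(n_mels, f_min, f_max):
    """Return n_mels + 2 band edges in Hz, equally spaced on the mel scale."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_mels + 1)
    return [mel_to_hz(m_min + i * step) for i in range(n_mels + 2)]

def band_widths(edges):
    """Width in Hz of each triangular band, from its left edge to its right edge."""
    return [edges[i + 2] - edges[i] for i in range(len(edges) - 2)]

# Hypothetical settings: 16 kHz audio (Nyquist frequency 8 kHz), 40 mel bands.
edges = mel_band_edges(n_mels=40, f_min=0.0, f_max=8000.0)
widths = band_widths(edges)
print(f"lowest band width:  {widths[0]:.1f} Hz")
print(f"highest band width: {widths[-1]:.1f} Hz")
```

The band widths grow monotonically with frequency, which is one way to see why a single Mel-spectrogram view may lose fine frequency detail.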

Details

Language :
English
ISSN :
15462218
Volume :
72
Issue :
3
Database :
Complementary Index
Journal :
Computers, Materials & Continua
Publication Type :
Academic Journal
Accession number :
156570946
Full Text :
https://doi.org/10.32604/cmc.2022.024590