Back to Search Start Over

Crossmixed convolutional neural network for digital speech recognition.

Authors :
Diep, Quoc Bao
Phan, Hong Yen
Truong, Thanh-Cong
Source :
PLoS ONE. 4/26/2024, Vol. 19 Issue 4, p1-22. 22p.
Publication Year :
2024

Abstract

Digital speech recognition is a challenging problem that requires the ability to learn complex signal characteristics such as frequency, pitch, intensity, timbre, and melody, which traditional methods often face issues in recognizing. This article introduces three solutions based on convolutional neural networks (CNN) to solve the problem: 1D-CNN is designed to learn directly from digital data; 2DS-CNN and 2DM-CNN have a more complex architecture, transferring raw waveform into transformed images using Fourier transform to learn essential features. Experimental results on four large data sets, containing 30,000 samples for each, show that the three proposed models achieve superior performance compared to well-known models such as GoogLeNet and AlexNet, with the best accuracy of 95.87%, 99.65%, and 99.76%, respectively. With 5-10% higher performance than other models, the proposed solution has demonstrated the ability to effectively learn features, improve recognition accuracy and speed, and open up the potential for broad applications in virtual assistants, medical recording, and voice commands. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
19326203
Volume :
19
Issue :
4
Database :
Academic Search Index
Journal :
PLoS ONE
Publication Type :
Academic Journal
Accession number :
176874212
Full Text :
https://doi.org/10.1371/journal.pone.0302394