Self-organizing speech recognition that processes acoustic and articulatory features.
- Source :
- Multimedia Tools & Applications; Apr2024, Vol. 83 Issue 13, p39169-39195, 27p
- Publication Year :
- 2024
Abstract
- In automatic speech recognition (ASR) systems, minimizing the adverse effects of mismatched background noise between training and operating conditions has been a challenging task for many years. A noise-robust ASR that can deal with different types of speech and various speakers is still an open research problem. Typically, conventional ASR models for missing-feature reconstruction and robust speech descriptors employ acoustic features and statistical methods. Although such methods improve performance in the presence of noise, their performance still degrades when different background noises co-exist with the main signal. More recent approaches use neural networks, particularly deep learning models, for ASR purposes. Such models increase performance at a high training cost. To mitigate these limitations, we propose an ASR model called Self-Organizing Speech Recognizer (SOSR). Unlike most conventional ASRs, SOSR uses both acoustic and articulatory features, employs unsupervised and incremental learning, and is suitable for real-time applications due to its quick training stage. SOSR processes an audio signal along two paths simultaneously. In the first path, the acoustic features are extracted from the original signal, whereas in the second path an acoustic-to-articulatory inversion is performed by several Self-Organizing Maps. The outputs of both paths are delivered to a Self-Organizing Map with a time-varying structure, which is responsible for recognizing the input speech signal. Four datasets (TIMIT, Aurora 2, Aurora 4, and CHIME 2) were used for SOSR assessment. The Word Error Rate (WER) was the metric chosen to compare the experimental results of tests with different noise levels and signal variations. The experimental results suggest that SOSR can learn quickly and can handle noisy signals, various speakers, different types of speech, and assorted utterance lengths. [ABSTRACT FROM AUTHOR]
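The abstract names Word Error Rate (WER) as the evaluation metric but does not say how it was computed. As a minimal sketch, assuming the standard definition (substitutions + deletions + insertions, divided by the number of reference words) and whitespace tokenization, WER can be obtained from a word-level edit distance. The function name `word_error_rate` is hypothetical and is not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition WER can exceed 1.0 when the hypothesis contains many inserted words, so it is an error ratio rather than a bounded percentage.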
Details
- Language :
- English
- ISSN :
- 13807501
- Volume :
- 83
- Issue :
- 13
- Database :
- Complementary Index
- Journal :
- Multimedia Tools & Applications
- Publication Type :
- Academic Journal
- Accession number :
- 176408765
- Full Text :
- https://doi.org/10.1007/s11042-023-17080-4