An end-to-end continuous Kannada ASR system under uncontrolled environment.

Authors :: Yadava, G. Thimmaraja; Nagaraja, B. G.; Jayanna, H. S.
Source :: Multimedia Tools & Applications; Jan2024, Vol. 83 Issue 3, p7981-7994, 14p
Publication Year :: 2024
Abstract: Achieving high speech recognition accuracy under real-time conditions remains a challenging task. In this paper, we develop a system for recognizing continuous Kannada speech sentences under real-time conditions. To build the automatic speech recognition (ASR) models, we used task-specific continuous Kannada speech data gathered from speakers/farmers under real-time conditions. We designed an interactive voice response system (IVRS) and collected 40 continuous Kannada speech sentences. We transcribed and validated the data and extracted speech features using the Mel-frequency cepstral coefficient (MFCC) technique. We used 90% of the validated continuous Kannada speech data for Kaldi system training and the remaining 10% for decoding, at different phoneme levels. The experimental results showed that ASR models based on time delay neural networks (TDNN) outperformed both the models built with other acoustic modelling techniques and the earlier deep neural network (DNN)-hidden Markov model (HMM) based continuous Kannada ASR (CKASR) system. The ASR models with the lowest word error rate (WER) were used to develop the real-time end-to-end (E2E) CKASR system. We verified the E2E CKASR system by testing it with 550 speakers/farmers under real-time conditions. [ABSTRACT FROM AUTHOR]
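
Note on the WER metric: the abstract selects models by word error rate. As a minimal sketch of the standard definition (substitutions, deletions, and insertions divided by the number of reference words), the Python below computes WER with a word-level edit distance. The function name `word_error_rate` and the placeholder sentences are assumptions made for illustration. This is not the scoring code used in the paper, which relies on Kaldi's own tooling.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Illustrative helper (hypothetical name), based on the standard
    word-level Levenshtein distance; not taken from the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Placeholder sentences, not data from the paper:
    # one substitution and one deletion against four reference words.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.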