1. Code-Switching ASR for Low-Resource Indic Languages: A Hindi-Marathi Case Study
- Author
- Hemant Palivela, Meera Narvekar, David Asirvatham, Shashi Bhushan, Vinay Rishiwal, and Udit Agarwal
- Subjects
Code-switching, automatic speech recognition, low-resource languages, Hindi, Marathi, Indic languages, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
This work examines the development of Automatic Speech Recognition (ASR) systems for low-resource languages, focusing on Hindi and Marathi in multilingual and code-switching environments. ASR systems, which convert spoken language into text, face significant challenges when applied to low-resource languages, for which little training data is available. These challenges are exacerbated in multilingual settings, particularly during code-switching, where speakers alternate between languages within a conversation. This paper reviews the current state of ASR for Indic languages, highlighting linguistic complexities such as diverse sentence structures, phonetic variety, and frequent code-switching. Code-switching introduces additional challenges, as ASR systems must rapidly identify language boundaries and adapt to linguistic shifts. Present systems perform poorly on code-switched data because of the complexity of the phonetic structures involved and the lack of comprehensive, annotated speech corpora. This work critically evaluates current methods and proposes improvements using modern deep-learning techniques to address the primary challenges in developing efficient ASR models for Hindi and Marathi. Moreover, performance comparisons of monolingual, bilingual, and multilingual ASR systems indicate that multilingual approaches are more effective in managing linguistic diversity. The efficacy of these systems can be evaluated using performance metrics such as the Phoneme Error Rate (PER) and the Word Error Rate (WER), which measure recognition accuracy at the phoneme level and the word level, respectively.
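As an illustration of the two metrics named in the abstract, the following is a minimal sketch of how WER and PER are conventionally computed: the edit distance (substitutions, deletions, insertions) between a reference and a hypothesis sequence, divided by the reference length. The function names and the romanized example sentence are assumptions made for illustration; this is not the authors' evaluation code.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the first i-1 reference tokens
    # and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance normalized by reference length (WER or PER)."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def wer(ref_text: str, hyp_text: str) -> float:
    """Word Error Rate over whitespace-separated words."""
    return error_rate(ref_text.split(), hyp_text.split())


# Hypothetical code-switched utterance (romanized Hindi with one English word).
reference = "main kal office jaunga"
hypothesis = "main kal ofis jaunga"
print(wer(reference, hypothesis))  # one substitution over four words

# PER uses the same computation on phoneme sequences instead of words.
print(error_rate(["k", "a", "l"], ["k", "a"]))  # one deletion over three phonemes
```

Because both metrics share the same normalization, a rate above 1.0 is possible when the hypothesis contains many insertions.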
- Published
- 2025