1. Direct articulatory observation reveals phoneme recognition performance characteristics of a self-supervised speech model.
- Author
- Shi X, Feng T, Huang K, Kadiri SR, Lee J, Lu Y, Zhang Y, Goldstein L, and Narayanan S
- Subjects
- Humans, Male, Female, Adult, Magnetic Resonance Imaging, Speech, Speech Perception physiology, Young Adult, Language, Phonetics
- Abstract
- Variability in speech pronunciation is widely observed across different linguistic backgrounds, which impacts modern automatic speech recognition performance. Here, we evaluate the performance of a self-supervised speech model in phoneme recognition using direct articulatory evidence. Findings indicate significant differences in phoneme recognition, especially in front vowels, between American English and Indian English speakers. To gain a deeper understanding of these differences, we conduct real-time MRI-based articulatory analysis, revealing distinct velar region patterns during the production of specific front vowels. This underscores the need to deepen the scientific understanding of self-supervised speech model variances to advance robust and inclusive speech technology. (© 2024 Acoustical Society of America.)
- Published
- 2024