Uneven success: automatic speech recognition and ethnicity-related dialects.

Authors :
Wassink, Alicia Beckford
Gansen, Cady
Bartholomew, Isabel
Source :
Speech Communication. May 2022, Vol. 140, p50-70. 21p.
Publication Year :
2022

Abstract

• Comparison of accuracy of the Microsoft Speech Services conversational ASR system finds that phonetic error rates are higher for speech samples of nonwhite, than for white, speakers
• Sociophonetic variables account for 20% of ASR errors
• CLOx allows automated transcription of conversational speech in one-fifth of the time required to manually produce an orthographic transcription
• Steady improvements to ASR systems greatly expedite and simplify the task of sociolinguistic data analysis

Addressing racial bias in automatic speech recognition is an area of concern in fields associated with human-computer interaction. Research to date suggests that sociolinguistic variation, namely systematic sources of sociophonetic variation, has yet to be extensively exploited in acoustic model architectures. This paper reports a study that evaluates the performance of one ASR system for a multi-ethnic sample of speakers from the American Pacific Northwest (including Native American, African American, European American and ChicanX speakers). Using a sociophonetic approach to characterizing vocalic and consonantal variation, we ask which dialect features appear to be most challenging for our ASR system. We also ask which error types are particular to the four ethnic dialects sampled. Recordings of both conversational and read speech were coded for a common set of 18 sociophonetic variables with distinct phonetic profiles. Automatic transcription was achieved using CLOx, a custom-built ASR system created for sociolinguistic analysis. Normalized error frequency rates were compared across ethnic samples to evaluate CLOx performance. N_f error rates demonstrate clear differential performance in the ASR system, pointing to racial bias in system output. Specific predictions are made regarding approaches that might be taken to leverage sociophonetic knowledge to improve social dialect-recognition accuracy in ASR systems. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
01676393
Volume :
140
Database :
Academic Search Index
Journal :
Speech Communication
Publication Type :
Academic Journal
Accession number :
156895070
Full Text :
https://doi.org/10.1016/j.specom.2022.03.009