
A model for the assessor bias in second language pronunciation assessment

Authors :
Lopez Saenz, Jose Antonio
Hain, Thomas
Publication Year :
2023
Publisher :
University of Sheffield, 2023.

Abstract

In pronunciation assessment (PA) of second language (L2) speech, similarity to a native accent is known to be desirable but not crucial. Certain variations in pronunciation do not interfere with communication. It is up to the listener to decide whether a pronunciation differs from that of a so-called canonical reference. The subjectivity in pronunciation assessment can be referred to as the assessor bias. Computer-assisted pronunciation assessment is subject to the effects of assessor bias. The disagreement between assessors causes inconsistencies in the data used to build models for the assessment task. A model for the bias itself, however, would help build a general reference for a proficient L2 speaker as well as an impartial PA. This thesis proposes a model for the assessor bias to be included as part of a model for a pronunciation assessor. The assessor model consists of an ideal assessor-independent scoring function for PA, modified by an additive term specific to the assessor. The latter term is referred to as the bias. The research for the model resulted in four original contributions. All contributions were tested on L2 speech data from young learners of English in the Netherlands. Each recording was annotated for mispronunciation at the phoneme level by three trained phoneticians. This overlapping annotation made the data the best fit for a consistent model of inter-assessor disagreement. The first contribution is a novel approach for detecting mispronunciations without the need for a precise phoneme alignment, which outperformed a baseline of pronunciation correctness scores based on phoneme alignments. The second contribution is a study of the effect of speaker metadata on learning a pronunciation reference. Models trained on different assessors were shown to be sensitive to different speaker information. The third contribution is the proposal and implementation of the assessor model.
Two deep networks each combine a bidirectional long short-term memory module with self-attention and a feed-forward classifier to estimate the probabilities of phonemes being pronounced correctly. Both networks were trained jointly to estimate the observed pronunciation labels. Only one of the two networks was modelled on the assessor's identity. The fourth contribution consists of methods for increasing the specialisation of the bias network by reducing its cosine similarity and co-dependence with respect to the assessor-independent network. Using cosine similarity and a contrastive log-ratio upper bound for mutual information, it was possible to reduce both the correlation and the dependency between the two networks. The bias network managed to increase its dependence on assessor identity and speaker factors. The mutual information between the assessor and the bias output was useful to illustrate disagreement, as well as which assessors and phonemes were the most prone to the bias.
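
The additive structure described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the two networks are replaced by plain numbers, the addition is assumed to happen in logit space before a sigmoid, the function names are hypothetical, and the contrastive log-ratio upper bound for mutual information is omitted. It only shows how an assessor-specific term shifts an assessor-independent score, and how cosine similarity between the two output vectors could be measured.

```python
import math


def sigmoid(x):
    """Map a real-valued logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def assessor_score(shared_logit, bias_logit):
    """Probability that a phoneme is judged correct by one assessor.

    shared_logit: output of the assessor-independent scorer (assumed logit).
    bias_logit:   additive assessor-specific term (assumed logit).
    """
    return sigmoid(shared_logit + bias_logit)


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length output vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


# Hypothetical per-phoneme outputs for one utterance and one assessor.
shared_outputs = [1.2, -0.4, 0.8]
bias_outputs = [-0.3, 0.5, 0.0]

probabilities = [assessor_score(s, b) for s, b in zip(shared_outputs, bias_outputs)]
similarity = cosine_similarity(shared_outputs, bias_outputs)
```

In a training setup along these lines, a term such as `similarity` could be added as a penalty so that the bias outputs become less aligned with the assessor-independent outputs; how the thesis weights or applies that penalty is not stated in the abstract.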

Details

Language :
English
Database :
British Library EThOS
Publication Type :
Dissertation/Thesis
Accession number :
edsble.879590
Document Type :
Electronic Thesis or Dissertation