51. TFW-PM reference for subjectively assessing the quality of synthetic speech
- Author
- Toshiro Watanabe
- Subjects
- Engineering, Computer Networks and Communications, business.industry, Speech recognition, Mean opinion score, Set (abstract data type), Noise, Quality (physics), Minimum deviation, Electrical and Electronic Engineering, Image warping, business, Phase modulation, Variable (mathematics)
- Abstract
This paper proposes a set of speech samples generated by time and frequency warping based on phase modulation (TFW-PM) for use as references in subjective assessments of synthetic speech, and describes its effectiveness. Their qualities depend on three variable parameters: the number of PARCOR synthesis coefficients, the modulation factor, and the frame length. Repeated assessment tests for four kinds of synthetic speech using the TFW-PM reference generated by combining those parameters showed that the modulation factor (Δθ) had the largest effect on the TFW-PM speech quality. Each optimum value corresponding to the other two parameters was determined based on the minimum deviation criterion of the equivalent Δθ converted from the mean opinion score (MOS) of the synthetic speech via the relationship between Δθ and MOS of the reference. In addition, TFW-PM speech samples and modulated noise reference (MNR) were examined to see whether or not synthetic speech qualities in terms of equivalent Δθ were stable from three viewpoints: repeated testing, subjects divided into groups familiar with and unfamiliar with synthetic speech, and listening environments. The results show that the equivalent Δθ is more stable for all three criteria than the customary equivalent Q. © 1998 Scripta Technica. Electron Comm Jpn Pt 1, 81(1): 59–67, 1998
- Published
- 1998
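
The abstract describes converting the MOS of a synthetic speech sample into an "equivalent Δθ" via the relationship between Δθ and MOS measured for the TFW-PM reference. The record does not give the conversion procedure or any measured values, so the following is only a minimal sketch of one plausible reading: invert a monotone reference curve by piecewise-linear interpolation. The function name `equivalent_delta_theta`, the interpolation method, the clamping at the curve ends, and the reference points are all assumptions made for illustration and are not taken from the paper.

```python
from bisect import bisect_left


def equivalent_delta_theta(mos, ref_curve):
    """Map a MOS value to an equivalent modulation factor.

    ref_curve is a list of (delta_theta, mos) pairs measured on the
    reference samples. Hypothetical helper: the paper's actual conversion
    procedure is not given in the abstract.
    """
    if len(ref_curve) < 2:
        raise ValueError("need at least two reference points")

    # The curve must be strictly monotone in MOS to be invertible.
    by_theta = sorted(ref_curve)
    scores = [m for _, m in by_theta]
    increasing = all(a < b for a, b in zip(scores, scores[1:]))
    decreasing = all(a > b for a, b in zip(scores, scores[1:]))
    if not (increasing or decreasing):
        raise ValueError("reference MOS must be strictly monotone in delta_theta")

    # Sort by MOS so that a binary search over MOS values is possible.
    pts = sorted(ref_curve, key=lambda p: p[1])
    mos_vals = [m for _, m in pts]

    # Clamp values outside the measured range (an assumption of this sketch).
    if mos <= mos_vals[0]:
        return pts[0][0]
    if mos >= mos_vals[-1]:
        return pts[-1][0]

    # Linear interpolation between the two neighbouring reference points.
    i = bisect_left(mos_vals, mos)
    (theta_lo, mos_lo), (theta_hi, mos_hi) = pts[i - 1], pts[i]
    t = (mos - mos_lo) / (mos_hi - mos_lo)
    return theta_lo + t * (theta_hi - theta_lo)


# Made-up reference points (delta_theta in arbitrary units, MOS on a 1-5 scale).
# These are NOT data from the paper.
reference = [(0.0, 4.5), (1.0, 3.5), (2.0, 2.0), (3.0, 1.0)]
print(equivalent_delta_theta(3.0, reference))
```

Under this reading, a synthetic speech sample is rated on the same MOS scale as the reference samples, and its score is looked up on the reference curve to find the modulation factor that produces equally rated speech.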