
Towards precise and robust automatic synchronization of live speech and its transcripts

Authors :
Gao, Jie
Zhao, Qingwei
Yan, Yonghong
Source :
Speech Communication. Apr. 2011, Vol. 53, Issue 4, p508-523. 16p.
Publication Year :
2011

Abstract

This paper presents our efforts in automatically synchronizing spoken utterances with their transcripts (textual contents) (ASUT), where the speech is a live stream and its corresponding transcripts are known. The task is first simplified to the problem of detecting, online, the end times of spoken utterances, and a solution based on a novel frame-synchronous likelihood ratio test (FSLRT) procedure is then proposed. We detail the formulation and implementation of the proposed FSLRT procedure under the Hidden Markov Model (HMM) framework, and we study its properties and parameter settings empirically. Because synchronization failures may occur in FSLRT-based ASUT systems, this paper also extends the FSLRT procedure to a multiple-instance version to increase the robustness of the system. The proposed multiple-instance FSLRT can detect synchronization failures and restart the system from an appropriate point, so a fully automatic FSLRT-based ASUT system can be constructed. The FSLRT-based ASUT system is evaluated on a simultaneous broadcast news subtitling task. Experimental results show that the proposed method achieves satisfactory performance and outperforms an automatic speech recognition (ASR)-based method in terms of both robustness and precision. Finally, the FSLRT-based news subtitling system correctly subtitles about 90% of the sentences with an average time deviation of about 100 ms, running at 0.37 times real time (RT). [Copyright © Elsevier]
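For illustration only (not taken from the paper, whose exact FSLRT formulation is given in the full text): a minimal Python sketch of how a frame-synchronous likelihood ratio test for end-of-utterance detection might look. The function name, the interface, the threshold value, and the toy scores below are all assumptions; the per-frame log-likelihoods are assumed to come from an HMM decoder constrained by the known transcript and from a background/filler model.

    # A minimal sketch (assumption, not the paper's implementation) of a
    # frame-synchronous likelihood ratio test for detecting the end time of a
    # known utterance. Per-frame log-likelihoods under a transcript-constrained
    # model (log_p_utt) and under a background/filler model (log_p_bg) are
    # assumed to come from an HMM decoder; `threshold` would be tuned empirically.

    def detect_end_time(log_p_utt, log_p_bg, in_final_state, threshold=5.0):
        """Return the first frame index at which the accumulated log-likelihood
        ratio exceeds `threshold` while the decoder occupies the utterance's
        final HMM state, or None if no end point is found (a synchronization
        failure that a multiple-instance scheme could recover from by
        restarting on a later sentence)."""
        llr = 0.0
        for t, (lp_u, lp_b, final) in enumerate(zip(log_p_utt, log_p_bg,
                                                    in_final_state)):
            llr += lp_u - lp_b            # frame-synchronous accumulation
            if final and llr > threshold:
                return t                  # declared end time (frame index)
        return None                       # no detection within the stream


    # Hypothetical usage with toy scores: the ratio grows once the observed
    # frames match the transcript-constrained model better than the background.
    if __name__ == "__main__":
        utt = [-2.0, -1.5, -1.0, -0.5, -0.4]
        bg  = [-2.0, -2.5, -3.0, -3.5, -3.5]
        fin = [False, False, False, True, True]
        print(detect_end_time(utt, bg, fin, threshold=4.0))  # -> 3

In this sketch, returning None stands in for the synchronization-failure case that the paper's multiple-instance extension is designed to detect and recover from.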

Details

Language :
English
ISSN :
0167-6393
Volume :
53
Issue :
4
Database :
Academic Search Index
Journal :
Speech Communication
Publication Type :
Academic Journal
Accession number :
59169476
Full Text :
https://doi.org/10.1016/j.specom.2011.01.001