Turkish speech corpora and recognition tools developed by porting SONIC: Towards multilingual speech recognition

Authors :
Salor, Özgül
Pellom, Bryan L.
Ciloglu, Tolga
Demirekler, Mübeccel
Source :
Computer Speech & Language. Oct 2007, Vol. 21 Issue 4, p580-593. 14p.
Publication Year :
2007

Abstract

This paper presents work on developing speech corpora and recognition tools for Turkish by porting SONIC, a speech recognition tool developed initially for English at the Center for Spoken Language Research of the University of Colorado at Boulder. The work had two objectives. The first was to collect a standard, phonetically balanced Turkish microphone speech corpus for general research use. A 193-speaker triphone-balanced audio corpus and a pronunciation lexicon for Turkish have been developed. The corpus was accepted for distribution by the Linguistic Data Consortium (LDC) of the University of Pennsylvania in October 2005, and it will serve as a standard corpus for Turkish speech researchers. The second objective was to develop speech recognition tools (a phonetic aligner and a phone recognizer) for Turkish, which provided a starting point for obtaining a multilingual speech recognizer by porting SONIC to Turkish. This part of the work was the first port of this particular recognizer to a language other than English; subsequently, SONIC has been ported to over 15 languages. Using the phonetic aligner, the audio corpus has been provided with word, phone and HMM-state level alignments. For the phonetic aligner, it is shown that 92.6% of the automatically labeled phone boundaries are placed within 20 ms of the manually labeled locations for the Turkish audio corpus. Finally, a phone recognition error rate of 29.2% is demonstrated for the phone recognizer. [Copyright Elsevier]
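The boundary-agreement figure quoted in the abstract (the share of automatically placed phone boundaries that fall within 20 ms of the manual ones) can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function name `boundary_agreement`, the one-to-one pairing of boundaries, the inclusive tolerance, and the sample timestamps are all assumptions made for illustration.

```python
def boundary_agreement(auto_ms, manual_ms, tolerance_ms=20):
    """Return the fraction of automatic boundaries within tolerance_ms
    of their paired manual boundary.

    Assumes both lists hold boundary times in milliseconds and are
    already paired one-to-one (same phone sequence, same order).
    Treating a deviation of exactly tolerance_ms as a hit is an
    assumption; the abstract does not say whether the limit is inclusive.
    """
    if len(auto_ms) != len(manual_ms):
        raise ValueError("boundary lists must have the same length")
    if not auto_ms:
        raise ValueError("at least one boundary pair is required")

    hits = sum(
        1 for a, m in zip(auto_ms, manual_ms)
        if abs(a - m) <= tolerance_ms
    )
    return hits / len(auto_ms)


# Hypothetical timestamps (ms); deviations are 5, 25, 5 and 10 ms.
auto = [100, 250, 400, 610]
manual = [105, 275, 395, 600]
print(boundary_agreement(auto, manual))  # 0.75
```

Under this definition, the reported 92.6% would correspond to a return value of 0.926 over all paired boundaries in the evaluated portion of the corpus.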

Details

Language :
English
ISSN :
0885-2308
Volume :
21
Issue :
4
Database :
Academic Search Index
Journal :
Computer Speech & Language
Publication Type :
Academic Journal
Accession number :
25316088
Full Text :
https://doi.org/10.1016/j.csl.2007.01.001