Transcribing Southern Min speech corpora with a Web-Based language learning system
- Source :
- International Conference on Audio, Language and Image Processing (ICALIP 2008), Jul 2008, Shanghai, China
- Publication Year :
- 2008
- Publisher :
- IEEE, 2008.
Abstract
- The paper proposes a human-computation-based scheme for transcribing Southern Min (Min Nan) speech corpora. The core idea is to implement a Web-based language learning system that collects orthographic and phonetic labels from a large number of language learners and selects the most commonly entered labels as the transcriptions of the corpora. It is essentially a form of distributed knowledge acquisition. Computer-aided mechanisms are also used to verify the collected transcriptions. The benefit of the scheme is that it makes the transcribing task neither tedious nor costly, so no significant budget is needed for transcribing large corpora. The design of a system for transcribing Southern Min speech corpora is described in detail. The application of a prototype version of the system shows that this transcribing scheme is an effective and economical way to generate orthographic and phonetic transcriptions.
- Subjects :
- Vocabulary
Computer science
speech
Speech recognition
corpora
01 natural sciences
[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]
030507 speech-language pathology & audiology
03 medical and health sciences
Transcription (linguistics)
0103 physical sciences
Web application
010301 acoustics
automatic speech recognition
transcribing
Speech corpus
Speech processing
Language acquisition
Knowledge acquisition
ComputingMethodologies_PATTERNRECOGNITION
ComputingMethodologies_DOCUMENTANDTEXTPROCESSING
Artificial intelligence
transcription
0305 other medical science
Natural language
Natural language processing
Details
- Database :
- OpenAIRE
- Journal :
- 2008 International Conference on Audio, Language and Image Processing
- Accession number :
- edsair.doi.dedup.....60e75ab092c573defcdb5a940b6f89e7
- Full Text :
- https://doi.org/10.1109/icalip.2008.4590181