1. Introducing linguistic constraints into statistical language modeling
- Author
P. Geutner
- Subjects
Perplexity, Computer science, Speech recognition, Data processing & computer science, Content word, Linguistics, Word lists by frequency, Noun, Cache language model, Factored language model, Language model, Artificial intelligence, ddc:004, Natural language, Natural language processing
- Abstract
Building robust stochastic language models is a major issue in speech recognition systems. Conventional word-based n-gram models do not capture any linguistic constraints inherent in speech. In this paper, the distinction between function and content words (closed and open word classes, respectively) is used to provide linguistic knowledge that can be incorporated into language models. Function words are articles, prepositions and personal pronouns; content words are nouns, verbs, adjectives and adverbs. Based on this class definition, which yields function and content word markers, a new language model is defined. A combination of the word-based model with this new model is introduced. The combined model shows modest improvements in both perplexity and recognition performance.
- Published
1996
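The abstract does not say how the marker model is built or how it is combined with the word-based model. The sketch below is therefore only one plausible reading, under stated assumptions: each word is mapped to a function (`F`) or content (`C`) marker, a bigram model is trained over markers, and it is linearly interpolated with an ordinary word bigram model. The function-word list, the treatment of every unlisted word as a content word, add-one smoothing, the interpolation weight `lam`, and the names `marker` and `CombinedModel` are all hypothetical choices, not the paper's method. Perplexity is computed as the exponential of the average negative log-probability, assuming a closed vocabulary taken from the training text.

```python
import math
from collections import Counter

# Hypothetical closed-class list (articles, prepositions, personal pronouns).
FUNCTION_WORDS = {"the", "a", "an", "in", "on", "of", "to", "with",
                  "i", "you", "he", "she", "it", "we", "they"}

def marker(word):
    """Map a word to 'F' (function) or 'C' (content); boundary tokens keep their own marker.
    Assumption: any word not in FUNCTION_WORDS is treated as a content word."""
    if word in ("<s>", "</s>"):
        return word
    return "F" if word in FUNCTION_WORDS else "C"

class CombinedModel:
    """Hypothetical interpolation of a word bigram model and a function/content marker model."""

    def __init__(self, sentences, lam=0.7):
        self.lam = lam                    # weight on the word bigram model (assumed value)
        self.word_bi, self.word_hist = Counter(), Counter()
        self.mark_bi, self.mark_hist = Counter(), Counter()
        self.word_count = Counter()       # c(w), used for P(w | marker)
        self.class_total = Counter()      # total tokens per marker class
        self.vocab = set()
        for sent in sentences:
            toks = ["<s>"] + sent.lower().split() + ["</s>"]
            self.vocab.update(toks[1:])
            for prev, cur in zip(toks, toks[1:]):
                self.word_bi[prev, cur] += 1
                self.word_hist[prev] += 1
                self.mark_bi[marker(prev), marker(cur)] += 1
                self.mark_hist[marker(prev)] += 1
                self.word_count[cur] += 1
                self.class_total[marker(cur)] += 1
        self.markers = {marker(w) for w in self.vocab}
        self.class_size = Counter(marker(w) for w in self.vocab)

    def p_word(self, cur, prev):
        # Add-one smoothed word bigram probability P(cur | prev).
        return (self.word_bi[prev, cur] + 1) / (self.word_hist[prev] + len(self.vocab))

    def p_class(self, cur, prev):
        # P(marker(cur) | marker(prev)) * P(cur | marker(cur)), both add-one smoothed.
        mp, mc = marker(prev), marker(cur)
        p_m = (self.mark_bi[mp, mc] + 1) / (self.mark_hist[mp] + len(self.markers))
        p_w = (self.word_count[cur] + 1) / (self.class_total[mc] + self.class_size[mc])
        return p_m * p_w

    def prob(self, cur, prev):
        # Linear interpolation of the two models (assumed combination scheme).
        return self.lam * self.p_word(cur, prev) + (1 - self.lam) * self.p_class(cur, prev)

    def perplexity(self, sentences):
        log_sum, n = 0.0, 0
        for sent in sentences:
            toks = ["<s>"] + sent.lower().split() + ["</s>"]
            for prev, cur in zip(toks, toks[1:]):
                log_sum += math.log(self.prob(cur, prev))
                n += 1
        return math.exp(-log_sum / n)

if __name__ == "__main__":
    train = ["the dog barks", "a cat sleeps in the house", "they see the dog"]
    model = CombinedModel(train, lam=0.7)
    print(model.perplexity(["the cat barks"]))
```

Because both component distributions sum to one over the training vocabulary for any history, their interpolation is also a proper distribution. Setting `lam=1.0` recovers the plain word bigram model, which gives a baseline for comparing perplexities.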