1. GJM-2: A Special Case of General Jelinek-Mercer Smoothing Method for Language Modeling Approach to Ad Hoc IR.
- Author
Gary Geunbae Lee, Akio Yamada, Helen Meng, Sung Hyon Myaeng, Guodong Ding, and Bin Wang
- Abstract
The language modeling approach to IR is attractive and promising because it connects the problem of retrieval with that of language model estimation. A core technique for language model estimation is smoothing, which adjusts the maximum likelihood estimator to correct the inaccuracy caused by data sparseness. In this paper we propose a General Jelinek-Mercer method (GJM) that uses a document-dependent mixture coefficient to control the relative influence of the maximum likelihood model and the collection model. By using the number of unique terms in a document to improve the accuracy of language model estimation, we further develop the GJM-2 smoothing method as a special case of GJM. Experimental results show that GJM-2 can achieve better retrieval performance than three existing popular smoothing methods on both short and long queries. [ABSTRACT FROM AUTHOR]
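The abstract does not state the GJM-2 formula, so the following is only a minimal Python sketch of the general interpolation form it describes: p(w|d) = (1 - λ_d) · p_ml(w|d) + λ_d · p(w|C), where the mixture coefficient λ_d may differ per document. The coefficient `unique_term_lambda` is a hypothetical, Witten-Bell-style choice that also uses the number of unique terms u_d, namely λ_d = u_d / (|d| + u_d); it is an illustrative assumption, not the paper's GJM-2 definition. All function names are hypothetical.

```python
from collections import Counter
import math


def general_jm_prob(word, doc_tokens, collection_prob, lam_d):
    """Interpolated estimate p(w|d) = (1 - lam_d) * p_ml(w|d) + lam_d * p(w|C).

    Assumes doc_tokens is non-empty and collection_prob maps terms to p(w|C).
    """
    counts = Counter(doc_tokens)
    p_ml = counts[word] / len(doc_tokens)  # maximum likelihood estimate
    return (1.0 - lam_d) * p_ml + lam_d * collection_prob.get(word, 0.0)


def unique_term_lambda(doc_tokens):
    """HYPOTHETICAL document-dependent coefficient (Witten-Bell style):
    lam_d = u_d / (|d| + u_d), with u_d the number of unique terms in d.
    This is an illustration only, NOT the GJM-2 formula from the paper.
    """
    u_d = len(set(doc_tokens))
    return u_d / (len(doc_tokens) + u_d)


def query_log_likelihood(query_tokens, doc_tokens, collection_prob, lam_d):
    """Query-likelihood score log p(q|d), summing over query terms.

    Assumes the collection model gives every query term nonzero probability,
    so no smoothed probability is zero.
    """
    return sum(
        math.log(general_jm_prob(q, doc_tokens, collection_prob, lam_d))
        for q in query_tokens
    )


doc = ["smoothing", "language", "model", "smoothing"]
collection = {"smoothing": 0.1, "language": 0.2, "model": 0.2, "retrieval": 0.5}
lam = unique_term_lambda(doc)  # 3 unique terms, 4 tokens: 3 / 7
print(general_jm_prob("retrieval", doc, collection, lam))
print(query_log_likelihood(["smoothing", "retrieval"], doc, collection, lam))
```

Passing the same constant `lam_d` for every document recovers ordinary Jelinek-Mercer smoothing; letting the coefficient vary with document statistics, such as length or the number of unique terms, is what makes the form "general" in the sense the abstract describes.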
- Published
- 2005