1. Bengali document retrieval using a language modeling approach enhanced by improved cluster-based smoothing.
- Author
- Chatterjee, Soma and Sarkar, Kamal
- Subjects
- *LANGUAGE models, *INFORMATION retrieval, *BENGALI language, *ORAL communication, *ALGORITHMS
- Abstract
- Zero frequency is a fundamental problem in information retrieval using language models, and smoothing is applied to deal with it. Cluster-based smoothing has been found to be effective for information retrieval using language models. Since its effectiveness depends on clustering quality, there is scope for improvement by enhancing the clustering algorithm. In this paper, we present a study on how to improve cluster-based smoothing using a histogram-based incremental clustering algorithm and word embeddings. To our knowledge, this is the first study to integrate cluster-based smoothing with a language model to develop an effective IR system for Bengali, one of the most widely spoken Indian languages. The proposed method has been tested on two benchmark Bengali IR datasets. The experimental results show that the proposed model for Bengali document retrieval is effective and outperforms several baseline IR models. [ABSTRACT FROM AUTHOR]
- Published
- 2023