Likelihood corpus distribution: an efficient topic modelling scheme for Bengali document class identification.
- Source :
- Sādhanā: Academy Proceedings in Engineering Sciences. Sep 2024, Vol. 49 Issue 3, p1-19. 19p.
- Publication Year :
- 2024
Abstract
- The learning quality of humans depends on the sense of contemplation. Textual documents are a large part of the literature on contemplation, which effortlessly creates perception. Automatic document class identification, or organisation, is a machine learning task that aims to capture the psychological and emotional content of a text concisely. The problem of document identification falls within library science, information science and artificial intelligence. Research on document class identification has progressed in many of the most widely spoken languages, and numerous works have been published for European and Asian languages. However, there is a gap in the literature for low-resource languages, especially Bengali. Consequently, this work presents an efficient topic modelling approach for Bengali document class identification. It proposes a Dirichlet-polynomial clustering model called likelihood corpus distribution (LCD), which is based on a Bayesian numerical prototype. Experiments are conducted to demonstrate the efficiency of LCD over various topic modelling algorithms: latent Dirichlet allocation (LDA), LDA with bag-of-words (LDA-BOW), latent semantic indexing (LSI), and hierarchical Dirichlet process (HDP). For performance evaluation, five real-world Bengali corpora are considered: science, sports, computer, season, and epic. The coherence scores of the different modelling algorithms are compared to find the best model for each dataset separately. [ABSTRACT FROM AUTHOR]
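The coherence-based comparison mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which coherence measure the authors use, so this example assumes the UMass measure, averaged over word pairs. The toy corpus (English placeholder tokens rather than Bengali text), the topic word lists, and the names `umass_coherence`, `model_a` and `model_b` are all hypothetical and are not taken from the paper.

```python
import math

def umass_coherence(topic_words, documents):
    """Average UMass coherence of one topic (assumed measure, not the paper's).

    For each ordered pair (w_i, w_j) with j < i in the topic's word list,
    the score is log((D(w_i, w_j) + 1) / D(w_j)), where D counts documents
    containing all the given words.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        return sum(1 for d in doc_sets if all(w in d for w in words))

    pair_scores = []
    for i in range(1, len(topic_words)):
        for j in range(i):
            denom = doc_freq(topic_words[j])
            if denom == 0:
                continue  # skip words that never occur in the corpus
            co = doc_freq(topic_words[i], topic_words[j])
            pair_scores.append(math.log((co + 1) / denom))
    return sum(pair_scores) / len(pair_scores) if pair_scores else 0.0

# Hypothetical tokenised corpus standing in for one dataset.
documents = [
    ["cricket", "bat", "ball", "team"],
    ["football", "team", "goal", "ball"],
    ["planet", "orbit", "star", "telescope"],
    ["star", "galaxy", "telescope", "orbit"],
]

# Hypothetical top words per topic, as two competing models might produce.
candidate_topics = {
    "model_a": [["team", "ball"], ["star", "telescope", "orbit"]],
    "model_b": [["team", "star"], ["ball", "orbit"]],
}

# Score each model by the mean coherence of its topics, then pick the best.
scores = {
    name: sum(umass_coherence(t, documents) for t in topics) / len(topics)
    for name, topics in candidate_topics.items()
}
best = max(scores, key=scores.get)
print(scores, best)
```

Repeating this selection once per dataset mirrors the per-dataset comparison the abstract describes; in practice the topic word lists would come from trained models rather than being written by hand.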
Details
- Language :
- English
- ISSN :
- 02562499
- Volume :
- 49
- Issue :
- 3
- Database :
- Academic Search Index
- Journal :
- Sādhanā: Academy Proceedings in Engineering Sciences
- Publication Type :
- Academic Journal
- Accession number :
- 178527531
- Full Text :
- https://doi.org/10.1007/s12046-024-02470-7