Biterm topic model incorporating word vector features.

Authors :: 刘良选
 黄梦醒
Source :: Application Research of Computers / Jisuanji Yingyong Yanjiu. Jul2017, Vol. 34 Issue 7, p2055-2058. 4p.
Publication Year :: 2017
Abstract: To address the content sparsity and lack of context information inherent in short texts, this paper proposes LF-BTM, a biterm topic model (BTM) that incorporates word vector features. The model introduces a latent feature model whose abundant word vector information offsets the data sparsity. In the improved generative process, the generation of each word in a biterm is influenced jointly by the topic-word multinomial distribution and the latent feature model. The model parameters are learned by Gibbs sampling. Experimental results on real-world short text datasets demonstrate that the model can integrate word vectors trained on external large-scale general corpora to produce significant improvements in topic coherence. [ABSTRACT FROM AUTHOR]
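
Illustrative sketch (not from the paper): the abstract states that each word in a biterm is generated jointly by a topic-word multinomial and a latent feature model, but this record does not give the exact formula. The Python below assumes the two-component mixture commonly used in latent-feature topic models, where the latent feature component is a softmax over dot products between a topic vector and the word vectors. All names (`word_probs`, `biterm_prob`, `lam`, `topic_vec`, `word_vecs`) and the toy numbers are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def word_probs(phi_z, topic_vec, word_vecs, lam):
    """Assumed per-topic word distribution:
    (1 - lam) * multinomial component + lam * latent feature component.

    phi_z     : topic-word multinomial for one topic (sums to 1)
    topic_vec : vector for the topic (hypothetical parameter)
    word_vecs : one pretrained vector per vocabulary word
    lam       : assumed mixing weight in [0, 1]
    """
    scores = [sum(t * w for t, w in zip(topic_vec, vec)) for vec in word_vecs]
    latent = softmax(scores)
    return [(1.0 - lam) * p + lam * q for p, q in zip(phi_z, latent)]


def biterm_prob(theta_z, probs, w1, w2):
    """Joint probability of topic z and biterm (w1, w2), as in BTM:
    both words are drawn independently from the same topic."""
    return theta_z * probs[w1] * probs[w2]


# Toy example: vocabulary of 3 words, 2-dimensional word vectors.
phi_z = [0.5, 0.3, 0.2]
topic_vec = [1.0, 0.0]
word_vecs = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

probs = word_probs(phi_z, topic_vec, word_vecs, lam=0.6)
print(probs, biterm_prob(0.4, probs, 0, 1))
```

Under this assumption, a Gibbs sampler would evaluate `biterm_prob` for every topic and sample a biterm's topic assignment in proportion to those values; setting `lam` to 0 recovers the plain BTM word distribution.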