Learning bag-of-embedded-words representations for textual information retrieval

Authors :
Nikolaos Passalis
Anastasios Tefas
Source :
Pattern Recognition. 81:254-267
Publication Year :
2018
Publisher :
Elsevier BV, 2018.

Abstract

Word embedding models are able to accurately model the semantic content of words. The process of extracting a set of word embedding vectors from a text document is similar to the feature extraction step of the Bag-of-Features (BoF) model, which is commonly used in computer vision tasks. This gives rise to the proposed Bag-of-Embedded Words (BoEW) model, which can efficiently represent text documents while overcoming the limitations of previously predominant techniques, such as the textual Bag-of-Words model. The proposed method extends the regular BoF model by a) incorporating a weighting mask that allows for altering the importance of each learned codeword and b) optimizing the model end-to-end (from the word embeddings to the weighting mask). Furthermore, the BoEW model provides a fast way to fine-tune the learned representation towards the information need of the user using relevance feedback techniques. Finally, a novel spherical entropy objective function is proposed to optimize the learned representation for retrieval using the cosine similarity metric.
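
The pipeline described in the abstract (word embeddings, codeword quantization, a weighting mask, cosine-similarity retrieval) can be illustrated with a minimal sketch. This is not the authors' implementation: the soft-assignment formula, the `sigma` parameter, the element-wise use of the mask, and the names `boew_representation` and `cosine_similarity` are all assumptions made for illustration, and the random arrays merely stand in for real word embeddings and a learned codebook. The end-to-end training and the spherical entropy objective are not shown.

```python
import numpy as np

def boew_representation(word_vectors, codebook, mask, sigma=1.0):
    """Hypothetical BoEW-style document vector (illustrative assumption).

    word_vectors: (n_words, dim) embeddings of the words in one document
    codebook:     (n_codewords, dim) codewords
    mask:         (n_codewords,) per-codeword importance weights
    """
    # Euclidean distance from every word vector to every codeword.
    diffs = word_vectors[:, None, :] - codebook[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    # Soft assignment: softmax over negative scaled distances (assumed form).
    logits = -dists / sigma
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    memberships = np.exp(logits)
    memberships /= memberships.sum(axis=1, keepdims=True)
    # Average the memberships into a histogram over codewords.
    histogram = memberships.mean(axis=0)
    # Re-weight each codeword with the mask (assumed element-wise product).
    return mask * histogram

def cosine_similarity(a, b, eps=1e-12):
    """Cosine similarity between two document vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

# Placeholder data: random stand-ins for embeddings and a codebook.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 50))
mask = np.ones(8)
doc_a = rng.normal(size=(30, 50))  # document with 30 words
doc_b = rng.normal(size=(45, 50))  # document with 45 words

rep_a = boew_representation(doc_a, codebook, mask)
rep_b = boew_representation(doc_b, codebook, mask)
similarity = cosine_similarity(rep_a, rep_b)
print(similarity)
```

Documents of different lengths map to vectors of the same size (one entry per codeword), which is what makes them directly comparable under cosine similarity. In the method summarized above, the codebook and mask would be learned rather than fixed as they are here.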

Details

ISSN :
0031-3203
Volume :
81
Database :
OpenAIRE
Journal :
Pattern Recognition
Accession number :
edsair.doi...........aad1e7159d11d5b680170870e1b6ba9a