Learning bag-of-embedded-words representations for textual information retrieval
- Source :
- Pattern Recognition. 81:254-267
- Publication Year :
- 2018
- Publisher :
- Elsevier BV, 2018.
Abstract
- Word embedding models can accurately capture the semantic content of words. Extracting a set of word embedding vectors from a text document is similar to the feature extraction step of the Bag-of-Features (BoF) model, which is commonly used in computer vision tasks. This observation gives rise to the proposed Bag-of-Embedded Words (BoEW) model, which can efficiently represent text documents while overcoming the limitations of previously dominant techniques, such as the textual Bag-of-Words model. The proposed method extends the regular BoF model by a) incorporating a weighting mask that allows the importance of each learned codeword to be altered and b) optimizing the model end-to-end (from the word embeddings to the weighting mask). Furthermore, the BoEW model provides a fast way to fine-tune the learned representation towards the information need of the user using relevance feedback techniques. Finally, a novel spherical entropy objective function is proposed to optimize the learned representation for retrieval using the cosine similarity metric.
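The pipeline described in the abstract (word embeddings quantized against learned codewords, a weighting mask over the resulting histogram, and retrieval by cosine similarity) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the quantization scheme, so the Gaussian-style soft assignment, the `sigma` parameter, the random codebook, and the function names `boew_representation` and `cosine_similarity` are all assumptions made for illustration. In the paper the codebook and mask are learned end-to-end; here they are fixed placeholders.

```python
import numpy as np

def boew_representation(embeddings, codebook, mask, sigma=1.0):
    """Hypothetical BoEW-style document vector (illustrative only).

    embeddings: (n_words, dim) word vectors of one document
    codebook:   (n_codewords, dim) codewords (learned in the paper, random here)
    mask:       (n_codewords,) per-codeword weights (the "weighting mask")
    sigma:      assumed softness of the quantization
    """
    # Squared Euclidean distance between every word vector and every codeword.
    diffs = embeddings[:, None, :] - codebook[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)

    # Assumed soft assignment: softmax over negative scaled distances.
    logits = -sq_dists / sigma
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    memberships = np.exp(logits)
    memberships /= memberships.sum(axis=1, keepdims=True)

    # Average the memberships into a histogram, then apply the mask.
    histogram = memberships.mean(axis=0)
    return histogram * mask

def cosine_similarity(a, b, eps=1e-12):
    """Cosine similarity between two document vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

# Toy data: random vectors stand in for real word embeddings.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 16))
mask = np.ones(8)  # uniform mask: every codeword equally important
doc_a = rng.normal(size=(20, 16))
doc_b = rng.normal(size=(35, 16))

rep_a = boew_representation(doc_a, codebook, mask)
rep_b = boew_representation(doc_b, codebook, mask)
similarity = cosine_similarity(rep_a, rep_b)
```

Documents of different lengths map to vectors of the same size (one entry per codeword), which is what makes them directly comparable. Changing entries of `mask` rescales the contribution of individual codewords without touching the codebook, which is one plausible reading of how the mask could support quick adjustment of the representation.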
- Subjects :
- Word embedding
Computer science
business.industry
Feature extraction
Cosine similarity
Relevance feedback
02 engineering and technology
computer.software_genre
Weighting
Artificial Intelligence
Bag-of-words model
020204 information systems
Signal Processing
0202 electrical engineering, electronic engineering, information engineering
Entropy (information theory)
020201 artificial intelligence & image processing
Computer Vision and Pattern Recognition
Artificial intelligence
business
computer
Software
Natural language processing
Details
- ISSN :
- 0031-3203
- Volume :
- 81
- Database :
- OpenAIRE
- Journal :
- Pattern Recognition
- Accession number :
- edsair.doi...........aad1e7159d11d5b680170870e1b6ba9a