Approximating Word Ranking and Negative Sampling for Word Embedding
- Source :
- IJCAI
- Publication Year :
- 2018
- Publisher :
- International Joint Conferences on Artificial Intelligence Organization, 2018.
Abstract
- CBOW (Continuous Bag-Of-Words) is one of the most commonly used techniques for generating word embeddings in NLP tasks. However, it falls short of optimal performance because it involves positive words uniformly and draws negative words from a simple sampling distribution. To resolve these issues, we propose OptRank, which optimizes word ranking and approximates negative sampling to improve word embeddings. Specifically, we first formalize word embedding as a ranking problem. We then weight positive words by their ranks, so that highly ranked words carry more importance, and adopt a dynamic sampling strategy to select informative negative words. In addition, we design an approximation method to compute word ranks efficiently. Empirical experiments show that OptRank consistently outperforms its counterparts on a benchmark dataset across different sampling scales, especially when the sampled subset is small. The code and datasets can be obtained from https://github.com/ouououououou/OptRank.
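- This record contains only the abstract, so OptRank's exact objective is not shown here. As a purely illustrative sketch of the two ideas the abstract names — approximating a positive word's rank by sampling, and weighting the update by that rank while picking an informative (score-violating) negative — a WARP-style routine could look as follows (`warp_sample`, the harmonic weight, and all parameter names are illustrative assumptions, not the paper's code):

```python
import numpy as np

def warp_sample(scores, pos_idx, max_trials=None, rng=None):
    """Approximate the positive word's rank by sampling negatives until one
    scores higher (WARP-style). Returns the violating negative index (or
    None), the estimated rank, and a harmonic rank-based weight."""
    rng = rng if rng is not None else np.random.default_rng(0)
    n = len(scores)
    max_trials = max_trials if max_trials is not None else n - 1
    trials, neg_idx = 0, None
    while trials < max_trials:
        j = int(rng.integers(n))
        if j == pos_idx:                    # skip the positive word itself
            continue
        trials += 1
        if scores[j] > scores[pos_idx]:     # informative (violating) negative
            neg_idx = j
            break
    # Fewer trials until a violation => the positive is ranked lower:
    # rank is approximated as (N - 1) / trials.
    rank = (n - 1) // max(trials, 1)
    # Harmonic weight: highly ranked (large-rank) positives get larger updates.
    weight = sum(1.0 / i for i in range(1, rank + 1))
    return neg_idx, rank, weight
```

- When the positive word already outscores every sampled negative, no violator is found, the estimated rank collapses to 1, and the weight stays at 1.0, so well-ranked positives receive small updates; a poorly scored positive yields a violating negative quickly, a large rank estimate, and a large harmonic weight.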
- Subjects :
- Word embedding
Computer science
Sampling (statistics)
Ranking (information retrieval)
Artificial intelligence
Natural language processing
Word (computer architecture)
Details
- Language :
- English
- ISBN :
- 978-0-9992411-2-7
- Database :
- OpenAIRE
- Journal :
- IJCAI
- Accession number :
- edsair.doi.dedup.....3c953c88c803c2124b25ef8c14322371