Modeling multi-prototype Chinese word representation learning for word similarity

Authors :: Fulian Yin
Jianbo Liu
Yanyan Wang
Marco Tosato
Source :: Complex & Intelligent Systems. 7:2977-2990
Publication Year :: 2021
Publisher :: Springer Science and Business Media LLC, 2021.
Abstract: The word similarity task computes the similarity of any pair of words and is a basic technology in natural language processing (NLP). Existing methods are based on word embeddings, which fail to capture polysemy and are strongly influenced by the quality of the corpus. In this paper, we propose a multi-prototype Chinese word representation model (MP-CWR) for word similarity based on a synonym knowledge base; it comprises a knowledge representation module and a word similarity module. In the first module, we propose a dual attention mechanism that combines semantic information to jointly learn word knowledge representations. The MP-CWR model uses synonyms as prior knowledge to supplement the relationships between words, which helps address the difficulty of semantic expression caused by insufficient data. In the word similarity module, we propose a multi-prototype representation for each word, then calculate and fuse the conceptual similarities of the two words to obtain the final result. Finally, we verify the effectiveness of our model against baseline models on three public data sets. The experiments also demonstrate the stability and scalability of the MP-CWR model across different corpora.
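
The word similarity step described in the abstract (several prototype vectors per word, pairwise concept similarities, then a fused score) might be sketched as follows. This is only an illustration under stated assumptions, not the MP-CWR implementation: the abstract does not specify the fusion rule, so the "max" and "mean" options below are common choices from the multi-prototype literature, and the function names and toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def multi_prototype_similarity(protos_a, protos_b, fuse="max"):
    """Fuse pairwise prototype similarities into one word-level score.

    protos_a, protos_b: non-empty lists of prototype (sense) vectors, one list per word.
    fuse: "max" keeps the best-matching sense pair; "mean" averages all pairs.
    Both rules are assumptions; the paper's actual fusion may differ.
    """
    sims = [cosine(a, b) for a in protos_a for b in protos_b]
    if fuse == "max":
        return max(sims)
    if fuse == "mean":
        return sum(sims) / len(sims)
    raise ValueError(f"unknown fusion rule: {fuse}")


# Toy example: word A has two senses, word B has one.
word_a = [[1.0, 0.0], [0.0, 1.0]]
word_b = [[1.0, 0.0]]
best_pair_score = multi_prototype_similarity(word_a, word_b, fuse="max")
average_score = multi_prototype_similarity(word_a, word_b, fuse="mean")
```

With "max", a polysemous word is scored by its closest sense to the other word; with "mean", unrelated senses pull the score down.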

Subjects :: Word embedding
Knowledge representation and reasoning
Computer science
Stability (learning theory)
Computational Mathematics
Knowledge base
Artificial Intelligence
Similarity (psychology)
Polysemy
Engineering (miscellaneous)
Feature learning
Word (computer architecture)
Natural language processing
Information Systems
