1. Utilizing background corpus and dictionary to calculate similarity between unknown words
- Author
- Hongge Hu, Xianlin Chen, and Xinghua Fan
- Subjects
Computer science, business.industry, Context (language use), Pattern recognition, computer.software_genre, Semantics, Expression (mathematics), Similarity (network science), Connotation (semiotics), Segmentation, Artificial intelligence, business, computer, Word (computer architecture), Natural language processing, Meaning (linguistics)
- Abstract
This paper presents a method that uses a background corpus and a dictionary to calculate the similarity between unknown words. The method first obtains the best concept expression of an unknown word from the word's background in the corpus, then constructs a context for that best concept expression. The connotative meaning of the unknown word is determined by the difference between the context of the best concept expression and the unknown word's own context. The similarity between unknown words is then calculated using a semantic dictionary. This approach avoids the problems of mistaken segmentation and abused segmentation that arise in the traditional, segmentation-based method of calculating similarity between unknown words. Experimental results show that the proposed method is highly effective.
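The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how contexts are built or compared, so the co-occurrence window, the cosine measure, and the names `context_vector`, `best_concept`, `unknown_similarity`, and `dict_sim` are all assumptions made for illustration. The connotation step (the difference between the two contexts) is omitted, and the semantic dictionary is stood in for by a hypothetical lookup table of pairwise scores.

```python
from collections import Counter
from math import sqrt


def context_vector(word, sentences, window=2):
    """Count tokens that occur within `window` positions of `word`."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok == word:
                lo, hi = max(0, i - window), i + window + 1
                counts.update(
                    t for j, t in enumerate(tokens[lo:hi], lo) if j != i
                )
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def best_concept(unknown, sentences, dictionary_words):
    """Pick the dictionary word whose corpus context best matches the
    unknown word's context (assumed stand-in for 'best concept expression')."""
    target = context_vector(unknown, sentences)
    return max(
        dictionary_words,
        key=lambda w: cosine(target, context_vector(w, sentences)),
    )


def unknown_similarity(u1, u2, sentences, dict_sim):
    """Map each unknown word to a dictionary concept, then look up the
    dictionary similarity of the two concepts.

    `dict_sim` is a hypothetical table: frozenset({w1, w2}) -> score.
    """
    words = sorted({w for pair in dict_sim for w in pair})
    c1 = best_concept(u1, sentences, words)
    c2 = best_concept(u2, sentences, words)
    if c1 == c2:
        return 1.0
    return dict_sim.get(frozenset((c1, c2)), 0.0)
```

Because the unknown words are matched to dictionary entries through their corpus contexts rather than by splitting them into known sub-words, a sketch of this kind never segments the unknown word itself, which is the property the abstract emphasizes.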
- Published
- 2010