1. Resolve out of Vocabulary with Long Short-Term Memory Networks for Morphology
- Author
Caixin Zhu, Chuanxiang Tang, and Yun Tang
- Subjects
0209 industrial biotechnology, Vocabulary, Word embedding, Morphology (linguistics), Machine translation, Computer science, business.industry, media_common.quotation_subject, Sentiment analysis, Word processing, 02 engineering and technology, computer.software_genre, 020901 industrial engineering & automation, Text processing, Morpheme, 0202 electrical engineering, electronic engineering, information engineering, Question answering, 020201 artificial intelligence & image processing, Word2vec, Artificial intelligence, business, computer, Word (computer architecture), Natural language processing, media_common
- Abstract
An out-of-vocabulary (OOV) word is a word that does not exist in a predefined vocabulary. How to handle OOV words is an important research topic in natural language processing (NLP), because their presence directly affects the performance of many NLP systems. For example, in common scenarios such as machine translation, sentiment analysis, and intelligent question answering, OOV words can greatly affect key system performance. In recent years, with the advent of word2vec, a word vector algorithm based on the principle of word morphology, the word embedding stage of NLP systems has improved significantly. We combine LSTM with NLM, taking the morphemes of words as the basic processing unit while also taking global context information into account. The results are better than those of existing OOV processing strategies, and the performance of commonly used NLP systems generally improves. Finally, experiments show that our model generally outperforms existing models on the problem of unregistered (OOV) word processing.
- Published
- 2020