1. Morpheme-Based Chemical Term Analysis and Recognition
- Author
- Xia Bo, Xun Endong, Qian Qingqing, Rao Gaoqi, Xiao Ye, and Wang Guirong
- Subjects
- Computer science, Word formation, Terminology, Term (time), Domain (software engineering), Morpheme, Feature (machine learning), Forward algorithm, Artificial intelligence, Hidden Markov model, Natural language processing
- Abstract
Term recognition is a foundation of natural language processing, especially in the chemical domain, because terminology is the most important feature of domain text. Chemical terms are so varied and complex that named entity recognition (NER) of chemical terms has been too difficult to use in production environments such as patent search. Based on morpheme theory and data from Chinese patents, we construct a classification of chemical morphemes according to the composition characteristics and formation rules of chemical terms. A simple and stable hidden Markov model (HMM) is used to model the word-formation relationships among the different morpheme classes, and an improved forward algorithm is applied to estimate the word-formation probability during term recognition. This approach addresses the poor recognition of long, low-frequency terms by traditional statistical methods. To some extent, it also addresses the subjectivity and incompleteness of hand-written rules. The method reaches an F1 score of 91.58% on chemical patents.
- Published
- 2021
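
The abstract describes scoring a candidate term by the probability that an HMM assigns to its sequence of morpheme classes. The sketch below shows only the standard forward algorithm, not the improved variant used by the authors, whose details are not given here. The state names, morpheme classes, and all probabilities are hypothetical values chosen for illustration.

```python
def forward_probability(observations, states, start_p, trans_p, emit_p):
    """Return P(observations) under an HMM using the standard forward algorithm."""
    if not observations:
        return 1.0
    # Initialisation: probability of starting in each state and emitting the first symbol.
    alpha = {s: start_p[s] * emit_p[s].get(observations[0], 0.0) for s in states}
    # Recursion: sum over all predecessor states, then emit the next symbol.
    for obs in observations[1:]:
        alpha = {
            s: emit_p[s].get(obs, 0.0)
            * sum(alpha[r] * trans_p[r].get(s, 0.0) for r in states)
            for s in states
        }
    # Termination: total probability over all final states.
    return sum(alpha.values())


# Hypothetical model: hidden states are positions inside a term,
# observations are morpheme classes. None of these values come from the paper.
STATES = ("begin", "inside")
START_P = {"begin": 1.0, "inside": 0.0}
TRANS_P = {
    "begin": {"begin": 0.0, "inside": 1.0},
    "inside": {"begin": 0.0, "inside": 1.0},
}
EMIT_P = {
    "begin": {"numeral": 0.5, "element": 0.5},
    "inside": {"element": 0.25, "suffix": 0.75},
}

if __name__ == "__main__":
    candidate = ["numeral", "element", "suffix"]
    print(forward_probability(candidate, STATES, START_P, TRANS_P, EMIT_P))
```

In a recogniser built along these lines, a candidate morpheme sequence would presumably be accepted as a term when its probability exceeds a tuned threshold. For long terms, the products become very small, so a practical version would work in log space or rescale `alpha` at each step.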