
Serialized Co-Training-Based Recognition of Medicine Names for Patent Mining and Retrieval

Authors :
Caiquan Xiong
Na Deng
Source :
International Journal of Data Warehousing and Mining. 16:87-107
Publication Year :
2020
Publisher :
IGI Global, 2020.

Abstract

In the retrieval and mining of traditional Chinese medicine (TCM) patents, Chinese word segmentation and named entity recognition are key steps. However, the prevalence of aliases for traditional Chinese medicines poses great challenges to both tasks in TCM patents, which directly affects the quality of patent mining. In the absence of a comprehensive thesaurus of Chinese herbal medicine names, traditional thesaurus-based word segmentation and named entity recognition are not suitable for identifying medicine names in TCM patents. To address this, a modified, serialized co-training method is proposed that exploits the linguistic and structural characteristics of TCM patent texts to recognize medicine names in TCM patent abstracts. Experiments show that the method maintains high accuracy at relatively low time complexity. The method can also be extended to recognize other named entities in TCM patents, such as disease names and preparation methods.

Details

ISSN :
1548-3932 and 1548-3924
Volume :
16
Database :
OpenAIRE
Journal :
International Journal of Data Warehousing and Mining
Accession number :
edsair.doi...........e74843584624ed442d52b220f5bdc78e
Full Text :
https://doi.org/10.4018/ijdwm.2020070105