Serialized Co-Training-Based Recognition of Medicine Names for Patent Mining and Retrieval
- Source :
- International Journal of Data Warehousing and Mining. 16:87-107
- Publication Year :
- 2020
- Publisher :
- IGI Global, 2020.
Abstract
- In the retrieval and mining of traditional Chinese medicine (TCM) patents, Chinese word segmentation and named entity recognition are key steps. However, many traditional Chinese medicines are known by several aliases, which poses a major challenge to word segmentation and named entity recognition in TCM patents and directly affects the quality of patent mining. Because a comprehensive thesaurus of Chinese herbal medicine names is lacking, traditional thesaurus-based Chinese word segmentation and named entity recognition are poorly suited to identifying medicines in TCM patents. To address this, the paper proposes a modified, serialized co-training method that uses the linguistic and structural characteristics of TCM patent texts to recognize medicine names in TCM patent abstracts. Experiments show that the method maintains high accuracy at relatively low time complexity. The method can also be extended to recognize other named entities in TCM patents, such as disease names and preparation methods.
- Subjects :
- Thesaurus (information retrieval)
Co-training
Alias
Computer science
business.industry
02 engineering and technology
Traditional Chinese medicine
computer.software_genre
Identification (information)
Named-entity recognition
Hardware and Architecture
020204 information systems
0202 electrical engineering, electronic engineering, information engineering
Key (cryptography)
020201 artificial intelligence & image processing
Segmentation
Artificial intelligence
business
computer
Software
Natural language processing
Details
- ISSN :
- 1548-3932 and 1548-3924
- Volume :
- 16
- Database :
- OpenAIRE
- Journal :
- International Journal of Data Warehousing and Mining
- Accession number :
- edsair.doi...........e74843584624ed442d52b220f5bdc78e
- Full Text :
- https://doi.org/10.4018/ijdwm.2020070105