Chinese Clinical Named Entity Recognition in Electronic Medical Records: Development of a Lattice Long Short-Term Memory Model With Contextualized Character Representations
- Source :
- JMIR Medical Informatics, Vol 8, Iss 9, p e19848 (2020)
- Publication Year :
- 2020
- Publisher :
- JMIR Publications, 2020.
Abstract
- Background: Clinical named entity recognition (CNER), whose goal is to automatically identify clinical entities in electronic medical records (EMRs), is an important research direction of clinical text data mining and information extraction. The promotion of CNER can provide support for clinical decision making and medical knowledge base construction, which could then improve overall medical quality. Compared with English CNER, and due to the complexity of Chinese word segmentation and grammar, Chinese CNER was implemented later and is more challenging.
- Objective: With the development of distributed representation and deep learning, a series of models have been applied in Chinese CNER. Different from the English version, Chinese CNER is mainly divided into character-based and word-based methods that cannot make comprehensive use of EMR information and cannot solve the problem of ambiguity in word representation.
- Methods: In this paper, we propose a lattice long short-term memory (LSTM) model combined with a variant contextualized character representation and a conditional random field (CRF) layer for Chinese CNER: the Embeddings from Language Models (ELMo)-lattice-LSTM-CRF model. The lattice LSTM model can effectively utilize the information from characters and words in Chinese EMRs; in addition, the variant ELMo model uses Chinese characters as input instead of the character-encoding layer of the ELMo model, so as to learn domain-specific contextualized character embeddings.
- Results: We evaluated our method using two Chinese CNER datasets from the China Conference on Knowledge Graph and Semantic Computing (CCKS): the CCKS-2017 CNER dataset and the CCKS-2019 CNER dataset. We obtained F1 scores of 90.13% and 85.02% on the test sets of these two datasets, respectively.
- Conclusions: Our results show that our proposed method is effective in Chinese CNER. In addition, the results of our experiments show that variant contextualized character representations can significantly improve the performance of the model.
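The abstract says the lattice LSTM draws on both characters and words. As a rough illustration only (not the authors' code), the sketch below shows one way the word side of such a lattice could be collected: every lexicon word that matches a span of the character sequence is recorded with its start and end offsets. The function name `lattice_word_spans`, the `max_len` parameter, and the toy lexicon are hypothetical and chosen purely for the example.

```python
from typing import List, Set, Tuple


def lattice_word_spans(
    chars: str, lexicon: Set[str], max_len: int = 4
) -> List[Tuple[int, int, str]]:
    """Return (start, end, word) for each multi-character lexicon word in chars.

    `end` is exclusive. Overlapping matches are all kept, because a lattice
    model is meant to weigh competing segmentations rather than commit to one.
    """
    spans = []
    for start in range(len(chars)):
        # Only spans of length 2..max_len: single characters are already
        # covered by the character-level sequence itself.
        last_end = min(len(chars), start + max_len)
        for end in range(start + 2, last_end + 1):
            word = chars[start:end]
            if word in lexicon:
                spans.append((start, end, word))
    return spans


if __name__ == "__main__":
    # Hypothetical toy lexicon; a real system would use a large word list.
    toy_lexicon = {"患者", "糖尿病", "病史"}
    for span in lattice_word_spans("患者有糖尿病史", toy_lexicon):
        print(span)
```

In this toy sentence the matches "糖尿病" and "病史" overlap on the character "病", which is the kind of segmentation ambiguity that a purely word-based method has to resolve before tagging and that a lattice can keep open.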
- Subjects :
- ELMo
Conditional random field
neural network
Computer science
Computer applications to medicine. Medical informatics
R858-859.7
Health Informatics
sequence tagging
03 medical and health sciences
0302 clinical medicine
Health Information Management
Named-entity recognition
Semantic computing
030212 general & internal medicine
030304 developmental biology
Original Paper
0303 health sciences
Grammar
Deep learning
lattice LSTM
Information extraction
clinical named entity recognition
Artificial intelligence
Language model
Chinese characters
Natural language processing
Details
- Language :
- English
- ISSN :
- 2291-9694
- Volume :
- 8
- Issue :
- 9
- Database :
- OpenAIRE
- Journal :
- JMIR Medical Informatics
- Accession number :
- edsair.doi.dedup.....74cac434d489b3977ae67bdc5c2c9f8f