Relevant Word Order Vectorization for Improved Natural Language Processing in Electronic Health Records
- Source :
- Scientific Reports, Vol 9, Iss 1, Pp 1-9 (2019)
- Publication Year :
- 2019
- Publisher :
- Nature Publishing Group, 2019.
Abstract
- Electronic health records (EHR) are a rich resource for conducting observational studies, supporting clinical trials, and more. However, much of the data is unstructured text, which presents an obstacle to automated extraction. Natural language processing (NLP) can structure and learn from text, but NLP algorithms were not designed for the unique characteristics of EHR. Here, we propose Relevant Word Order Vectorization (RWOV) to aid with structuring. RWOV is based on finding the positional relationships among the words most relevant to predicting the class of a text. This lets machine learning algorithms use not just keywords but also positional dependencies (e.g., a relevant word occurs 5 relevant words before some term of interest). As a proof of concept, we attempted to classify the hormone receptor status of breast cancer patients treated at the University of Kansas Medical Center, comparing RWOV to other methods using the F1 score and AUC. RWOV performed as well as or better than the other methods in all but one case. For F1 score, RWOV had a clear edge on most tasks. AUC results tended to be closer, but for HER2, RWOV was significantly better in most comparisons. These results suggest RWOV should be further developed for EHR-related NLP.
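The abstract does not spell out the algorithm, so the following is only a rough sketch of one way to read the positional idea it describes: encode each relevant word by its signed distance from a term of interest, counting relevant words only. The function name `rwov_features`, the choice of 0 for an absent word, the hand-picked relevant-word list, and the sample note text are all assumptions made for illustration; none of them come from the paper or its data.

```python
import re


def rwov_features(text, term, relevant_words):
    """Signed offset of each relevant word from `term`, counted in relevant words.

    Negative offsets mean the word occurs before the term, positive after.
    0 marks a word that is absent (an assumed convention, not the paper's).
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    relevant = set(relevant_words) | {term}
    # Drop irrelevant tokens so that offsets count relevant words only.
    kept = [tok for tok in tokens if tok in relevant]
    features = {word: 0 for word in relevant_words}
    if term not in kept:
        return features
    anchor = kept.index(term)  # first occurrence of the term of interest
    for i, tok in enumerate(kept):
        if tok == term:
            continue
        offset = i - anchor
        # Keep the occurrence closest to the term of interest.
        if features[tok] == 0 or abs(offset) < abs(features[tok]):
            features[tok] = offset
    return features


# Hypothetical note text, invented for this example.
note = "Invasive ductal carcinoma. ER positive, PR negative, HER2 not amplified."
example = rwov_features(note, "her2", ["positive", "negative", "not", "amplified"])
print(example)  # {'positive': -2, 'negative': -1, 'not': 1, 'amplified': 2}
```

A vector like this could then be passed to any standard classifier. How the relevant words are selected and how the offsets are actually encoded in RWOV are described in the full article, not in this record.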
- Subjects :
- 0301 basic medicine
Receptor, ErbB-2
Computer science
Datasets as Topic
lcsh:Medicine
Breast Neoplasms
computer.software_genre
Article
Machine Learning
03 medical and health sciences
Breast cancer
0302 clinical medicine
Resource (project management)
Electronic Health Records
Humans
Image tracing
lcsh:Science
Data mining
Natural Language Processing
Structure (mathematical logic)
Class (computer programming)
Multidisciplinary
business.industry
lcsh:R
Term (time)
030104 developmental biology
Receptors, Estrogen
Female
lcsh:Q
Artificial intelligence
Receptors, Progesterone
F1 score
business
computer
Algorithms
030217 neurology & neurosurgery
Word (computer architecture)
Natural language processing
Word order
Details
- Language :
- English
- ISSN :
- 20452322
- Volume :
- 9
- Issue :
- 1
- Database :
- OpenAIRE
- Journal :
- Scientific Reports
- Accession number :
- edsair.doi.dedup.....039308dba28cb016c1f9a4c9e167dfa2