An Entity Extraction Pipeline for Medical Text Records Using Large Language Models: Analytical Study

Authors :: Lei Wang
Yinyao Ma
Wenshuai Bi
Hanlin Lv
Yuxiang Li
Source :: Journal of Medical Internet Research, Vol 26, p e54580 (2024)
Publication Year :: 2024
Publisher :: JMIR Publications, 2024.
Abstract:
Background: The study of disease progression relies on clinical data, including text data, and extracting valuable features from text data has been a research hot spot. With the rise of large language models (LLMs), semantic-based extraction pipelines are gaining acceptance in clinical research. However, the security and feature hallucination issues of LLMs require further attention.
Objective: This study aimed to introduce a novel modular LLM pipeline, which could semantically extract features from textual patient admission records.
Methods: The pipeline was designed to process a systematic succession of concept extraction, aggregation, question generation, corpus extraction, and question-and-answer scale extraction, which was tested via 2 low-parameter LLMs: Qwen-14B-Chat (QWEN) and Baichuan2-13B-Chat (BAICHUAN). A data set of 25,709 pregnancy cases from the People’s Hospital of Guangxi Zhuang Autonomous Region, China, was used for evaluation with the help of a local expert’s annotation. The pipeline was evaluated with the metrics of accuracy and precision, null ratio, and time consumption. Additionally, we evaluated its performance via a quantized version of Qwen-14B-Chat on a consumer-grade GPU.
Results: The pipeline demonstrates a high level of precision in feature extraction, as evidenced by the accuracy and precision results of Qwen-14B-Chat (95.52% and 92.93%, respectively) and Baichuan2-13B-Chat (95.86% and 90.08%, respectively). Furthermore, the pipeline exhibited low null ratios and variable time consumption. The INT4-quantized version of QWEN delivered an enhanced performance with 97.28% accuracy and a 0% null ratio.
Conclusions: The pipeline exhibited consistent performance across different LLMs and efficiently extracted clinical features from textual data. It also showed reliable performance on consumer-grade hardware. This approach offers a viable and effective solution for mining clinical research data from textual records.

Subjects :: Computer applications to medicine. Medical informatics
R858-859.7
Public aspects of medicine
RA1-1270

Details

Language :: English
ISSN :: 1438-8871
Volume :: 26
Database :: Directory of Open Access Journals
Journal :: Journal of Medical Internet Research
Publication Type :: Academic Journal
Accession number :: edsdoj.b807e1d20b37476ca08e96c9cdf5b57b
Document Type :: article
Full Text :: https://doi.org/10.2196/54580
