
Evaluation and Analysis of Large Language Models for Clinical Text Augmentation and Generation

Authors :
Atif Latif
Jihie Kim
Source :
IEEE Access, Vol 12, Pp 48987-48996 (2024)
Publication Year :
2024
Publisher :
IEEE, 2024.

Abstract

A major challenge in deep learning (DL) model training is data scarcity, which is common in specialized domains, such as clinical text or low-resource languages, that remain underexplored in AI research. In this paper, we investigate the generation capability of large language models such as the Text-To-Text Transfer Transformer (T5) and Bidirectional and Auto-Regressive Transformers (BART) on the Clinical Health-Aware Reasoning Across Dimensions (CHARDAT) dataset by applying a ChatGPT-based augmentation technique. We employed ChatGPT to rephrase each instance of the training set into a conceptually similar but semantically different sample and added these paraphrases to the dataset. This study aims to investigate the use of large language models, ChatGPT in particular, for data augmentation to overcome the limited data availability in the clinical domain. In addition to ChatGPT augmentation, we applied other augmentation techniques to the clinical data, namely Easy Data Augmentation (EDA) and An Easier Data Augmentation (AEDA). ChatGPT comprehended the contextual significance of sentences within the dataset and successfully rephrased general English terms while leaving clinical terms unchanged. The original CHARDAT dataset covers 52 health conditions across three clinical dimensions: Treatments, Risk Factors, and Preventions. We compared the outputs of the different augmentation techniques and evaluated their relative performance. Additionally, we examined how these techniques perform with different pre-trained language models, assessing their sensitivity in various contexts. Despite the relatively small size of the CHARDAT dataset, our results demonstrated that ChatGPT augmentation outperformed the previously employed back-translation augmentation. Specifically, our findings revealed that the BART model achieved the best performance, with ROUGE scores of 52.35 (ROUGE-1), 41.59 (ROUGE-2), and 50.71 (ROUGE-L).
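The augmentation loop the abstract describes — paraphrase each training instance while keeping its label, then append the paraphrases to the original set — can be sketched as follows. This is a minimal illustration, not the authors' implementation: `paraphrase` here is a placeholder stub standing in for the actual ChatGPT API call and prompt, and the example sentences and dimension labels are invented for demonstration.

```python
def paraphrase(text: str) -> str:
    """Placeholder for a ChatGPT rephrasing call (an assumption, not the
    authors' actual prompt). A real implementation would send `text` to the
    model and ask for a reworded version that preserves clinical terms."""
    return f"[paraphrased] {text}"


def augment(dataset: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Append one paraphrased copy of every (text, label) pair,
    keeping the original label, which doubles the training set."""
    augmented = list(dataset)  # keep all original samples
    for text, label in dataset:
        augmented.append((paraphrase(text), label))
    return augmented


# Hypothetical CHARDAT-style samples: sentence plus clinical dimension.
train = [
    ("Regular exercise lowers hypertension risk.", "Preventions"),
    ("ACE inhibitors are commonly prescribed for hypertension.", "Treatments"),
]
train_aug = augment(train)  # 2 originals + 2 paraphrases = 4 samples
```

In the study itself, the augmented set is then used to fine-tune T5 and BART, with generation quality scored by ROUGE against the reference texts.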

Details

Language :
English
ISSN :
21693536
Volume :
12
Database :
Directory of Open Access Journals
Journal :
IEEE Access
Publication Type :
Academic Journal
Accession number :
edsdoj.5f0446a4b9e942709266a93290953252
Document Type :
article
Full Text :
https://doi.org/10.1109/ACCESS.2024.3384496