A Study of Generative Large Language Model for Medical Research and Healthcare

Authors :: Peng, Cheng
Yang, Xi
Chen, Aokun
Smith, Kaleb E
PourNejatian, Nima
Costa, Anthony B
Martin, Cheryl
Flores, Mona G
Zhang, Ying
Magoc, Tanja
Lipori, Gloria
Mitchell, Duane A
Ospina, Naykky S
Ahmed, Mustafa M
Hogan, William R
Shenkman, Elizabeth A
Guo, Yi
Bian, Jiang
Wu, Yonghui
Publication Year :: 2023
Abstract: There is enormous enthusiasm, as well as concern, about using large language models (LLMs) in healthcare, yet current assumptions are all based on general-purpose LLMs such as ChatGPT. This study develops a clinical generative LLM, GatorTronGPT, using 277 billion words of mixed clinical and English text with a GPT-3 architecture of 20 billion parameters. GatorTronGPT improves biomedical natural language processing for medical research. Synthetic NLP models trained using GatorTronGPT-generated text outperform NLP models trained using real-world clinical text. A physicians' Turing test using a 1 (worst) to 9 (best) scale shows that there is no significant difference in linguistic readability (p = 0.22; 6.57 for GatorTronGPT compared with 6.93 for human text) and clinical relevance (p = 0.91; 7.0 for GatorTronGPT compared with 6.97 for human text), and that physicians cannot differentiate them (p < 0.001). This study provides insights into the opportunities and challenges of LLMs for medical research and healthcare.