
Are large language models valid tools for patient information on lumbar disc herniation? The spine surgeons' perspective

Authors :: Siegmund Lang
Jacopo Vitale
Tamás F. Fekete
Daniel Haschtmann
Raluca Reitmeir
Mario Ropelato
Jani Puhakka
Fabio Galbusera
Markus Loibl
Source :: Brain and Spine, Vol 4, Pp 102804- (2024)
Publication Year :: 2024
Publisher :: Elsevier, 2024.
Abstract:
Introduction: Generative AI is revolutionizing patient education in healthcare, particularly through chatbots that offer personalized, clear medical information. Reliability and accuracy are vital in AI-driven patient education.
Research question: How effective are Large Language Models (LLMs), such as ChatGPT and Google Bard, in delivering accurate and understandable patient education on lumbar disc herniation?
Material and methods: Ten Frequently Asked Questions about lumbar disc herniation were selected from 133 questions and submitted to three LLMs. Six experienced spine surgeons rated the responses on a scale from “excellent” to “unsatisfactory” and evaluated the answers for exhaustiveness, clarity, empathy, and length. Statistical analysis involved Fleiss' kappa, chi-square, and Friedman tests.
Results: Of the responses, 27.2% were rated excellent, 43.9% satisfactory with minimal clarification, 18.3% satisfactory with moderate clarification, and 10.6% unsatisfactory. There were no significant differences in overall ratings among the LLMs (p = 0.90); however, inter-rater reliability was not achieved, and large differences among raters were detected in the distribution of answer frequencies. Overall, ratings varied among the 10 answers (p = 0.043). The average ratings for exhaustiveness, clarity, empathy, and length were above 3.5/5.
Discussion and conclusion: LLMs show potential in patient education for lumbar spine surgery, with generally positive feedback from evaluators. The new EU AI Act, enforcing strict regulation on AI systems, highlights the need for rigorous oversight in medical contexts. In the current study, the variability in evaluations and occasional inaccuracies underline the need for continuous improvement. Future research should involve more advanced models to enhance patient-physician communication.

Subjects :: Lumbar disc herniation
Patient education
Large language models
ChatGPT
Google Bard
AI evaluation
Neurology. Diseases of the nervous system
RC346-429

Details

Language :: English
ISSN :: 2772-5294
Volume :: 4
Pages :: 102804-
Database :: Directory of Open Access Journals
Journal :: Brain and Spine
Publication Type :: Academic Journal
Accession number :: edsdoj.bb486bf807d6429cb4e0f678e37ad945
Document Type :: article
Full Text :: https://doi.org/10.1016/j.bas.2024.102804
