Can Large Language Models Aid Caregivers of Pediatric Cancer Patients in Information Seeking? A Cross-Sectional Investigation
- Authors
Emre Sezgin, Daniel I. Jackson, A. Baki Kocaballi, Mindy Bibart, Sue Zupanec, Wendy Landier, Anthony Audino, Mark Ranalli, and Micah Skeens
- Subjects
artificial intelligence, health care communication, health literacy, large language models, patient education, pediatric oncology, Neoplasms. Tumors. Oncology. Including cancer and carcinogens, RC254-282
- Abstract
Purpose: Caregivers in pediatric oncology need accurate and understandable information about their child's condition, treatment, and side effects. This study assesses the performance of publicly accessible large language model (LLM)-supported tools in providing valuable and reliable information to caregivers of children with cancer.
Methods: In this cross-sectional study, we evaluated the performance of four LLM-supported tools, ChatGPT (GPT-4), Google Bard (Gemini Pro), Microsoft Bing Chat, and Google SGE, against a set of frequently asked questions (FAQs) derived from the Children's Oncology Group Family Handbook and expert input (26 FAQs and 104 generated responses in total). Five pediatric oncology experts assessed the generated LLM responses on measures including accuracy, clarity, inclusivity, completeness, clinical utility, and overall rating. Content quality was also evaluated, including readability, AI disclosure, source credibility, resource matching, and content originality. We used descriptive analysis and statistical tests, including Shapiro–Wilk, Levene's, and Kruskal–Wallis H-tests, with Dunn's post hoc tests for pairwise comparisons.
Results: ChatGPT showed high overall performance when evaluated by the experts. Bard also performed well, especially in accuracy and clarity of responses, whereas Bing Chat and Google SGE had lower overall scores. Disclosure that responses were AI-generated was observed less frequently in ChatGPT responses, which may have affected their clarity, whereas Bard maintained a balance between AI disclosure and response clarity. Google SGE generated the most readable responses, whereas ChatGPT's responses were the most complex. LLM tools varied significantly (p
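The statistical workflow named in the Methods can be illustrated with a minimal Python sketch. The column names ("tool", "score") and the sample ratings below are hypothetical, and scikit-posthocs is assumed for Dunn's test; the study's actual software and data are not specified in this record.

```python
# A minimal sketch of the analysis pipeline named in the abstract, assuming
# long-form expert ratings with hypothetical columns "tool" and "score".
# The example scores below are illustrative, not the study's data.
import pandas as pd
from scipy import stats
import scikit_posthocs as sp  # assumed here for Dunn's post hoc test

ratings = pd.DataFrame({
    "tool": ["ChatGPT"] * 6 + ["Bard"] * 6
            + ["Bing Chat"] * 6 + ["Google SGE"] * 6,
    "score": [5, 5, 4, 5, 4, 5,   # ChatGPT
              4, 5, 4, 4, 5, 4,   # Bard
              3, 3, 4, 2, 3, 3,   # Bing Chat
              3, 2, 3, 3, 2, 3],  # Google SGE
})
groups = [g["score"].to_numpy() for _, g in ratings.groupby("tool")]

# Shapiro-Wilk (normality) and Levene (equal variances) checks, which
# motivate the non-parametric Kruskal-Wallis H-test used in the study.
print("Shapiro-Wilk p:", [round(stats.shapiro(g).pvalue, 3) for g in groups])
print("Levene p:", round(stats.levene(*groups).pvalue, 3))

# Kruskal-Wallis H-test: do the four tools differ in expert ratings?
h_stat, p_value = stats.kruskal(*groups)
print(f"Kruskal-Wallis H = {h_stat:.2f}, p = {p_value:.4f}")

# Dunn's post hoc test for pairwise tool comparisons, with p-value adjustment.
print(sp.posthoc_dunn(ratings, val_col="score", group_col="tool",
                      p_adjust="bonferroni"))
```

When ratings are ordinal or fail the normality and equal-variance checks, the Kruskal–Wallis/Dunn route is the standard non-parametric alternative to one-way ANOVA with pairwise post hoc tests.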
- Published
2025