Performance of ChatGPT-4 and Bard chatbots in responding to common patient questions on prostate cancer 177Lu-PSMA-617 therapy.

Authors :: Bilgin, Gokce Belge
Bilgin, Cem
Childs, Daniel S.
Orme, Jacob J.
Burkett, Brian J.
Packard, Ann T.
Johnson, Derek R.
Thorpe, Matthew P.
Bin Riaz, Irbaz
Halfdanarson, Thorvardur R.
Johnson, Geoffrey B.
Sartor, Oliver
Kendi, Ayse Tuba
Source :: Frontiers in Oncology; 2024, p1-6, 6p
Publication Year :: 2024
Abstract: Background: Many patients use artificial intelligence (AI) chatbots as a rapid source of health information. This raises important questions about the reliability and effectiveness of AI chatbots in delivering accurate and understandable information.
Purpose: To evaluate and compare the accuracy, conciseness, and readability of responses from OpenAI ChatGPT-4 and Google Bard to patient inquiries concerning the novel ¹⁷⁷Lu-PSMA-617 therapy for prostate cancer.
Materials and Methods: Two experts listed the 12 questions most commonly asked by patients about ¹⁷⁷Lu-PSMA-617 therapy. These twelve questions were submitted as prompts to OpenAI ChatGPT-4 and Google Bard. AI-generated responses were distributed using an online survey platform (Qualtrics) and blindly rated by eight experts. The performances of the AI chatbots were evaluated and compared across three domains: accuracy, conciseness, and readability. Potential safety concerns associated with AI-generated answers were also examined. The Mann-Whitney U and chi-square tests were used to compare the performances of the AI chatbots.
Results: Eight experts participated in the survey, evaluating 12 AI-generated responses across the three domains of accuracy, conciseness, and readability, resulting in 96 assessments (12 responses × 8 experts) for each domain per chatbot. ChatGPT-4 provided more accurate answers than Bard (2.95 ± 0.671 vs 2.73 ± 0.732, p=0.027). Bard's responses had better readability than ChatGPT-4 (2.79 ± 0.408 vs 2.94 ± 0.243, p=0.003). Both ChatGPT-4 and Bard achieved comparable conciseness scores (3.14 ± 0.659 vs 3.11 ± 0.679, p=0.798). Experts categorized the AI-generated responses as incorrect or partially correct at a rate of 16.6% for ChatGPT-4 and 29.1% for Bard. Bard's answers contained significantly more misleading information than those of ChatGPT-4 (p=0.039).
Conclusion: AI chatbots have gained significant attention, and their performance is continuously improving. Nonetheless, these technologies still need further improvements to be considered reliable and credible sources for patients seeking medical information on ¹⁷⁷Lu-PSMA-617 therapy. [ABSTRACT FROM AUTHOR]

Subjects :: GEMINI (Chatbot)
CHATGPT
CHATBOTS
ARTIFICIAL intelligence
MACHINE learning

Details

Language :: English
ISSN :: 2234-943X
Database :: Complementary Index
Journal :: Frontiers in Oncology
Publication Type :: Academic Journal
Accession number :: 178706725
Full Text :: https://doi.org/10.3389/fonc.2024.1386718
