Language discrepancies in the performance of generative artificial intelligence models: an examination of infectious disease queries in English and Arabic.

Authors :: Sallam, Malik
Al-Mahzoum, Kholoud
Alshuaib, Omaima
Alhajri, Hawajer
Alotaibi, Fatmah
Alkhurainej, Dalal
Al-Balwah, Mohammad Yahya
Barakat, Muna
Egger, Jan
Source :: BMC Infectious Diseases; 8/8/2024, Vol. 24 Issue 1, p1-13, 13p
Publication Year :: 2024
Abstract: Background: Assessment of artificial intelligence (AI)-based models across languages is crucial to ensure equitable access and accuracy of information in multilingual contexts. This study aimed to compare AI model efficiency in English and Arabic for infectious disease queries. Methods: The study employed the METRICS checklist for the design and reporting of AI-based studies in healthcare. The AI models tested included ChatGPT-3.5, ChatGPT-4, Bing, and Bard. The queries comprised 15 questions on HIV/AIDS, tuberculosis, malaria, COVID-19, and influenza. The AI-generated content was assessed by two bilingual experts using the validated CLEAR tool. Results: In comparing AI models' performance in English and Arabic for infectious disease queries, variability was noted. English queries showed consistently superior performance, with Bard leading, followed by Bing, ChatGPT-4, and ChatGPT-3.5 (P = .012). The same trend was observed in Arabic, albeit without statistical significance (P = .082). Stratified analysis revealed higher scores for English in most CLEAR components, notably in completeness, accuracy, appropriateness, and relevance, especially with ChatGPT-3.5 and Bard. Across the five infectious disease topics, English outperformed Arabic, except for flu queries in Bing and Bard. The four AI models' performance in English was rated as "excellent", significantly outperforming their "above-average" Arabic counterparts (P = .002). Conclusions: Disparity in AI model performance was noticed between English and Arabic in response to infectious disease queries. This language variation can negatively impact the quality of health content delivered by AI models among native speakers of Arabic. This issue is recommended to be addressed by AI developers, with the ultimate goal of enhancing health outcomes. [ABSTRACT FROM AUTHOR]

Subjects :: GENERATIVE artificial intelligence
ARTIFICIAL intelligence
VARIATION in language
CHATGPT
ENGLISH language

Details

Language :: English
ISSN :: 14712334
Volume :: 24
Issue :: 1
Database :: Complementary Index
Journal :: BMC Infectious Diseases
Publication Type :: Academic Journal
Accession number :: 178912792
Full Text :: https://doi.org/10.1186/s12879-024-09725-y

