Prashant D. Tailor, MD, Timothy T. Xu, MD, Blake H. Fortes, MD, Raymond Iezzi, MD, Timothy W. Olsen, MD, Matthew R. Starr, MD, Sophie J. Bakri, MD, Brittni A. Scruggs, MD, PhD, Andrew J. Barkmeier, MD, Sanjay V. Patel, MD, Keith H. Baratz, MD, Ashlie A. Bernhisel, MD, Lilly H. Wagner, MD, Andrea A. Tooley, MD, Gavin W. Roddy, MD, PhD, Arthur J. Sit, MD, Kristi Y. Wu, MD, Erick D. Bothun, MD, Sasha A. Mansukhani, MBBS, Brian G. Mohney, MD, John J. Chen, MD, PhD, Michael C. Brodsky, MD, Deena A. Tajfirouz, MD, Kevin D. Chodnicki, MD, Wendy M. Smith, MD, and Lauren A. Dalvin, MD
Objective: To determine the appropriateness of responses from an online chat-based artificial intelligence model to ophthalmology questions.

Patients and Methods: This cross-sectional qualitative study was conducted from April 1, 2023, to April 30, 2023. A total of 192 questions spanning all ophthalmic subspecialties were generated, and each question was posed to a large language model (LLM) 3 times. Responses were graded by subspecialists in the relevant field as appropriate, inappropriate, or unreliable in 2 grading contexts: first, as information presented on a patient information site, and second, as an LLM-generated draft response to a patient question sent through the electronic medical record (EMR). A response was graded appropriate if it was accurate and specific enough to serve as a surrogate for physician-approved information. The main outcome measure was the percentage of appropriate responses per subspecialty.

Results: For patient information site questions, the LLM provided an overall average of 79% appropriate responses. Average appropriateness varied across ophthalmic subspecialties, ranging from 56% to 100%: cataract or refractive (92%), cornea (56%), glaucoma (72%), neuro-ophthalmology (67%), oculoplastic or orbital surgery (80%), ocular oncology (100%), pediatrics (89%), vitreoretinal diseases (86%), and uveitis (65%). For draft responses to patient questions via the EMR, the LLM provided an overall average of 74% appropriate responses, again varying by subspecialty: cataract or refractive (85%), cornea (54%), glaucoma (77%), neuro-ophthalmology (63%), oculoplastic or orbital surgery (62%), ocular oncology (90%), pediatrics (94%), vitreoretinal diseases (88%), and uveitis (55%). Stratifying grades across health information categories (disease and condition, risk and prevention, surgery-related, and treatment and management) showed notable but nonsignificant variation, with disease and condition rated highest for appropriateness (72% and 69%) and surgery-related lowest (55% and 51%) in the two contexts, respectively.

Conclusion: The LLM provided mostly appropriate responses across multiple ophthalmology subspecialties in the context of both patient information sites and EMR-based responses to patient questions. Current LLM offerings require optimization and improvement before widespread clinical use.
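To make the outcome measure concrete, the sketch below illustrates how per-subspecialty appropriateness percentages of the kind reported above could be computed from graded responses. This is a minimal illustration only, not the authors' analysis code; the function name, data layout, and example grades are hypothetical.

```python
# Hypothetical sketch of the study's outcome measure: the percentage of
# responses graded "appropriate" within each subspecialty, for one grading
# context. Each question was posed to the LLM 3 times, so each question
# contributes 3 graded responses.
from collections import defaultdict

GRADES = ("appropriate", "inappropriate", "unreliable")

def percent_appropriate(graded_responses):
    """graded_responses: list of (subspecialty, grade) pairs for one context."""
    totals = defaultdict(int)
    appropriate = defaultdict(int)
    for subspecialty, grade in graded_responses:
        assert grade in GRADES, f"unexpected grade: {grade}"
        totals[subspecialty] += 1
        if grade == "appropriate":
            appropriate[subspecialty] += 1
    # Percentage of appropriate responses per subspecialty.
    return {s: 100 * appropriate[s] / totals[s] for s in totals}

# Hypothetical example: 3 graded repetitions of one cornea question.
example = [
    ("cornea", "appropriate"),
    ("cornea", "unreliable"),
    ("cornea", "appropriate"),
]
print(percent_appropriate(example))  # {'cornea': 66.67} (approximately)
```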