Zero- and few-shot prompting of generative large language models provides weak assessment of risk of bias in clinical trials.

Authors :
Šuster S
Baldwin T
Verspoor K
Source :
Research synthesis methods [Res Synth Methods] 2024 Nov; Vol. 15 (6), pp. 988-1000. Date of Electronic Publication: 2024 Aug 23.
Publication Year :
2024

Abstract

Existing systems for automating the assessment of risk-of-bias (RoB) in medical studies are supervised approaches that require substantial training data to work well. However, recent revisions to RoB guidelines have resulted in a scarcity of available training data. In this study, we investigate the effectiveness of generative large language models (LLMs) for assessing RoB. Their application requires little or no training data and, if successful, could serve as a valuable tool to assist human experts during the construction of systematic reviews. Following Cochrane's latest guidelines (RoB2) designed for human reviewers, we prepare instructions that are fed as input to LLMs, which then infer the risk associated with a trial publication. We distinguish between two modelling tasks: directly predicting RoB2 from text; and employing decomposition, in which a RoB2 decision is made after the LLM responds to a series of signalling questions. We curate new testing data sets and evaluate the performance of four general- and medical-domain LLMs. The results fall short of expectations, with LLMs seldom surpassing trivial baselines. On the direct RoB2 prediction test set (n = 5993), LLMs perform akin to the baselines (F1: 0.1-0.2). In the decomposition task setup (n = 28,150), similar F1 scores are observed. Our additional comparative evaluation on RoB1 data also reveals results substantially below those of a supervised system. This testifies to the difficulty of solving this task based on (complex) instructions alone. Using LLMs as an assisting technology for assessing RoB2 thus currently seems beyond their reach.
(© 2024 The Author(s). Research Synthesis Methods published by John Wiley & Sons Ltd.)

Details

Language :
English
ISSN :
1759-2887
Volume :
15
Issue :
6
Database :
MEDLINE
Journal :
Research synthesis methods
Publication Type :
Academic Journal
Accession number :
39176994
Full Text :
https://doi.org/10.1002/jrsm.1749