
Assessing generalizability of an AI-based visual test for cervical cancer screening.

Authors :
Syed Rakin Ahmed
Didem Egemen
Brian Befano
Ana Cecilia Rodriguez
Jose Jeronimo
Kanan Desai
Carolina Teran
Karla Alfaro
Joel Fokom-Domgue
Kittipat Charoenkwan
Chemtai Mungo
Rebecca Luckett
Rakiya Saidu
Taina Raiol
Ana Ribeiro
Julia C Gage
Silvia de Sanjose
Jayashree Kalpathy-Cramer
Mark Schiffman
Source :
PLOS Digital Health, Vol 3, Iss 10, p e0000364 (2024)
Publication Year :
2024
Publisher :
Public Library of Science (PLoS), 2024.

Abstract

A number of challenges hinder the effective clinical translation of artificial intelligence (AI) models. Foremost among these is a lack of generalizability, defined as the ability of a model to perform well on datasets whose characteristics differ from those of the training data. We recently developed an AI pipeline for digital images of the cervix, using a multi-heterogeneous dataset of 9,462 women (17,013 images) and a multi-stage model selection and optimization approach, to produce a diagnostic classifier that assigns cervical images to "normal", "indeterminate", and "precancer/cancer" (denoted "precancer+") categories. In this work, we investigate the performance of this multiclass classifier on external data not used in training or internal validation, to assess its generalizability when moving to new settings. We assess both the classification performance and repeatability of our classifier across the two axes of heterogeneity present in our dataset: image capture device and geography, using both out-of-the-box inference and retraining with external data. Our results demonstrate that device-level heterogeneity affects model performance more than geography-level heterogeneity. Classification performance is strong on images from a new geography without retraining, while incremental retraining with images from a new device progressively improves classification performance on that device up to a point of saturation. Repeatability is relatively unaffected by data heterogeneity and remains strong throughout. Our work supports the need for optimized retraining approaches that address data heterogeneity (e.g., when moving to a new device) to facilitate effective use of AI models in new settings.
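
To make the evaluation protocol described above concrete, the sketch below illustrates one way such an experiment could be run; it is an illustrative assumption-laden example, not the authors' published pipeline. The ResNet backbone, the random placeholder tensors, and the new_device_train / new_device_test datasets are hypothetical stand-ins for a previously trained classifier and for images from an external capture device.

# Illustrative sketch only (not the authors' code): out-of-the-box inference followed by
# incremental fine-tuning of a generic 3-class image classifier on growing fractions of
# data from a hypothetical new capture device, evaluating accuracy on held-out images.
import copy
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Subset, TensorDataset
from torchvision import models

NUM_CLASSES = 3  # "normal", "indeterminate", "precancer+"

# Placeholder datasets standing in for external (new-device) images and labels.
new_device_train = TensorDataset(torch.randn(64, 3, 224, 224),
                                 torch.randint(0, NUM_CLASSES, (64,)))
new_device_test = TensorDataset(torch.randn(32, 3, 224, 224),
                                torch.randint(0, NUM_CLASSES, (32,)))

def fine_tune(model, loader, epochs=2, lr=1e-4):
    # Retrain the classifier on the external images for a few epochs.
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model

def accuracy(model, loader):
    # Simple top-1 accuracy on the held-out new-device images.
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in loader:
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
    return correct / total

# Stand-in for the previously trained diagnostic classifier.
base_model = models.resnet50(weights=None)
base_model.fc = nn.Linear(base_model.fc.in_features, NUM_CLASSES)

test_loader = DataLoader(new_device_test, batch_size=8)
for frac in (0.0, 0.25, 0.5, 1.0):  # 0% = out-of-the-box inference, then incremental retraining
    model = copy.deepcopy(base_model)  # start from the same pretrained weights each round
    if frac > 0:
        n = int(frac * len(new_device_train))
        loader = DataLoader(Subset(new_device_train, range(n)), batch_size=8, shuffle=True)
        model = fine_tune(model, loader)
    print(f"retrained on {frac:.0%} of new-device data -> accuracy {accuracy(model, test_loader):.3f}")

In a setup along these lines, plotting accuracy against the retraining fraction would show the progressive improvement and eventual saturation that the abstract reports for a new device.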

Details

Language :
English
ISSN :
2767-3170
Volume :
3
Issue :
10
Database :
Directory of Open Access Journals
Journal :
PLOS Digital Health
Publication Type :
Academic Journal
Accession number :
edsdoj.4826030e45540a2b3e411587d53f3fe
Document Type :
article
Full Text :
https://doi.org/10.1371/journal.pdig.0000364