1. OCTCube-M: A 3D multimodal optical coherence tomography foundation model for retinal and systemic diseases with cross-cohort and cross-device validation
- Author
-
Liu, Zixuan, Xu, Hanwen, Woicik, Addie, Shapiro, Linda G., Blazes, Marian, Wu, Yue, Steffen, Verena, Cukras, Catherine, Lee, Cecilia S., Zhang, Miao, Lee, Aaron Y., and Wang, Sheng
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Artificial Intelligence ,Computer Science - Computer Vision and Pattern Recognition - Abstract
We present OCTCube-M, a 3D OCT-based multi-modal foundation model for jointly analyzing OCT and en face images. OCTCube-M first developed OCTCube, a 3D foundation model pre-trained on 26,685 3D OCT volumes encompassing 1.62 million 2D OCT images. It then exploits a novel multi-modal contrastive learning framework COEP to integrate other retinal imaging modalities, such as fundus autofluorescence and infrared retinal imaging, into OCTCube, efficiently extending it into multi-modal foundation models. OCTCube achieves best performance on predicting 8 retinal diseases, demonstrating strong generalizability on cross-cohort, cross-device and cross-modality prediction. OCTCube can also predict cross-organ nodule malignancy (CT) and low cardiac ejection fraction as well as systemic diseases, such as diabetes and hypertension, revealing its wide applicability beyond retinal diseases. We further develop OCTCube-IR using COEP with 26,685 OCT and IR image pairs. OCTCube-IR can accurately retrieve between OCT and IR images, allowing joint analysis between 3D and 2D retinal imaging modalities. Finally, we trained a tri-modal foundation model OCTCube-EF from 4 million 2D OCT images and 400K en face retinal images. OCTCube-EF attains the best performance on predicting the growth rate of geographic atrophy (GA) across datasets collected from 6 multi-center global trials conducted in 23 countries. This improvement is statistically equivalent to running a clinical trial with more than double the size of the original study. Our analysis based on another retrospective case study reveals OCTCube-EF's ability to avoid false positive Phase-III results according to its accurate treatment effect estimation on the Phase-II results. In sum, OCTCube-M is a 3D multi-modal foundation model framework that integrates OCT and other retinal imaging modalities revealing substantial diagnostic and prognostic benefits.
- Published
- 2024