VLFATRollout: Fully transformer-based classifier for retinal OCT volumes.

Authors :: Oghbaie M
Araújo T
Schmidt-Erfurth U
Bogunović H
Source :: Computerized medical imaging and graphics : the official journal of the Computerized Medical Imaging Society [Comput Med Imaging Graph] 2024 Dec; Vol. 118, pp. 102452. Date of Electronic Publication: 2024 Oct 29.
Publication Year :: 2024
Abstract: Background and Objective: Despite the promising capabilities of 3D transformer architectures in video analysis, their application to high-resolution 3D medical volumes encounters several challenges. One major limitation is the large number of 3D patches, which reduces the efficiency of the transformers' global self-attention mechanism. Additionally, background information can distract vision transformers from focusing on crucial areas of the input image, thereby introducing noise into the final representation. Moreover, the variability in the number of slices per volume complicates the development of models capable of processing input volumes of any resolution, while simple solutions such as subsampling risk losing essential diagnostic details.
Methods: To address these challenges, we introduce an end-to-end transformer-based framework, variable length feature aggregator transformer rollout (VLFATRollout), to classify volumetric data. The proposed VLFATRollout has several merits. First, it can effectively mine slice-level fore-background information with the help of the transformer's attention matrices. Second, randomizing the volume-wise resolution (i.e., the number of slices) during training enhances the learning capacity of the learnable positional embedding (PE) assigned to each volume slice. This technique allows the PEs to generalize across neighboring slices, facilitating the handling of high-resolution volumes at test time.
Results: VLFATRollout was thoroughly tested on the retinal optical coherence tomography (OCT) volume classification task, demonstrating a notable average improvement of 5.47% in balanced accuracy over the leading convolutional models for a 5-class diagnostic task. These results emphasize the effectiveness of our framework in enhancing slice-level representation and its adaptability across different volume resolutions, paving the way for advanced transformer applications in medical image analysis. The code is available at https://github.com/marziehoghbaie/VLFATRollout/.
Competing Interests: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
(Copyright © 2024 The Authors. Published by Elsevier Ltd. All rights reserved.)
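The abstract says VLFATRollout mines slice-level fore-background information from the transformer's attention matrices, and its name suggests attention rollout (Abnar and Zuidema, 2020). The sketch below shows generic attention rollout over slice tokens only, under stated assumptions: each layer's attention map is already averaged over heads, token 0 is a [CLS] token, and tokens 1.. are the volume's slices. The helper names are hypothetical, and how VLFATRollout turns these scores into fore-background information is not described in the abstract.

```python
from typing import List

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def with_residual(attn: Matrix) -> Matrix:
    """Account for the residual connection: 0.5 * A + 0.5 * I, rows re-normalized."""
    n = len(attn)
    mixed = [[0.5 * attn[i][j] + (0.5 if i == j else 0.0) for j in range(n)] for i in range(n)]
    out = []
    for row in mixed:
        total = sum(row)
        out.append([v / total for v in row])
    return out

def attention_rollout(layer_attns: List[Matrix]) -> Matrix:
    """Chain head-averaged attention maps from the first layer to the last."""
    rollout = with_residual(layer_attns[0])
    for attn in layer_attns[1:]:
        rollout = matmul(with_residual(attn), rollout)
    return rollout

def slice_relevance(layer_attns: List[Matrix]) -> List[float]:
    """Assumes token 0 is [CLS]; returns normalized rollout weights for slices 1..N."""
    cls_row = attention_rollout(layer_attns)[0][1:]
    total = sum(cls_row)
    return [v / total for v in cls_row]

# Toy example: [CLS] + 2 slices, the same head-averaged map in two layers.
attn = [[0.1, 0.1, 0.8],
        [0.3, 0.4, 0.3],
        [0.3, 0.3, 0.4]]
print(slice_relevance([attn, attn]))  # the second slice receives most of the weight
```

Multiplying the per-layer maps (rather than reading only the last layer) traces how attention to each slice propagates through the whole stack; the 0.5 * I term is the usual way rollout accounts for skip connections.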

Subjects :: Humans
Image Interpretation, Computer-Assisted methods
Tomography, Optical Coherence methods
Retina diagnostic imaging
Imaging, Three-Dimensional methods
Algorithms

Details

Language :: English
ISSN :: 1879-0771
Volume :: 118
Database :: MEDLINE
Journal :: Computerized medical imaging and graphics : the official journal of the Computerized Medical Imaging Society
Publication Type :: Academic Journal
Accession number :: 39489098
Full Text :: https://doi.org/10.1016/j.compmedimag.2024.102452