Advancing breast cancer diagnosis: token vision transformers for faster and accurate classification of histopathology images.
- Source :
- Visual Computing for Industry, Biomedicine & Art; 1/8/2025, Vol. 8 Issue 1, p1-27, 27p
- Publication Year :
- 2025
Abstract
- The vision transformer (ViT) architecture, whose attention mechanism is built on multi-head attention layers, has been widely adopted in computer-aided diagnosis tasks because it processes medical image information effectively. However, ViTs have a complex architecture that requires high-performance GPUs or CPUs for efficient training and for deployment in real-world medical diagnostic devices, which makes them more intricate than convolutional neural networks (CNNs). This difficulty is compounded in histopathology image analysis, where the available images are both limited and complex. In response to these challenges, this study proposes TokenMixer, a hybrid architecture that combines the strengths of CNNs and ViTs. The architecture aims to improve feature extraction and classification accuracy with shorter training time and fewer parameters. To do so, it minimizes the number of input patches used during training, tokenizes the input patches with convolutional layers, and processes the patches with transformer encoder layers across all network layers, for fast and accurate breast cancer tumor subtype classification. The TokenMixer mechanism is inspired by the ConvMixer and TokenLearner models. First, the ConvMixer model dynamically generates spatial attention maps using convolutional layers, enabling the extraction of patches from input images to minimize the number of input patches used in training. Second, the TokenLearner model extracts relevant regions from the selected input patches, tokenizes them to improve feature extraction, and trains all tokenized patches in a transformer encoder network. We evaluated the TokenMixer model on the public BreakHis dataset, comparing it with ViT-based and other state-of-the-art methods. Our approach achieved strong results for both binary and multi-class classification of breast cancer subtypes across magnification levels (40×, 100×, 200×, 400×). The model reached accuracies of 97.02% for binary classification and 93.29% for multi-class classification, with decision times of 391.71 s and 1173.56 s, respectively. These results highlight the potential of our hybrid deep ViT-CNN architecture for advancing tumor classification in histopathological images. The source code is available at https://github.com/abimouloud/TokenMixer. [ABSTRACT FROM AUTHOR]
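The token-reduction idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see their repository for that); it is a simplified, dependency-free illustration in the spirit of TokenLearner, assuming each learned token is a weighted sum of patch embeddings and that the weights come from a softmax over per-patch scores. The function name `learn_tokens`, the `score_weights` parameter, and the sizes (196 patches, 8 features, 4 tokens) are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def learn_tokens(patches, score_weights):
    """Reduce N patch embeddings to S tokens (illustrative only).

    patches:       list of N vectors, each of length D
    score_weights: list of S vectors of length D; each one scores every
                   patch and so defines one attention map over the N
                   patches (hypothetical stand-in for learned layers)
    """
    dim = len(patches[0])
    tokens = []
    for weights in score_weights:
        # One scalar score per patch, then normalize into an attention map.
        scores = [sum(w * x for w, x in zip(weights, patch)) for patch in patches]
        attention = softmax(scores)
        # Each token is the attention-weighted sum of all patch embeddings.
        token = [
            sum(a * patch[d] for a, patch in zip(attention, patches))
            for d in range(dim)
        ]
        tokens.append(token)
    return tokens


# Random stand-ins for patch embeddings and scoring weights (assumed sizes).
rng = random.Random(0)
patches = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(196)]
score_weights = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(4)]

tokens = learn_tokens(patches, score_weights)
print(len(patches), "patches ->", len(tokens), "tokens of dim", len(tokens[0]))
```

Because the cost of self-attention grows quadratically with the number of tokens, passing a handful of learned tokens to the transformer encoder instead of every patch is one plausible source of the shorter training and decision times the abstract reports.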
Details
- Language :
- English
- ISSN :
- 2524-4442
- Volume :
- 8
- Issue :
- 1
- Database :
- Complementary Index
- Journal :
- Visual Computing for Industry, Biomedicine & Art
- Publication Type :
- Academic Journal
- Accession number :
- 182154278
- Full Text :
- https://doi.org/10.1186/s42492-024-00181-8