EPSViTs: A hybrid architecture for image classification based on parameter-shared multi-head self-attention.
- Source :
- Image & Vision Computing. Sep 2024, Vol. 149, pN.PAG-N.PAG. 1p.
- Publication Year :
- 2024
Abstract
- Vision transformers have been successfully applied to image recognition tasks because they can capture long-range dependencies within an image. However, they still suffer from weak local feature extraction, are prone to losing channel interaction information in one-dimensional multi-head self-attention modeling, and have a large number of parameters. This paper proposes a lightweight hybrid architecture for image classification named EPSViTs (Efficient Parameter Shared Transformer). First, a new local feature extraction module is designed to strengthen the representation of local features. Second, using parameter sharing, a lightweight multi-head self-attention module based on information interaction is designed; it models the image globally along both the spatial and channel dimensions and mines potential correlations within the image in space and channel. Extensive experiments are conducted on three public datasets (a subset of ImageNet, Cifar100, and APTOS2019) and one private dataset (Mushroom66). The results show that the proposed hybrid architecture EPSViTs, based on parameter-shared multi-head self-attention, has clear advantages; in particular, it reaches 89.18% on the ImageNet subset, a 3.8% improvement over Edgevits_xxs, which verifies the effectiveness of the model.
• This paper designs a fine-grained local feature extraction module LFE.
• This paper designs a lightweight parameter sharing attention mechanism EPSA.
• A new lightweight hybrid architecture EPSViTs is built based on the LFE and EPSA.
• The reliability and generalization of our model were validated on four datasets.
[ABSTRACT FROM AUTHOR]
Details
- Language :
- English
- ISSN :
- 02628856
- Volume :
- 149
- Database :
- Academic Search Index
- Journal :
- Image & Vision Computing
- Publication Type :
- Academic Journal
- Accession number :
- 179030456
- Full Text :
- https://doi.org/10.1016/j.imavis.2024.105130