
EPSViTs: A hybrid architecture for image classification based on parameter-shared multi-head self-attention.

Authors :
Liao, Huixian
Li, Xiaosen
Qin, Xiao
Wang, Wenji
He, Guodui
Huang, Haojie
Guo, Xu
Chun, Xin
Zhang, Jinyong
Fu, Yunqin
Qin, Zhengyou
Source :
Image & Vision Computing. Sep2024, Vol. 149, pN.PAG-N.PAG. 1p.
Publication Year :
2024

Abstract

Vision transformers have been successfully applied to image recognition tasks because they can capture long-range dependencies within an image. However, they still suffer from weak local feature extraction, a tendency to lose channel interaction information in one-dimensional multi-head self-attention modeling, and a large number of parameters. This paper proposes a lightweight hybrid architecture for image classification named EPSViTs (Efficient Parameter Shared Transformer). First, a new local feature extraction module is designed to strengthen the representation of local features. Second, using a parameter sharing approach, a lightweight multi-head self-attention module based on information interaction is designed; it models the image globally along both the spatial and channel dimensions and mines potential correlations in space and channel. Extensive experiments are conducted on three public datasets (a subset of ImageNet, Cifar100, and APTOS2019) and one private dataset (Mushroom66). The results show that EPSViTs, the proposed hybrid architecture based on parameter-shared multi-head self-attention, has clear advantages, especially on the ImageNet subset, where it reaches 89.18%, a 3.8% improvement over Edgevits_xxs, verifying the effectiveness of the model.

• This paper designs a fine-grained local feature extraction module LFE.
• This paper designs a lightweight parameter sharing attention mechanism EPSA.
• A new lightweight hybrid architecture EPSViTs is built based on the LFE and EPSA.
• The reliability and generalization of our model were validated on four datasets.

[ABSTRACT FROM AUTHOR]
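
The abstract does not describe how EPSA shares parameters, so the following is only a generic illustration of the idea it names, not the authors' module. The sketch assumes that a single query/key/value projection (the hypothetical `w_qkv`) is reused by two attention branches: one attending across tokens (spatial) and one attending across channels within each head. The function name, the way the two branches are summed, and the scaling factors are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def shared_spatial_channel_attention(x, w_qkv, num_heads):
    """Illustrative attention with one shared Q/K/V projection.

    x:      (N, C) array of N tokens with C channels.
    w_qkv:  (C, 3C) projection shared by both branches (assumption).
    Returns an (N, C) array.
    """
    n, c = x.shape
    d = c // num_heads  # channels per head

    # One projection produces Q, K, V for both branches.
    q, k, v = np.split(x @ w_qkv, 3, axis=1)  # each (N, C)

    def to_heads(t):
        # (N, C) -> (H, N, d)
        return t.reshape(n, num_heads, d).transpose(1, 0, 2)

    qh, kh, vh = to_heads(q), to_heads(k), to_heads(v)

    # Spatial branch: token-to-token weights, shape (H, N, N).
    a_spatial = softmax(qh @ kh.transpose(0, 2, 1) / np.sqrt(d), axis=-1)
    out_spatial = a_spatial @ vh  # (H, N, d)

    # Channel branch: channel-to-channel weights, shape (H, d, d),
    # computed from the same Q and K without any new parameters.
    a_channel = softmax(qh.transpose(0, 2, 1) @ kh / np.sqrt(n), axis=-1)
    out_channel = vh @ a_channel.transpose(0, 2, 1)  # (H, N, d)

    # Summing the branches is an assumption of this sketch.
    out = out_spatial + out_channel
    return out.transpose(1, 0, 2).reshape(n, c)  # back to (N, C)
```

In this sketch the shared projection holds 3C² weights, whereas giving each branch its own Q/K/V projection would hold 6C²; that halving is the kind of saving a parameter sharing design aims for, though the actual EPSA layout may differ.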

Details

Language :
English
ISSN :
02628856
Volume :
149
Database :
Academic Search Index
Journal :
Image & Vision Computing
Publication Type :
Academic Journal
Accession number :
179030456
Full Text :
https://doi.org/10.1016/j.imavis.2024.105130