Improved robustness of vision transformers via PreLayerNorm in patch embedding
- Source :
- Pattern Recognition, Sep 2023, Vol. 141
- Publication Year :
- 2023
Abstract
- Highlights:
  • We provide empirical tests on various image corruptions using vision transformers.
  • Vision transformers showed performance degradation on contrast-enhanced images.
  • We proposed PreLayerNorm for the consistent behavior of positional embedding.
  • We observed that PreLayerNorm improved performance on contrast-enhanced images.
  • We provide theoretical analyses of the inconsistent behavior of vision transformers.
- Vision Transformers (ViTs) have recently demonstrated state-of-the-art performance in various vision tasks, replacing convolutional neural networks (CNNs). However, because ViT has a different architectural design than a CNN, it may behave differently. To investigate whether ViT differs in performance or robustness, we tested ViT and CNN under various imaging conditions in practical vision tasks. We confirmed that for most image transformations, ViT's robustness was comparable to or even better than that of CNN. For contrast enhancement, however, ViT performed particularly poorly. We show that this is because the positional embedding in ViT's patch embedding can work improperly when the color scale changes. We demonstrate that PreLayerNorm, a modified patch embedding structure, ensures consistent behavior of ViT. Results demonstrate that ViT with PreLayerNorm exhibited improved robustness in contrast-varying environments. [ABSTRACT FROM AUTHOR]
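- The abstract describes normalizing patch tokens before the positional embedding is added, so that a global change in color scale (e.g. contrast enhancement) cannot alter the relative magnitude of the positional term. A minimal sketch of that idea in PyTorch is below; the class name, dimensions, and placement of the LayerNorm are illustrative assumptions, not the authors' exact implementation:

```python
import torch
import torch.nn as nn

class PreLayerNormPatchEmbed(nn.Module):
    """Patch embedding that applies LayerNorm to patch tokens BEFORE
    adding positional embeddings (a sketch of the PreLayerNorm idea
    from the abstract; names and sizes are illustrative)."""

    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=192):
        super().__init__()
        num_patches = (img_size // patch_size) ** 2
        # Non-overlapping patch projection, as in a standard ViT.
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)
        # Normalization applied before the positional term is added,
        # so token scale is fixed regardless of input color scale.
        self.norm = nn.LayerNorm(embed_dim)
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, embed_dim))

    def forward(self, x):
        x = self.proj(x)                  # (B, D, H/P, W/P)
        x = x.flatten(2).transpose(1, 2)  # (B, N, D) patch tokens
        x = self.norm(x)                  # normalize first ...
        return x + self.pos_embed         # ... then add positional embedding

# Usage: a batch of two 224x224 RGB images yields 196 tokens of dim 192.
x = torch.randn(2, 3, 224, 224)
tokens = PreLayerNormPatchEmbed()(x)
print(tokens.shape)
```

  In a conventional ViT patch embedding the positional term is added to unnormalized patch tokens, so rescaling pixel intensities changes the ratio between content and position; normalizing first keeps that ratio consistent.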
Details
- Language :
- English
- ISSN :
- 0031-3203
- Volume :
- 141
- Database :
- Academic Search Index
- Journal :
- Pattern Recognition
- Publication Type :
- Academic Journal
- Accession number :
- 163870032
- Full Text :
- https://doi.org/10.1016/j.patcog.2023.109659