Per-former: rethinking person re-identification using transformer augmented with self-attention and contextual mapping.
- Source :
- Visual Computer. Sep2023, Vol. 39 Issue 9, p4087-4102. 16p.
- Publication Year :
- 2023
Abstract
- Person re-identification (re-id) is an autonomous process that uses raw surveillance images to identify a person across multiple non-overlapping camera views without requiring hard biometrics such as fingerprints, retina patterns or facial images. CNN-based deep architectures are most frequently used to solve the person re-id problem. Generally, these CNN architectures capture the attentive regions of a person at the local neighborhood level, with an increased focal view at the deeper levels of the network. However, they do not learn the self-attention among distant parts of a person's image, which can play a vital role in person re-id, especially in handling inter-class and intra-class variances. In this work, we propose a novel person re-id approach that learns the self-attention among different parts of a person image, whether these lie within local proximity or in distant regions, for robust re-identification. We adapt the vision transformer architecture with a lightweight self-attention module, which learns the global associations among the distinct attentive regions having similar context within a person image. We further extend the baseline model with a self-context mapping module, which incorporates contextual embeddings into the self-attention learning for both neighboring and distant image regions. This helps capture the globally associated salient regions of a person and gives a holistic view at the initial network layers. The proposed self-attention-based re-id architecture outperforms its vanilla CNN counterparts on both re-id performance measures, i.e., accuracy and mean average precision. The re-id accuracies are improved by 5.5%, 4.6% and 17% for the Market1501, DukeMTMC-ReID and MSMT-17 datasets, respectively, as compared to the vanilla CNN-based re-id architectures. The implementation and trained models are made publicly available at https://git.io/JLH2S. [ABSTRACT FROM AUTHOR]
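The idea of self-attention among distant image parts can be illustrated with a minimal sketch. This is not the authors' implementation (that is at the URL given in the abstract): it is plain single-head scaled dot-product self-attention over toy patch embeddings, with identity query/key/value projections assumed for simplicity and the self-context mapping module omitted. The function names `softmax` and `self_attention` and the toy `patches` data are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(patches):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys and values are the patch embeddings
    themselves (identity projections); a real model learns these.
    patches: list of equal-length float vectors, one per image patch.
    Returns (outputs, weights), where weights[i][j] is how strongly
    patch i attends to patch j, regardless of their spatial distance.
    """
    dim = len(patches[0])
    scale = math.sqrt(dim)
    weights = []
    for q in patches:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in patches]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * v[j] for w, v in zip(row, patches)) for j in range(dim)]
        for row in weights
    ]
    return outputs, weights


# Toy data (hypothetical): patches 0 and 3 are far apart in the sequence
# but share the same embedding, i.e. "similar context".
patches = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.0, 1.0],
    [1.0, 0.0],
]
outputs, weights = self_attention(patches)
```

In this toy input, patch 0 attends to the distant patch 3 as strongly as to itself, and more strongly than to its immediate neighbor, patch 1. A convolution with a small kernel at the first layer would only relate patch 0 to its neighbors.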
Details
- Language :
- English
- ISSN :
- 01782789
- Volume :
- 39
- Issue :
- 9
- Database :
- Academic Search Index
- Journal :
- Visual Computer
- Publication Type :
- Academic Journal
- Accession number :
- 171345990
- Full Text :
- https://doi.org/10.1007/s00371-022-02577-0