
Per-former: rethinking person re-identification using transformer augmented with self-attention and contextual mapping.

Authors :
Pervaiz, N.
Fraz, M. M.
Shahzad, M.
Source :
Visual Computer. Sep 2023, Vol. 39, Issue 9, p. 4087-4102. 16 p.
Publication Year :
2023

Abstract

Person re-identification (re-id) is an automated process that uses raw surveillance images to identify a person across multiple non-overlapping camera views without requiring any hard biometrics such as fingerprints, retina patterns or facial images. CNN-based deep architectures are most frequently used to solve the person re-id problem. Generally, these CNN architectures capture the attentive regions of a person at the local-neighborhood level, with an increasingly focused view at the deeper layers of the network. However, they do not learn the self-attention among distant parts of a person's image, which can play a vital role in person re-id, especially in handling inter-class and intra-class variance. In this work, we propose a novel person re-id approach that learns the self-attention among different parts of a person image, whether they lie in local proximity or in far-distant regions, for robust re-identification. We adapt the vision transformer architecture with a lightweight self-attention module, which learns the global associations among distinct attentive regions having a similar context within a person image. Beyond this, we extend the baseline model with a self-context mapping module, which fuses contextual embeddings into the self-attention learning for neighboring and distant image regions. This helps to capture the globally associated salient regions of a person and to obtain a holistic view at the initial network layers. The proposed self-attention-based re-id architecture outperforms its vanilla CNN counterparts on both re-id performance measures, i.e., accuracy and mean average precision. The re-id accuracies improve by 5.5%, 4.6% and 17% on the Market1501, DukeMTMC-ReID and MSMT-17 datasets, respectively, compared to vanilla CNN-based re-id architectures. The implementation and trained models are publicly available at https://git.io/JLH2S. [ABSTRACT FROM AUTHOR]
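The following is a minimal, hypothetical PyTorch sketch of the idea described in the abstract: patch tokens of a person image first receive a lightweight contextual mapping (here approximated by a depthwise convolution over the patch grid), and global self-attention then relates attentive regions regardless of their spatial distance. The class name, dimensions and the use of a depthwise convolution are illustrative assumptions, not the authors' released implementation (see https://git.io/JLH2S for that).

```python
# Illustrative sketch only; names and design details are assumptions,
# not the authors' released code.
import torch
import torch.nn as nn

class ContextMappedSelfAttention(nn.Module):
    """Global self-attention over patch tokens, preceded by a lightweight
    contextual mapping so each token carries local-neighborhood context
    before distant regions are related to each other."""
    def __init__(self, dim=768, num_heads=8, grid=(16, 8)):
        super().__init__()
        self.grid = grid  # patch-grid (height, width) of the person image
        # contextual mapping: depthwise conv mixes each token with its neighbors
        self.context = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, tokens):                            # tokens: (B, H*W, dim)
        b, n, d = tokens.shape
        h, w = self.grid
        # fold tokens back onto the 2-D patch grid and inject local context
        x = tokens.transpose(1, 2).reshape(b, d, h, w)
        x = self.context(x).flatten(2).transpose(1, 2)    # (B, H*W, dim)
        x = self.norm(tokens + x)
        # global self-attention links attentive regions at any distance
        out, _ = self.attn(x, x, x, need_weights=False)
        return tokens + out

# usage: 128 patch tokens (16x8 grid) from a person image, embedding dim 768
feats = torch.randn(4, 128, 768)
print(ContextMappedSelfAttention()(feats).shape)          # torch.Size([4, 128, 768])
```

In this sketch the contextual mapping runs before attention so that, as the abstract states, globally associated salient regions can already emerge at the initial layers rather than only deep in the network.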

Details

Language :
English
ISSN :
0178-2789
Volume :
39
Issue :
9
Database :
Academic Search Index
Journal :
Visual Computer
Publication Type :
Academic Journal
Accession number :
171345990
Full Text :
https://doi.org/10.1007/s00371-022-02577-0