Bi-Modal Progressive Mask Attention for Fine-Grained Recognition.
- Source :
- IEEE Transactions on Image Processing; 2020, Vol. 29, p7006-7018, 13p
- Publication Year :
- 2020
Abstract
- Traditional fine-grained image recognition is required to distinguish different subordinate categories (e.g., bird species) based on the visual cues beneath raw images. Due to both small inter-class variations and large intra-class variations, it is desirable to capture the subtle differences between these sub-categories, which is crucial but challenging for fine-grained recognition. Recently, language modality aggregation has proven to be a successful technique for improving visual recognition in practice. In this paper, we introduce an end-to-end trainable Progressive Mask Attention (PMA) model for fine-grained recognition by leveraging both visual and language modalities. Our Bi-Modal PMA model not only captures the most discriminative parts in the visual modality stage by stage in a mask-based fashion, but also explores out-of-visual-domain knowledge from the language modality in an interactional alignment paradigm. Specifically, at each stage, a self-attention module is proposed to attend to the key patch from images or text descriptions. In addition, a query-relational module is designed to capture the key words/phrases of texts and further bridge the connection between the two modalities. The learned bi-modal representations from multiple stages are then aggregated as the final features for recognition. Our Bi-Modal PMA model needs only raw images and raw text descriptions, without requiring bounding boxes/part annotations in images or key word annotations in texts. By conducting comprehensive experiments on fine-grained benchmark datasets, we demonstrate that the proposed method achieves superior performance over the competing baselines, on either vision and language bi-modality or the single visual modality. [ABSTRACT FROM AUTHOR]
Details
- Language :
- English
- ISSN :
- 1057-7149
- Volume :
- 29
- Database :
- Complementary Index
- Journal :
- IEEE Transactions on Image Processing
- Publication Type :
- Academic Journal
- Accession number :
- 170078462
- Full Text :
- https://doi.org/10.1109/TIP.2020.2996736