
A Novel Two-Stream Transformer-Based Framework for Multi-Modality Human Action Recognition

Authors :
Jing Shi
Yuanyuan Zhang
Weihang Wang
Bin Xing
Dasha Hu
Liangyin Chen
Source :
Applied Sciences, Vol 13, Iss 4, p 2058 (2023)
Publication Year :
2023
Publisher :
MDPI AG, 2023.

Abstract

Due to the great success of the Vision Transformer (ViT) in image classification tasks, many pure Transformer architectures for human action recognition have been proposed. However, very few works have attempted to use Transformers for bimodal action recognition, i.e., using both the skeleton and RGB modalities. As shown in many previous works, the RGB and skeleton modalities are complementary in human action recognition tasks, and combining them within a Transformer-based framework remains a challenge. In this paper, we propose RGBSformer, a novel two-stream pure-Transformer framework for human action recognition that uses both RGB and skeleton modalities. From RGB videos alone, we can acquire skeleton data and generate the corresponding skeleton heatmaps. We then feed the skeleton heatmaps and RGB frames into the Transformer at different temporal and spatial resolutions. Because the skeleton heatmaps are higher-level features than the original RGB frames, we use fewer attention layers in the skeleton stream. In addition, two strategies are proposed to fuse the information from the two streams. Experiments demonstrate that the proposed framework achieves state-of-the-art performance on four benchmarks: three widely used datasets, Kinetics400, NTU RGB+D 60, and NTU RGB+D 120, and the fine-grained dataset FineGym99.
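The two-stream idea from the abstract can be sketched in miniature: each modality passes through its own stack of self-attention layers, with the skeleton stream deliberately shallower, and the resulting clip-level descriptors are fused by averaging. This is a toy illustration, not the paper's architecture: the layer counts, the simplified attention (no learned projections), and the averaging fusion are all assumptions for the sketch.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention_layer(tokens):
    """Simplified self-attention: each output token is a softmax-weighted
    mix of all input tokens, weighted by scaled dot-product similarity.
    (Real ViT layers add learned Q/K/V projections, MLPs, and residuals.)"""
    dim = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in tokens]
        w = softmax(scores)
        out.append([sum(wi * k[d] for wi, k in zip(w, tokens))
                    for d in range(dim)])
    return out

def stream(tokens, num_layers):
    """One modality stream: a stack of attention layers."""
    for _ in range(num_layers):
        tokens = attention_layer(tokens)
    return tokens

def pooled_feature(tokens):
    """Mean-pool token features into a single clip-level descriptor."""
    dim = len(tokens[0])
    return [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]

def fuse(rgb_tokens, skel_tokens, rgb_layers=4, skel_layers=2):
    """Two-stream sketch: the skeleton heatmaps are already high-level
    features, so that stream uses fewer attention layers (hypothetical
    counts). Fusion here is simple late averaging of clip descriptors."""
    rgb_feat = pooled_feature(stream(rgb_tokens, rgb_layers))
    skel_feat = pooled_feature(stream(skel_tokens, skel_layers))
    return [(r + s) / 2 for r, s in zip(rgb_feat, skel_feat)]
```

A classifier head would then map the fused descriptor to action logits; the paper's second fusion strategy (not sketched here) operates between the streams rather than after them.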

Details

Language :
English
ISSN :
2076-3417
Volume :
13
Issue :
4
Database :
Directory of Open Access Journals
Journal :
Applied Sciences
Publication Type :
Academic Journal
Accession number :
edsdoj.babb4b7e16ac424aaab7829e00f2e25b
Document Type :
article
Full Text :
https://doi.org/10.3390/app13042058