Multi-geometry embedded transformer for facial expression recognition in videos.
- Author
Chen, Dongliang; Wen, Guihua; Li, Huihui; Yang, Pei; Chen, Chuyun; Wang, Bao
- Subjects
*FACIAL expression, *HYPERBOLIC spaces, *MULTILEVEL models, *VIDEOS, *EMOTIONAL state
- Abstract
Dynamic facial expressions in videos convey more realistic emotional states, and recognizing emotions from in-the-wild facial expression videos is challenging because of changing head poses, partial occlusion and varied lighting conditions. Although existing methods use transformer-based models to learn spatial–temporal features, they fail to exploit useful local geometric structures from both spatial and temporal views, and therefore miss subtle emotional features in videos with varied poses and facial occlusion. To this end, we propose a novel multi-geometry embedded transformer (MGET), which incorporates multi-geometry knowledge into transformers and mines spatial–temporal geometric information as a complementary cue for learning effective emotional features. Specifically, we first design a multi-geometry distance learning (MGDL) module to capture emotion-related geometric structure in both Euclidean and hyperbolic spaces. Exploiting the properties of hyperbolic geometry in particular, MGDL captures more subtle emotional changes among local spatial and temporal features. Second, we combine MGDL with transformers to build spatial–temporal MGETs, which capture important spatial and temporal multi-geometry features, embed them into the corresponding original features, and then perform cross-region and cross-frame interaction on these multi-level features. Finally, MGET achieves superior performance on the DFEW, FERV39k and AFEW datasets, with unweighted average recall (UAR) and weighted average recall (WAR) of 58.65%/69.91%, 41.91%/50.76% and 53.23%/55.40%, respectively, improving on M3DFEL, Logo-Former and EST by 2.55%/0.66%, 3.69%/2.63% and 3.66%/1.14%, respectively.
• A multi-geometry embedded transformer (MGET) is proposed for in-the-wild video-based facial expression recognition (FER).
• MGDL captures multi-geometry structures in Euclidean and hyperbolic spaces.
• MGET combines MGDL with transformers to model multi-level spatial–temporal features.
• MGET shows superior performance on in-the-wild video-based FER databases.
[ABSTRACT FROM AUTHOR]
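UAR and WAR, the two metrics reported in the abstract, are commonly defined as follows in video-based FER: UAR is the unweighted mean of per-class recalls, and WAR weights each class's recall by its number of test samples, which makes it equal to overall accuracy. A minimal Python sketch, assuming single-label predictions; the function name and toy labels are illustrative only.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def uar_war(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Tuple[float, float]:
    """Return (UAR, WAR) for single-label classification.

    UAR: plain mean of per-class recalls, so every class counts equally.
    WAR: per-class recalls weighted by class frequency, which equals accuracy.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    support = Counter(y_true)  # number of test samples per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    recall = {c: hits[c] / n for c, n in support.items()}
    uar = sum(recall.values()) / len(recall)
    war = sum(support[c] * recall[c] for c in support) / len(y_true)
    return uar, war


if __name__ == "__main__":
    # Toy, imbalanced labels (illustrative only, not from any dataset).
    y_true = ["happy"] * 8 + ["sad"] * 2
    y_pred = ["happy"] * 8 + ["happy", "sad"]
    print(uar_war(y_true, y_pred))  # (0.75, 0.9)
```

On the toy split, the rare "sad" class is recognized only half the time, so UAR (0.75) falls below WAR (0.9); WAR likewise exceeds UAR in each of the three UAR/WAR pairs reported in the abstract. scikit-learn computes the same quantities with `recall_score(..., average="macro")` and `recall_score(..., average="weighted")`.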
- Published
2024