Exploiting temporal information to detect conversational groups in videos and predict the next speaker.
- Authors
- Tosato, Lucrezia; Fortier, Victor; Bloch, Isabelle; Pelachaud, Catherine
- Subjects
- Short-term memory; long-term memory; spatial arrangement; video signals; social interaction
- Abstract
Studies in human–human interaction have introduced the concept of F-formation to describe the spatial arrangement of participants during social interactions. This paper has two objectives: detecting F-formations in video sequences and predicting the next speaker in a group conversation. The proposed approach exploits temporal information and multimodal signals of humans in video sequences. In particular, we rely on measuring the engagement level of people as a feature of group belonging. Our approach uses a recurrent neural network, the Long Short-Term Memory (LSTM), to predict who will take the speaker's turn in a conversation group. Experiments on the MatchNMingle dataset led to 85% true positives in group detection and 98% accuracy in predicting the next speaker.
• New method for analyzing videos of groups of persons in interaction.
• New method for detecting groups (F-formations) using temporal information and multimodal signals.
• Considers the engagement level between persons as a feature of group belonging.
• New method for predicting the next speaker based on an LSTM.
• Experiments on the MatchNMingle dataset: 85% true positives in group detection, 98% accuracy in predicting the next speaker. [ABSTRACT FROM AUTHOR]
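The next-speaker prediction described in the abstract can be pictured as an LSTM run over a sequence of per-frame multimodal features, followed by a softmax over the participants in the group. The sketch below is a minimal NumPy illustration of that idea, not the authors' implementation: the feature dimension, hidden size, number of participants, and random weights are all assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step; gate pre-activations stacked as [input, forget, output, candidate]."""
    z = W @ x + U @ h + b
    H = h.size
    i, f, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[2 * H:3 * H])
    g = np.tanh(z[3 * H:])
    c = f * c + i * g          # update the cell state
    h = o * np.tanh(c)         # update the hidden state
    return h, c

def predict_next_speaker(seq, params, W_out):
    """seq: (T, D) array of per-frame multimodal features (e.g. engagement cues).
    Returns a softmax distribution over the P participants."""
    W, U, b = params
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    for x in seq:               # unroll the LSTM over the observed frames
        h, c = lstm_step(x, h, c, W, U, b)
    logits = W_out @ h          # score each participant from the final state
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Toy dimensions and random weights, purely for illustration.
rng = np.random.default_rng(0)
D, H, P = 6, 8, 4  # feature dim, hidden size, participants (assumed)
params = (rng.normal(0, 0.1, (4 * H, D)),   # input weights W
          rng.normal(0, 0.1, (4 * H, H)),   # recurrent weights U
          np.zeros(4 * H))                  # biases b
W_out = rng.normal(0, 0.1, (P, H))
probs = predict_next_speaker(rng.normal(size=(10, D)), params, W_out)
print(probs)  # one probability per participant, summing to 1
```

In a trained system the weights would be learned from labeled turn-taking data and the input features would come from the video (gaze, pose, engagement level); here they are random, so the output distribution is only structurally meaningful.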
- Published
- 2024