1. Exploring Score-Level and Decision-Level Fusion of Inertial and Video Data for Intake Gesture Detection
- Author
Hamid Heydarian, Marc T. P. Adam, Tracy L. Burrows, and Megan E. Rollo
- Subjects
Score-level fusion, decision-level fusion, intake gesture detection, deep learning, inertial, accelerometer, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Recent research has employed deep learning to detect intake gestures from inertial sensor and video camera data. However, the fusion of these modalities has not been attempted. The present research explores the potential of fusing the outputs of two individual deep learning models for intake gesture detection, one inertial and one video (i.e., score-level and decision-level fusion), using the test sets from two publicly available multimodal datasets: (1) OREBA-DIS, recorded from 100 participants while consuming food served in discrete portions, and (2) OREBA-SHA, recorded from 102 participants while consuming a communal dish. We first assess the potential of fusion by contrasting the performance of the individual models in intake gesture detection. The assessment shows that fusing the outputs of the individual models is more promising on the OREBA-DIS dataset. Subsequently, we conduct experiments using different score-level and decision-level fusion approaches. Our results show that the score-level max score approach performs best of all considered fusion approaches. On the OREBA-DIS dataset, the max score fusion approach ($F_{1} = 0.871$) outperforms both the individual video ($F_{1} = 0.855$) and inertial ($F_{1} = 0.806$) models. However, on the OREBA-SHA dataset, the max score fusion approach ($F_{1} = 0.873$) fails to outperform the individual inertial model ($F_{1} = 0.895$). Pairwise comparisons using bootstrapped samples confirm the statistical significance of these differences in model performance ($p < .001$).
- Published
- 2025
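The abstract distinguishes score-level fusion (combining the two models' output probabilities) from decision-level fusion (combining their already-made detections). The following is a minimal sketch of that distinction, not the authors' implementation. It assumes both models emit one probability per time-aligned frame, and it uses a plain threshold as a stand-in for the detection step. The function names, the 0.5 threshold, and the "or"/"and" rules are hypothetical choices made for illustration.

```python
from typing import List, Sequence


def max_score_fusion(inertial: Sequence[float], video: Sequence[float]) -> List[float]:
    """Score-level fusion: keep the higher of the two per-frame probabilities."""
    if len(inertial) != len(video):
        raise ValueError("score sequences must be time-aligned and of equal length")
    return [max(a, b) for a, b in zip(inertial, video)]


def to_decisions(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn per-frame probabilities into binary decisions (assumed simple threshold)."""
    return [1 if s >= threshold else 0 for s in scores]


def decision_level_fusion(
    inertial_dec: Sequence[int], video_dec: Sequence[int], rule: str = "or"
) -> List[int]:
    """Decision-level fusion: combine binary decisions with a logical rule.

    The "or" and "and" rules are illustrative assumptions, not taken from the source.
    """
    if len(inertial_dec) != len(video_dec):
        raise ValueError("decision sequences must be of equal length")
    if rule == "or":
        return [a | b for a, b in zip(inertial_dec, video_dec)]
    if rule == "and":
        return [a & b for a, b in zip(inertial_dec, video_dec)]
    raise ValueError("rule must be 'or' or 'and'")


# Made-up per-frame probabilities, for illustration only.
inertial_scores = [0.1, 0.7, 0.4, 0.2]
video_scores = [0.3, 0.2, 0.6, 0.1]

fused_scores = max_score_fusion(inertial_scores, video_scores)
score_level = to_decisions(fused_scores)
decision_or = decision_level_fusion(
    to_decisions(inertial_scores), to_decisions(video_scores), rule="or"
)
decision_and = decision_level_fusion(
    to_decisions(inertial_scores), to_decisions(video_scores), rule="and"
)
```

With a shared threshold, max score fusion followed by thresholding yields the same frames as an "or" over the individual decisions, since the maximum exceeds the threshold exactly when at least one score does. The two families differ once the detection step is more elaborate than a single threshold or when other combination rules are used.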