
Appearance-posture fusion network for distracted driving behavior recognition.

Authors :
Yang, Xiaohui
Qiao, Yu
Han, Shiyuan
Feng, Zhen
Chen, Yuehui
Source :
Expert Systems with Applications, Dec 2024, Vol. 257.
Publication Year :
2024

Abstract

In recent years, detection techniques using computer vision and deep learning have shown promise in assessing driver distraction. This paper proposes a fusion network that combines a Spatial-Temporal Graph Convolutional Network (ST-GCN) and a hybrid convolutional network to integrate multimodal input data for recognizing distracted driver behavior. Specifically, to address the ST-GCN method's limitations in modeling long-distance joint interactions and its inadequate temporal feature extraction, we design the Spatially Symmetric Configuration Partitioning Graph Convolutional Network (SSCP-GCN) to model the relative motion of symmetric relationships between limbs. In addition, we utilize densely connected blocks to process multi-scale temporal information across consecutive frames, enhancing the reuse of low-level features, and we introduce a channel attention mechanism to strengthen the expression of important temporal information. To address the inability of the Mixed Convolution (MC), which combines 3D and 2D convolution, to extract higher-order temporal information and its limitations in modeling global dependencies, we compensate with the Time Shift Module (TSM), which captures higher-order temporal semantic information without consuming additional computational resources. Additionally, a 3D Multi-Head Self-Attention mechanism (3D MHSA) is employed to integrate the global spatial-temporal information of high-level features, avoiding the rapid growth in model complexity caused by deeply stacked Convolutional Neural Network (CNN) layers. Lastly, we introduce a multistream framework that fuses driver posture and appearance features, exploiting their complementary strengths to improve recognition from multimodal inputs. Experimental results show that the proposed network achieves accuracies of 95.6% and 94.3% on the ASU and NTU-RGB+D datasets, respectively. The model's small size also makes it suitable for practical deployment. [ABSTRACT FROM AUTHOR]
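
As a concrete illustration of the temporal-shift idea referenced in the abstract, the sketch below shows how a fraction of feature channels can be shifted forward and backward along the time axis so that neighbouring frames exchange information at essentially zero extra computational cost. This is a minimal sketch of the general technique, not the authors' implementation: the tensor layout, the 1/8 shift fraction, and the function name temporal_shift are assumptions introduced here for illustration only.

```python
# Minimal sketch (assumed, not from the paper) of a temporal channel shift.
import torch

def temporal_shift(x: torch.Tensor, shift_div: int = 8) -> torch.Tensor:
    """x: feature map of shape (batch, time, channels, height, width)."""
    b, t, c, h, w = x.shape
    fold = c // shift_div                    # fraction of channels to shift each way
    out = torch.zeros_like(x)
    out[:, :-1, :fold] = x[:, 1:, :fold]                   # shift back in time
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]   # shift forward in time
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]              # rest left unchanged
    return out

# Toy usage: a batch of two 16-frame clips with 64-channel feature maps.
features = torch.randn(2, 16, 64, 28, 28)
shifted = temporal_shift(features)
assert shifted.shape == features.shape
```

Because the shift is a pure memory rearrangement, it adds temporal mixing to an otherwise 2D convolutional branch without any extra multiply-accumulate operations, which is the trade-off the abstract attributes to the TSM.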

Details

Language :
English
ISSN :
0957-4174
Volume :
257
Database :
Academic Search Index
Journal :
Expert Systems with Applications
Publication Type :
Academic Journal
Accession number :
179507002
Full Text :
https://doi.org/10.1016/j.eswa.2024.124883