
Enhanced Video Analytics for Sentiment Analysis Based on Fusing Textual, Auditory and Visual Information

Authors :
Sadam Al-Azani
El-Sayed M. El-Alfy
Source :
IEEE Access, Vol 8, Pp 136843-136857 (2020)
Publication Year :
2020
Publisher :
IEEE, 2020.

Abstract

With the widespread availability of online videos and ongoing digital transformation, video informatics and analytics have recently gained substantial importance, with impressive success in a variety of tasks such as digital marketing, video surveillance and security systems, healthcare systems, talk show analysis, analysis of influential groups in social media, and target tracking. This paper evaluates the potential contribution of various video modalities, and how they are correlated, for video analytics targeting sentiment analysis in the morphologically rich Arabic language. Moreover, an enhanced approach is presented for video analytics to predict the speaker’s sentiment in multi-dialect Arabic through the integration of textual, auditory and visual modalities. Different features are extracted to represent each modality: prosodic and spectral acoustic features for audio, neural word embeddings for the audio text transcript, and dense optical-flow descriptors for the visual modality. The extracted features are first used individually to train two machine learning classifiers, providing a baseline. Then, the effectiveness of various combinations of modalities is verified using multi-level fusion (feature, score and decision). The experimental results demonstrate that the proposed approach of combining different modalities leads to more accurate prediction of the speaker’s sentiment, with an accuracy above 94%.
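The three fusion levels mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; the function names and the two-class (negative/positive) setup are illustrative assumptions. Feature-level fusion concatenates per-modality feature vectors before a single classifier; score-level fusion averages the class-probability outputs of per-modality classifiers; decision-level fusion takes a majority vote over their predicted labels.

```python
from collections import Counter

def feature_fusion(feature_vectors):
    """Feature-level fusion: concatenate the per-modality feature
    vectors (e.g. text, audio, visual) into one joint vector that a
    single classifier is then trained on."""
    fused = []
    for vec in feature_vectors:
        fused.extend(vec)
    return fused

def score_fusion(prob_vectors):
    """Score-level fusion: average the class-probability outputs of
    the per-modality classifiers, then return the index of the
    highest-scoring class."""
    n_modalities = len(prob_vectors)
    n_classes = len(prob_vectors[0])
    avg = [sum(p[c] for p in prob_vectors) / n_modalities
           for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__)

def decision_fusion(labels):
    """Decision-level fusion: majority vote over the hard labels
    predicted by the per-modality classifiers."""
    return Counter(labels).most_common(1)[0][0]

# Toy example with three hypothetical modality classifiers and two
# sentiment classes (0 = negative, 1 = positive).
text_p, audio_p, visual_p = [0.2, 0.8], [0.4, 0.6], [0.6, 0.4]
print(score_fusion([text_p, audio_p, visual_p]))  # averaged scores favour class 1
print(decision_fusion([1, 1, 0]))                 # majority of votes is class 1
```

In practice the fused feature vector or the per-modality scores would come from trained classifiers over the acoustic, textual and optical-flow features described above; the sketch only shows how the outputs are combined at each level.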

Details

Language :
English
ISSN :
21693536
Volume :
8
Database :
OpenAIRE
Journal :
IEEE Access
Accession number :
edsair.doi.dedup.....39efa2c1c3113dd298c8326211aa08f3