Detecting Depression with Word-Level Multimodal Fusion
- Source :
- Interspeech 2019
- Publication Year :
- 2019
- Publisher :
- ISCA, 2019.
Abstract
- Semi-structured clinical interviews are frequently used diagnostic tools for identifying depression during an assessment phase. In addition to the lexical content of a patient’s responses, multimodal cues concurrent with the responses are indicators of their motor and cognitive state, including those derivable from their voice quality and gestural behaviour. In this paper, we use information from different modalities in order to train a classifier capable of detecting the binary state of a subject (clinically depressed or not), as well as the level of their depression. We propose a model that is able to perform modality fusion incrementally after each word in an utterance using a time-dependent recurrent approach in a deep learning set-up. To mitigate noisy modalities, we utilize fusion gates that control the degree to which the audio or visual modality contributes to the final prediction. Our results show the effectiveness of word-level multimodal fusion, achieving state-of-the-art results in depression detection and outperforming early feature-level and late fusion techniques.
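- Implementation note: the record above does not include the authors' code or exact architecture details. The snippet below is a minimal sketch, in PyTorch, of what word-level gated multimodal fusion with a recurrent model could look like, assuming word embeddings and word-aligned audio/visual feature vectors as inputs. All layer names, dimensions, and the gating formulation are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch of word-level gated multimodal fusion (not the authors' code).
# Assumes word embeddings plus audio/visual features already aligned per word.
import torch
import torch.nn as nn

class WordLevelFusionClassifier(nn.Module):
    def __init__(self, word_dim, audio_dim, visual_dim, hidden_dim, num_classes=2):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, hidden_dim)
        self.visual_proj = nn.Linear(visual_dim, hidden_dim)
        # Fusion gates conditioned on the current word, the recurrent state,
        # and the modality itself, so noisy modalities can be attenuated.
        self.audio_gate = nn.Linear(word_dim + hidden_dim + audio_dim, hidden_dim)
        self.visual_gate = nn.Linear(word_dim + hidden_dim + visual_dim, hidden_dim)
        self.rnn_cell = nn.LSTMCell(word_dim + 2 * hidden_dim, hidden_dim)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, words, audio, visual):
        # words:  (batch, seq_len, word_dim)   word embeddings
        # audio:  (batch, seq_len, audio_dim)  word-aligned acoustic features
        # visual: (batch, seq_len, visual_dim) word-aligned visual features
        batch, seq_len, _ = words.shape
        h = words.new_zeros(batch, self.rnn_cell.hidden_size)
        c = words.new_zeros(batch, self.rnn_cell.hidden_size)
        for t in range(seq_len):
            w_t, a_t, v_t = words[:, t], audio[:, t], visual[:, t]
            # Element-wise gates controlling how much each modality contributes.
            g_a = torch.sigmoid(self.audio_gate(torch.cat([w_t, h, a_t], dim=-1)))
            g_v = torch.sigmoid(self.visual_gate(torch.cat([w_t, h, v_t], dim=-1)))
            fused = torch.cat([w_t,
                               g_a * self.audio_proj(a_t),
                               g_v * self.visual_proj(v_t)], dim=-1)
            h, c = self.rnn_cell(fused, (h, c))  # fuse incrementally after each word
        return self.classifier(h)  # binary depressed / not-depressed logits
```

- In the same spirit, the final hidden state could also feed a regression head to predict depression severity, mirroring the two tasks (binary state and level of depression) described in the abstract.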
- Subjects :
- computational paralinguistics
Multimodal fusion
Computer science
Speech recognition
depression
modality fusion
gestural behaviour
recurrent neural networks
lexical content
- Depression (differential diagnoses)
Details
- Database :
- OpenAIRE
- Journal :
- Interspeech 2019
- Accession number :
- edsair.doi.dedup.....7829fcd4a4f4abb78005b2d0297e08a6
- Full Text :
- https://doi.org/10.21437/interspeech.2019-2283