1. IntervoxNet: a novel dual-modal audio-text fusion network for automatic and efficient depression detection from interviews
- Author
Huijun Ding, Zhou Du, Ziwei Wang, Junqi Xue, Zhaoguo Wei, Kongjun Yang, Shan Jin, Zhiguo Zhang, and Jianhong Wang
- Subjects
dual-modal fusion, depression detection, Transformer, classification, deep learning, attention mechanism, Physics, QC1-999
- Abstract
Depression is a prevalent mental health problem across the globe, presenting significant social and economic challenges. Early detection and treatment are pivotal in reducing these impacts and improving patient outcomes. Traditional diagnostic methods largely rely on subjective assessments by psychiatrists, underscoring the importance of developing automated and objective diagnostic tools. This paper presents IntervoxNet, a novel computer-aided detection system designed specifically for analyzing interview audio. IntervoxNet takes a dual-modal approach, using the Audio Mel-Spectrogram Transformer (AMST) for audio processing and a hybrid model combining Bidirectional Encoder Representations from Transformers with a Convolutional Neural Network (BERT-CNN) for text analysis. Evaluated on the DAIC-WOZ database, IntervoxNet achieves an F1 score, recall, precision, and accuracy of 0.90, 0.92, 0.88, and 0.86, respectively, thereby surpassing existing state-of-the-art methods. These results demonstrate IntervoxNet’s potential as a highly effective and efficient tool for rapid depression screening in interview settings.
- Published
2024