
Parallel Multiscale Bridge Fusion Network for Audio–Visual Automatic Depression Assessment



Abstract:

Depression is a prevalent and severe mental illness that significantly impacts patients’ physical health and daily life. Recent studies have focused on multimodal depression assessment, aiming to evaluate depression objectively and conveniently from multimodal data. However, existing methods based on audio–visual modalities struggle to capture the dynamic variations in depression cues and cannot fully exploit multimodal data over long time spans. In addition, they rely on single-stage multimodal fusion, which is insufficient and limits the accuracy of depression assessment. To address these limitations, we propose a novel parallel multiscale bridge fusion network (PMBFN) for audio–visual depression assessment. PMBFN comprehensively captures subtle multilevel dynamic changes in depression expression through parallel multiscale dynamic convolutions and long short-term memories (LSTMs), and effectively mitigates information loss in long-term audio–visual sequences by using spatiotemporal attention pooling modules. Furthermore, a multimodal bridge fusion module is proposed in PMBFN to achieve multistage, interactive, recursive multimodal fusion, enhancing the expressive capacity of multimodal depression-related features and improving assessment accuracy. Extensive experiments on the DAIC-WOZ and E-DAIC datasets demonstrate that our method outperforms current state-of-the-art methods, clearly confirming its effectiveness.
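As a rough illustration of the pipeline sketched in the abstract, the following PyTorch snippet arranges parallel multiscale convolution–LSTM branches per modality, temporal attention pooling, and a multistage gated fusion step followed by score regression. All class names, dimensions, kernel sizes, the gating formulation, and the number of fusion stages are assumptions chosen for illustration; this is a minimal sketch of the general idea, not the authors' implementation.

```python
import torch
import torch.nn as nn


class MultiscaleBranch(nn.Module):
    """One parallel branch: a temporal conv at a given scale followed by an LSTM (assumed design)."""
    def __init__(self, in_dim, hidden_dim, kernel_size):
        super().__init__()
        self.conv = nn.Conv1d(in_dim, hidden_dim, kernel_size, padding=kernel_size // 2)
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)

    def forward(self, x):                                   # x: (batch, time, in_dim)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)    # (batch, time, hidden_dim)
        out, _ = self.lstm(h)
        return out                                          # (batch, time, hidden_dim)


class AttentionPool(nn.Module):
    """Temporal attention pooling: a learned weighted sum over the time axis."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x):                                   # x: (batch, time, dim)
        w = torch.softmax(self.score(x), dim=1)             # (batch, time, 1)
        return (w * x).sum(dim=1)                           # (batch, dim)


class BridgeFusion(nn.Module):
    """One fusion stage: a simple cross-modal gate mixing audio and visual vectors (hypothetical)."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, a, v):
        g = torch.sigmoid(self.gate(torch.cat([a, v], dim=-1)))
        return g * a + (1 - g) * v                          # fused representation


class PMBFNSketch(nn.Module):
    """Hypothetical end-to-end sketch: multiscale encoding per modality,
    attention pooling, multistage fusion, and regression of a severity score."""
    def __init__(self, audio_dim, visual_dim, hidden_dim=64, scales=(3, 5, 7), stages=2):
        super().__init__()
        self.audio_branches = nn.ModuleList(
            [MultiscaleBranch(audio_dim, hidden_dim, k) for k in scales])
        self.visual_branches = nn.ModuleList(
            [MultiscaleBranch(visual_dim, hidden_dim, k) for k in scales])
        feat_dim = hidden_dim * len(scales)
        self.audio_pool = AttentionPool(feat_dim)
        self.visual_pool = AttentionPool(feat_dim)
        self.fusion_stages = nn.ModuleList([BridgeFusion(feat_dim) for _ in range(stages)])
        self.regressor = nn.Linear(feat_dim, 1)             # e.g., a PHQ-8-style score

    def forward(self, audio, visual):                       # each: (batch, time, feat_dim)
        a = torch.cat([b(audio) for b in self.audio_branches], dim=-1)
        v = torch.cat([b(visual) for b in self.visual_branches], dim=-1)
        a, v = self.audio_pool(a), self.visual_pool(v)
        fused = a
        for stage in self.fusion_stages:                    # multistage, iterative fusion
            fused = stage(fused, v)
        return self.regressor(fused).squeeze(-1)


# Usage with random tensors: batch of 2, 100 audio frames, 80 visual frames
model = PMBFNSketch(audio_dim=40, visual_dim=136)
score = model(torch.randn(2, 100, 40), torch.randn(2, 80, 136))
print(score.shape)                                          # torch.Size([2])
```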
Published in: IEEE Transactions on Computational Social Systems (Volume: 11, Issue: 5, October 2024)
Page(s): 6830 - 6842
Date of Publication: 23 July 2024


I. Introduction

Major depressive disorder (MDD) [1], also known as depression, is a complex and severe mental illness that has gained widespread attention [2]. Patients suffering from depression may experience persistent low mood, diminished interest, sleep disturbances, and appetite dysregulation, which significantly impact their quality of life and can lead to suicidal ideation [3]. However, with medication and psychotherapy, depressive symptoms can be alleviated and improved [4], [5]. Therefore, early assessment of depression severity is crucial for mitigating the adverse effects of these symptoms. In clinical practice, depression assessment typically relies on experienced doctors conducting structured [6] or semistructured interviews and on standardized self-rating scales for evaluating severity, e.g., the PHQ-8 [7]. Nevertheless, this approach is subjective, labor-intensive, and time-consuming, especially given the persistent increase in the number of individuals affected by depression. Depression assessment can be regarded as a specific information processing task. In recent years, owing to the success of deep learning in information processing [8], [9], [10], automatic depression assessment based on deep learning has emerged as a new possibility, offering an objective, convenient, and efficient auxiliary diagnostic approach that analyzes depression cues in multimodal data. A substantial body of deep-learning-based depression assessment research has since been proposed [11], [12], [13], [14], offering a promising path toward automatic depression analysis.
