| Indexed | |
| DOI | 10.1109/JBHI.2025.3529348 |
| Year | 2025 |
| Type | research article |
Depression, a prevalent mental health disorder with severe health and economic consequences, can be costly and difficult to detect. To alleviate this burden, recent research has explored the depression screening capabilities of deep learning (DL) models trained on videos of clinical interviews conducted by a virtual agent. Such DL models must address the challenges of modality representation, alignment, and fusion, as well as small sample sizes. To address them, we propose WavFace, a multimodal deep learning model that takes audio and temporal facial features as input. WavFace adds an encoder-transformer layer over pre-trained models to improve the unimodal representation. It also applies an explicit alignment method for both modalities and then uses sequential and spatial self-attention over the alignment. Finally, WavFace fuses the sequential and spatial self-attentions among the two modality embeddings, inspired by how mental health professionals simultaneously observe visual and vocal presentation during clinical interviews. By leveraging sequential and spatial self-attention, WavFace outperforms pre-trained unimodal and multimodal models from the literature. With a single interview question, WavFace screened for depression with a balanced accuracy of 0.81. This presents a valuable modeling approach for audio-visual mental health screening.
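For readers unfamiliar with the metric, balanced accuracy is conventionally the mean of per-class recall, which for two classes is the average of sensitivity and specificity. The sketch below illustrates that standard definition only; it is not the authors' evaluation code, and the function name `balanced_accuracy` and the labels are hypothetical, not data from the paper.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; for two classes, (sensitivity + specificity) / 2."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        # Indices of the samples whose true label is class c
        idx = [i for i, t in enumerate(y_true) if t == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced labels (1 = screened positive, 0 = negative).
# These values are made up for illustration and do not come from the paper.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

score = balanced_accuracy(y_true, y_pred)
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"balanced accuracy: {score:.4f}, plain accuracy: {plain:.4f}")
```

On imbalanced screening data, plain accuracy can look high even when half of the positive cases are missed; balanced accuracy weights each class equally, which is presumably why it is the reported figure.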
| Ord. | Author | Gender | Institution - Country |
|---|---|---|---|
| 1 | Flores, Ricardo | - | Universidad de Concepción - Chile; Worcester Polytech Inst - United States |
| 2 | Tlachac, M. L. | - | Bryant University - United States |
| 3 | Shrestha, Avantika | - | Worcester Polytech Inst - United States |
| 4 | Rundensteiner, Elke A. | - | Worcester Polytech Inst - United States |
| Funding source |
|---|
| National Science Foundation |
| NSF IIS |
| Fulbright Foreign Student Program |
| Fulbright U.S. Student Program |
| WPI Data Science Department from NSF MRI |
| Acknowledgment |
|---|
| This work was supported in part by NSF IIS under Grant 1910880, in part by the Fulbright Foreign Student Program, and in part by the WPI Data Science Department. Results were obtained using an HPC from NSF MRI under Grant DMS-1337943 to WPI. |