WavFace: A Multimodal Transformer-Based Model for Depression Screening
Indexed
WoS WOS:001483871500004
Scopus SCOPUS_ID:85215227173
DOI 10.1109/JBHI.2025.3529348
Year 2025
Type research article


Abstract



Depression, a prevalent mental health disorder with severe health and economic consequences, can be costly and difficult to detect. To alleviate this burden, recent research has explored the depression screening capabilities of deep learning (DL) models trained on videos of clinical interviews conducted by a virtual agent. Such models must address the challenges of modality representation, alignment, and fusion, as well as small sample sizes. To address these challenges, we propose WavFace, a multimodal deep learning model that takes audio and temporal facial features as input. WavFace adds an encoder-transformer layer over pre-trained models to improve the unimodal representations. It also applies an explicit alignment method to the two modalities and then uses sequential and spatial self-attention over the alignment. Finally, WavFace fuses the sequential and spatial self-attentions across the two modality embeddings, inspired by how mental health professionals simultaneously observe visual and vocal presentation during clinical interviews. By leveraging sequential and spatial self-attention, WavFace outperforms pre-trained unimodal and multimodal models from the literature. Using a single interview question, WavFace screened for depression with a balanced accuracy of 0.81. This presents a valuable modeling approach for audio-visual mental health screening.
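
The abstract reports its screening result as balanced accuracy, which for a binary task is the mean of sensitivity and specificity. The sketch below illustrates only that standard metric, not the authors' evaluation code; the function name and the toy labels are hypothetical and chosen for illustration.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical toy labels (not data from the paper): 2 positive, 6 negative.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = balanced_accuracy(y_true, y_pred)
print(f"accuracy={plain_accuracy:.3f} balanced_accuracy={score:.3f}")
```

On imbalanced data such as this toy example, plain accuracy rewards predicting the majority class, while balanced accuracy weights both classes equally, which is likely why it suits screening tasks with few positive cases.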


Research Disciplines



WOS
Computer Science, Interdisciplinary Applications
Computer Science, Information Systems
Mathematical & Computational Biology
Medical Informatics
Scopus
No disciplines
SciELO
No disciplines


WoS publications (editions: ISSHP, ISTP, AHCI, SSCI, SCI), Scopus, SciELO Chile.



Authors - Affiliation

Ord. Author Gender Institution - Country
1 Flores, Ricardo - Universidad de Concepción - Chile
Worcester Polytech Inst - United States
2 Tlachac, M. L. - Bryant Univ - United States
Bryant University - United States
3 Shrestha, Avantika - Worcester Polytech Inst - United States
4 Rundensteiner, Elke A. - Worcester Polytech Inst - United States


Funding

Source
National Science Foundation
NSF IIS
Fulbright Foreign Student Program
Fulbright U.S. Student Program
WPI Data Science Department from NSF MRI


Acknowledgments

This work was supported in part by NSF IIS under Grant 1910880, in part by the Fulbright Foreign Student Program, and in part by the WPI Data Science Department. Results were obtained using an HPC from NSF MRI under Grant DMS-1337943 to WPI.