Dataciencia

SciELO Chile Collection

WavFace: A Multimodal Transformer-Based Model for Depression Screening

Indexed in

WoS	WOS:001483871500004
Scopus	SCOPUS_ID:85215227173

DOI

10.1109/JBHI.2025.3529348

Year

2025

Type

Research article

Abstract

Depression, a prevalent mental health disorder with severe health and economic consequences, can be costly and difficult to detect. To alleviate this burden, recent research has been exploring the depression screening capabilities of deep learning (DL) models trained on videos of clinical interviews conducted by a virtual agent. Such DL models need to consider the challenges of modality representation, alignment, and fusion as well as small sample sizes. To address them, we propose WavFace, a multimodal deep learning model that inputs audio and temporal facial features. WavFace adds an encoder-transformer layer over pre-trained models to improve the unimodal representation. It also applies an explicit alignment method for both modalities and then uses sequential and spatial self-attention over the alignment. Finally, WavFace fuses the sequential and spatial self-attentions among the two modality embeddings, inspired by how mental health professionals simultaneously observe visual and vocal presentation during clinical interviews. By leveraging sequential and spatial self-attention, WavFace outperforms pre-trained unimodal and multimodal models from the literature. With a single interview question, WavFace screened for depression with a balanced accuracy of 0.81. This presents a valuable modeling approach for audio-visual mental health screening.
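
The abstract reports its headline result as balanced accuracy, which matters for screening data where positive cases are usually the minority. As a minimal sketch, assuming the standard binary definition (the mean of sensitivity and specificity), the metric could be computed as below. The function name `balanced_accuracy` and the labels are hypothetical illustrations, not data or code from the paper.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = positive screen)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical, imbalanced example: 2 positive and 8 negative participants.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                      # 0.8
print(balanced_accuracy(y_true, y_pred))   # 0.6875
```

In this toy case plain accuracy looks high because most participants are negative, while balanced accuracy exposes that half of the positive cases were missed.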

Journal

Journal	ISSN
IEEE Journal of Biomedical and Health Informatics	2168-2194

External Metrics

PlumX	Altmetric	Dimensions

Shows external impact metrics associated with the publication. For more detail:

PlumX: https://plumanalytics.com/learn/about-metrics/
Altmetric: https://www.altmetric.com/about-altmetrics/what-are-altmetrics/
Dimensions: https://www.dimensions.ai/why-dimensions/

Research Disciplines

WOS
Computer Science, Interdisciplinary Applications
Computer Science, Information Systems
Mathematical & Computational Biology
Medical Informatics

Scopus
No disciplines

SciELO
No disciplines

Shows the distribution of disciplines for this publication.

WoS publications (editions: ISSHP, ISTP, AHCI, SSCI, SCI), Scopus, SciELO Chile.

Institutional Collaboration

Shows the distribution of collaboration, both national and foreign, generated by this publication.

Authors - Affiliation

No.	Author	Gender	Institution - Country
1	Flores, Ricardo	-	Universidad de Concepción - Chile; Worcester Polytech Inst - United States
2	Tlachac, M. L.	-	Bryant University - United States
3	Shrestha, Avantika	-	Worcester Polytech Inst - United States
4	Rundensteiner, Elke A.	-	Worcester Polytech Inst - United States

Shows the affiliation and (detected) gender of the publication's co-authors.

Funding

Source
National Science Foundation
NSF IIS
Fulbright Foreign Student Program
Fulbright U.S. Student Program
WPI Data Science Department from NSF MRI

Shows the funding source declared in the publication.

Acknowledgments

Acknowledgment
This work was supported in part by NSF IIS under Grant 1910880, in part by Fulbright Foreign Student Program, and in part by WPI Data Science Department. Results were obtained using an HPC from NSF MRI under Grant DMS-1337943 to WPI.