Dataciencia

SciELO Chile Collection

Multi-Channel Speech Enhancement Using Labelled Random Finite Sets and a Neural Beamformer in Cocktail Party Scenario

Indexed in

WoS	WOS:001453593500001
Scopus	SCOPUS_ID:105000950775

DOI

10.3390/APP15062944

Year

2025

Type

Research article

Abstract

This research proposes a multi-channel target speech enhancement scheme based on a deep learning (DL) architecture and assisted by multi-source tracking within a labeled random finite set (RFS) framework. The beamformer of choice is a neural network based on the minimum variance distortionless response (MVDR) beamformer: a residual dense convolutional graph-U-Net is applied in a generative adversarial network (GAN) setting to model the beamformer for target speech enhancement under reverberant conditions involving multiple moving speech sources. The input dataset for this neural architecture is constructed by applying multi-source tracking with multi-sensor generalized labeled multi-Bernoulli (MS-GLMB) filtering, which belongs to the labeled RFS framework, to obtain accurate estimates of the sources' positions and the associated labels (one per source) at each time frame, despite undesirable factors such as reverberation and background noise. The tracked positions and labels help to discriminate the target source from the interferers across all time frames and to generate time-frequency (T-F) masks for the target source from the output of a time-varying MVDR beamformer. These T-F masks constitute the target label set used to train the proposed deep neural architecture to perform target speech enhancement. MS-GLMB filtering and the time-varying MVDR beamformer thereby supply the spatial information of the sources, in addition to the spectral information, to the neural speech enhancement framework during the training phase. Moreover, the GAN framework takes advantage of adversarial optimization as an alternative to maximum likelihood (ML)-based frameworks, which further boosts target speech enhancement performance under reverberant conditions.
Computer simulations demonstrate that the proposed approach yields better target speech enhancement than existing state-of-the-art DL-based methods that do not incorporate the labeled RFS-based approach: at a reverberation time (RT60) of 550 ms, it achieves an ESTOI of 75% and a PESQ of 2.70, compared with an ESTOI of 46.74% and a PESQ of 1.84 for Mask-MVDR with a self-attention mechanism.
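
The abstract relies on two standard building blocks, the MVDR beamformer and T-F masks, without stating their formulas. The following is a minimal sketch of the textbook versions for a single frequency bin, not the authors' neural beamformer or their tracking pipeline. The uniform linear array, the microphone spacing, the source angles, the noise levels, and the function names `steering_vector`, `mvdr_weights`, and `ratio_mask` are all illustrative assumptions that do not come from the publication.

```python
import numpy as np

def steering_vector(n_mics, spacing_m, angle_rad, freq_hz, c=343.0):
    """Far-field steering vector for a uniform linear array (assumed geometry)."""
    delays = np.arange(n_mics) * spacing_m * np.cos(angle_rad) / c
    return np.exp(-2j * np.pi * freq_hz * delays)

def mvdr_weights(noise_cov, d):
    """Textbook MVDR weights w = R^{-1} d / (d^H R^{-1} d) for one frequency bin."""
    r_inv_d = np.linalg.solve(noise_cov, d)   # R^{-1} d without forming the inverse
    return r_inv_d / (d.conj() @ r_inv_d)

def ratio_mask(target_stft, mixture_stft, eps=1e-8):
    """Magnitude ratio mask clipped to [0, 1]; one common T-F mask definition."""
    return np.clip(np.abs(target_stft) / (np.abs(mixture_stft) + eps), 0.0, 1.0)

# Toy scene (all values hypothetical): 4 microphones, 5 cm apart, 1 kHz bin.
n_mics = 4
d_target = steering_vector(n_mics, 0.05, np.deg2rad(60.0), 1000.0)
d_interf = steering_vector(n_mics, 0.05, np.deg2rad(120.0), 1000.0)

# Interference-plus-noise covariance: one strong interferer plus white sensor noise.
noise_cov = 10.0 * np.outer(d_interf, d_interf.conj()) + 0.1 * np.eye(n_mics)

w = mvdr_weights(noise_cov, d_target)
target_gain = w.conj() @ d_target   # distortionless constraint: w^H d = 1
interf_gain = w.conj() @ d_interf   # response toward the interferer
print(abs(target_gain), abs(interf_gain))
```

In a time-varying setting such as the one the abstract describes, the steering vector would be recomputed at each frame from the tracked source position; this sketch fixes the position to keep the example short.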

Journal

Journal	ISSN
Applied Sciences Basel	2076-3417

Research Disciplines

WOS
Chemistry, Multidisciplinary
Engineering, Multidisciplinary
Physics, Applied
Materials Science, Multidisciplinary

Scopus
No disciplines

SciELO
No disciplines

Shows the distribution of disciplines for this publication.

WoS publications (editions: ISSHP, ISTP, AHCI, SSCI, SCI), Scopus, SciELO Chile.

Authors - Affiliation

No.	Author	Gender	Institution - Country
1	Datta, Jayanta	-	Universidad de Chile - Chile
2	Firoozabadi, Ali Dehghan	-	Universidad Tecnológica Metropolitana - Chile
3	Zabala-Blanco, David	-	Universidad Católica del Maule - Chile
4	Castillo-Soria, Francisco R.	Male	UNIV AUTONOMA SAN LUIS POTOSI - Mexico

Shows the affiliation and (detected) gender of the publication's co-authors.

Funding

Source
Fondo Nacional de Desarrollo Científico y Tecnológico
Universidad Tecnológica Metropolitana
Agencia Nacional de Investigación y Desarrollo
projects ANID/FONDECYT
Competition for Research Regular Projects

Shows the funding source declared in the publication.

Acknowledgments

Acknowledgment
The authors acknowledge the financial support from Projects ANID/FONDECYT Iniciación No. 11230129, and the Competition for Research Regular Projects, year 2021, code LPR21-02; Universidad Tecnológica Metropolitana.