Shows external impact metrics associated with the publication. For further detail:
| Field | Value |
|---|---|
| Indexed | |
| DOI | 10.1016/J.CSL.2017.06.005 |
| Year | 2018 |
| Type | Research article |
Metrics shown: Total Citations · Authors with Chilean Affiliation · Chilean Institutions · % International Participation · Authors with Foreign Affiliation · Foreign Institutions
In this paper an uncertainty weighting scheme for DNN-HMM-based speech recognition is proposed to increase discriminability in the decoding process. To this end, the DNN pseudo-log-likelihoods are weighted according to the uncertainty variance assigned to the acoustic observation. The results presented here suggest that a substantial reduction in WER is achieved with clean training. Moreover, modelling the uncertainty propagation through the DNN is not required and no approximations for non-linear activation functions are made. The presented method can be applied to any network topology that delivers log-likelihood-like scores. It can be combined with any noise removal technique and adds a minimal computational cost. This technique was exhaustively evaluated and combined with uncertainty-propagation-based schemes for computing the pseudo-log-likelihoods and uncertainty variance at the DNN output. Two proposed methods optimized the parameters of the weighting function by leveraging grid search either on a development database representing the given task or on each utterance based on discrimination metrics. Experiments with the Aurora-4 task showed that, with clean training, the proposed weighting scheme can reduce WER by a maximum of 21% compared with a baseline system with spectral subtraction and uncertainty propagation using the unscented transform. The uncertainty weighting method reduced the gap between clean and multi-noise/multi-condition training. This can be useful when it is not easy to train a DNN-HMM system in conditions that are similar to the testing ones. Finally, the presented results on the use of uncertainty are very competitive with those published elsewhere using the same database as the one employed here. (C) 2017 The Authors. Published by Elsevier Ltd.
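The core idea of the abstract can be sketched as follows. The abstract does not give the exact form of the weighting function, only that frames are weighted according to their uncertainty variance and that the function's parameters are tuned by grid search; the function below and its `alpha`/`beta` parameters are therefore illustrative assumptions, not the authors' formula.

```python
import numpy as np

def uncertainty_weighted_scores(pseudo_log_likelihoods, uncertainty_variance,
                                alpha=1.0, beta=1.0):
    """Down-weight DNN pseudo-log-likelihoods for frames whose acoustic
    observation carries high uncertainty, before they enter decoding.

    pseudo_log_likelihoods: (frames, states) array of DNN scores.
    uncertainty_variance:   (frames,) per-frame uncertainty variance.
    alpha, beta:            hypothetical weighting parameters; in the paper
                            such parameters are tuned by grid search on a
                            development set or per utterance.
    """
    var = np.asarray(uncertainty_variance, dtype=float)
    # Weight in (0, 1]: equals 1 for a perfectly certain frame (variance 0)
    # and shrinks toward 0 as the uncertainty variance grows.
    weights = 1.0 / (1.0 + alpha * var) ** beta
    # Broadcast the per-frame weight across all state scores of that frame.
    return weights[:, None] * np.asarray(pseudo_log_likelihoods, dtype=float)
```

Because the scores are log-likelihood-like, shrinking them toward zero flattens the frame's contribution, so unreliable frames discriminate less between competing states during decoding; this matches the abstract's claim that the scheme works with any topology producing log-likelihood-like scores and adds minimal computational cost.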
| No. | Author | Gender | Institution - Country |
|---|---|---|---|
| 1 | Novoa, José | Male | Universidad de Chile - Chile |
| 2 | Fredes, Josue | Male | Universidad de Chile - Chile |
| 3 | Poblete-Ramirez, Victor Hernan | Male | Universidad Austral de Chile - Chile |
| 4 | Yoma, Nestor Becerra | Male | Universidad de Chile - Chile |
| Source |
|---|
| Conicyt-Fondecyt |
| University of Edinburgh |
| CONICYT-PCHA/DoctoradoNacional |
| ONRG |
| Carnegie Mellon University |
| Acknowledgement |
|---|
| The research reported here was funded by grants Conicyt-Fondecyt 1151306 and ONRG N62909-17-1-2002. The authors would also like to thank Prof. Richard Stern, Robust Speech Recognition Group, CMU, for having provided the source code to run VTS. Finally, the authors would also like to thank Prof. Simon King, CSTR, University of Edinburgh, for having proofread the final version of the manuscript. Jose Novoa was supported by Grant CONICYT-PCHA/DoctoradoNacional/2014-21140711. |