Dataciencia

SciELO Chile Collection

Simple yet Powerful: An Overlooked Architecture for Nested Named Entity Recognition

Indexed in

Scopus

SCOPUS_ID:85141592621

DOI

Year

2022

Type

Total Citations

Authors with Chilean Affiliation

Chilean Institutions

% International Participation

Authors with Foreign Affiliation

Foreign Institutions

Abstract

Named Entity Recognition (NER) is an important task in Natural Language Processing that aims to identify text spans belonging to predefined categories. Traditional NER systems ignore nested entities, which are entities contained in other entity mentions. Although several methods have been proposed to address this case, most of them rely on complex task-specific structures and ignore potentially useful baselines for the task. We argue that this creates an overly optimistic impression of their performance. This paper revisits the Multiple LSTM-CRF (MLC) model, a simple, overlooked, yet powerful approach based on training independent sequence labeling models for each entity type. Extensive experiments with three nested NER corpora show that, despite the simplicity of this model, its performance is better than, or at least as good as, that of more sophisticated methods. Furthermore, we show that the MLC architecture achieves state-of-the-art results in the Chilean Waiting List corpus by including pre-trained language models. In addition, we implemented an open-source library that computes task-specific metrics for nested NER. The results suggest that metrics used in previous work do not adequately measure the ability of a model to detect nested entities, while our metrics provide new evidence on how existing approaches handle the task.
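
The idea of training one independent sequence labeler per entity type can be illustrated with a small data-handling sketch. It is not the authors' implementation: the function names, the BIO tagging scheme, the span format `(start, end_exclusive, type)`, and the example sentence are all assumptions made for illustration, and the LSTM-CRF models themselves are omitted. The sketch only shows how nested annotations could be split into one flat tag sequence per entity type, and how per-type predictions could be merged back into a nested set of spans.

```python
def spans_to_bio_per_type(n_tokens, spans):
    """Split nested spans into one flat BIO sequence per entity type.

    spans: list of (start, end_exclusive, entity_type) tuples (assumed format).
    Returns a dict {entity_type: [tag, ...]} with tags in {"B", "I", "O"}.
    """
    tags = {}
    # Visit longer spans first so that, within one type, the outermost wins.
    for start, end, etype in sorted(spans, key=lambda s: (s[0], -s[1])):
        seq = tags.setdefault(etype, ["O"] * n_tokens)
        # A single flat sequence cannot hold two overlapping spans of the
        # same type, so this sketch simply skips the later one.
        if any(tag != "O" for tag in seq[start:end]):
            continue
        seq[start] = "B"
        for i in range(start + 1, end):
            seq[i] = "I"
    return tags


def bio_to_spans(seq, etype):
    """Decode one flat BIO sequence back into (start, end, type) spans."""
    spans, start = [], None
    for i, tag in enumerate(seq):
        if tag in ("B", "O") and start is not None:
            spans.append((start, i, etype))
            start = None
        if tag == "B":
            start = i
        # An "I" with no open span is ignored in this sketch.
    if start is not None:
        spans.append((start, len(seq), etype))
    return spans


def merge_predictions(per_type_tags):
    """Union the spans decoded from every per-type sequence."""
    merged = []
    for etype, seq in per_type_tags.items():
        merged.extend(bio_to_spans(seq, etype))
    return sorted(merged)


# Hypothetical example: a LOC mention nested inside an ORG mention.
tokens = ["University", "of", "Chile", "hospital"]
gold = [(0, 3, "ORG"), (2, 3, "LOC")]
per_type = spans_to_bio_per_type(len(tokens), gold)
print(per_type)
print(merge_predictions(per_type))
```

In a full system, each per-type sequence would be the training target (and later the prediction) of its own sequence labeling model. Because the models are independent, entities of different types can overlap freely in the merged output.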

Research Disciplines

WOS
No disciplines

Scopus
No disciplines

SciELO
No disciplines

Shows the distribution of disciplines for this publication.

Publications in WoS (editions: ISSHP, ISTP, AHCI, SSCI, SCI), Scopus, and SciELO Chile.

Institutional Collaboration

Shows the distribution of collaboration, both national and foreign, generated by this publication.

Authors - Affiliation

No.	Author	Gender	Institution - Country
1	ROJAS-VALENZUELA, MATIAS ISMAEL	Male	Universidad de Chile - Chile
2	Bravo-Marquez, Felipe	Male	Universidad de Chile - Chile; Centro Nacional de Inteligencia Artificial (CENIA) - Chile; Instituto Milenio Fundamentos de los Datos - Chile; Millennium Institute for Foundational Research on Data (IMFD) - Chile
3	Dunstan, Jocelyn	Female	Universidad de Chile - Chile; ANID - Chile

Shows the affiliation and (detected) gender of the publication's co-authors.

Funding

Source
FONDEQUIP
Fondo Nacional de Desarrollo Científico y Tecnológico
Universidad Austral de Chile
IMFD
U-INICIA VID
Agencia Nacional de Investigación y Desarrollo
CENIA

Shows the funding sources declared in the publication.

Acknowledgments

Acknowledgment
This work was funded by ANID Chile: Basal Funds for Center of Excellence FB210005 (CMM) and FB210017 (CENIA); Millennium Science Initiative Program ICN17_002 (IMFD) and ICN2021_004 (iHealth); Fondecyt grants 11200290 and 11201250. We also acknowledge the U-Inicia VID Project UI-004/20. This research was partially supported by the supercomputing infrastructure of the NLHPC (ECM-02) and the Patagón supercomputer of Universidad Austral de Chile (FONDEQUIP EQM180042). We are also grateful for the help received from the reviewers.
