Publications indexed in WoS (editions ISSHP, ISTP, AHCI, SSCI, SCI), Scopus, and SciELO Chile.

| Field | Value |
|---|---|
| Indexed | |
| DOI | |
| Year | 2022 |
| Type | |
| Total citations | |
| Authors with Chilean affiliation | |
| Chilean institutions | |
| % International participation | |
| Authors with foreign affiliation | |
| Foreign institutions | |

Word embeddings have been widely used in Natural Language Processing (NLP) tasks. Although these representations capture the semantic information of individual words, they cannot learn sequence-level semantics. This limitation can be addressed with contextual word embeddings derived from pre-trained language models, which have led to significant improvements in several NLP tasks. Further gains are achieved when these models are pre-trained on domain-specific corpora. In this paper, we introduce Clinical Flair, a domain-specific language model trained on Spanish clinical narratives. To validate the quality of the contextual representations produced by our model, we tested them on four named entity recognition datasets from the clinical and biomedical domains. Our experiments confirm that incorporating domain-specific embeddings into classical sequence labeling architectures dramatically improves model performance compared to general-domain embeddings, demonstrating the importance of making these resources available.
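The abstract contrasts static word embeddings, which assign one vector per word type, with contextual embeddings taken from a pre-trained language model. Flair-style models are character-level language models. In the standard Flair recipe, as we understand it, a word's contextual string embedding concatenates two states: the forward LM's hidden state after the word's last character, and the backward LM's hidden state once it has consumed the word's first character. The minimal Python sketch below illustrates that idea under clearly stated assumptions. `ToyCharRNN`, `contextual_embedding`, and `static_embedding` are hypothetical names. The RNNs are untrained and use random weights, whereas a real model such as Clinical Flair is trained on a large corpus. The hidden size is arbitrary. The sketch shows only that the same word, `dolor`, receives different contextual vectors in different sentences, while a static lookup always returns the same vector.

```python
import math
import random

HIDDEN = 8  # toy hidden size (assumption; real character LMs are far larger)


class ToyCharRNN:
    """Hypothetical, untrained character-level Elman RNN standing in for a trained LM."""

    def __init__(self, seed: str, hidden: int = HIDDEN) -> None:
        self.seed = seed
        self.hidden = hidden
        rng = random.Random(f"{seed}:recurrent")
        # Recurrent weight matrix W_hh, fixed by the seed (random, not learned).
        self.w_hh = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)]
                     for _ in range(hidden)]

    def char_vector(self, ch: str):
        # Deterministic per-character input vector (stands in for a learned embedding).
        rng = random.Random(f"{self.seed}:char:{ch}")
        return [rng.uniform(-1.0, 1.0) for _ in range(self.hidden)]

    def states(self, text: str):
        """Hidden state after reading each character of `text`, left to right."""
        h = [0.0] * self.hidden
        out = []
        for ch in text:
            x = self.char_vector(ch)
            h = [math.tanh(x[i] + sum(self.w_hh[i][j] * h[j] for j in range(self.hidden)))
                 for i in range(self.hidden)]
            out.append(h)
        return out


def contextual_embedding(fwd, bwd, sentence, start, end):
    """Flair-style embedding of sentence[start:end] (hypothetical helper)."""
    # Forward LM: state after the word's last character (word + left context).
    forward_state = fwd.states(sentence)[end - 1]
    # Backward LM reads the reversed sentence; original index `start` sits at
    # reversed index len(sentence) - 1 - start (word + right context).
    backward_state = bwd.states(sentence[::-1])[len(sentence) - 1 - start]
    return forward_state + backward_state


def static_embedding(word, dim=2 * HIDDEN):
    """Context-free lookup: the same word always maps to the same vector."""
    rng = random.Random(f"static:{word}")
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


fwd, bwd = ToyCharRNN("forward"), ToyCharRNN("backward")
s1 = "paciente refiere dolor torácico"
s2 = "paciente niega dolor"
i1, i2 = s1.index("dolor"), s2.index("dolor")
e1 = contextual_embedding(fwd, bwd, s1, i1, i1 + len("dolor"))
e2 = contextual_embedding(fwd, bwd, s2, i2, i2 + len("dolor"))

print(len(e1))                                                 # 16
print(e1 != e2)                                                # True: context changes the vector
print(static_embedding("dolor") == static_embedding("dolor"))  # True
```

These vectors only become useful once the untrained `ToyCharRNN` instances are replaced with trained forward and backward language models. In a sequence labeling setup, the vectors would then be fed token by token into a tagger. Treating that tagger as something like a BiLSTM-CRF is an assumption about what "classical sequence labeling architectures" refers to here.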

| No. | Author | Gender | Institution - Country |
|---|---|---|---|
| 1 | ROJAS-VALENZUELA, MATIAS ISMAEL | Male | Universidad de Chile - Chile |
| 2 | Dunstan, Jocelyn | Female | Universidad de Chile - Chile; ANID - Chile |
| 3 | Villena, Fabian | Male | Universidad de Chile - Chile |

| Funding source |
|---|
| FONDEQUIP |
| Fondo Nacional de Desarrollo Científico y Tecnológico |
| Universidad Austral de Chile |
| Agencia Nacional de Investigación y Desarrollo |
| Millennium Science Initiative Program ICN2021_004 |

| Acknowledgements |
|---|
| This work was funded by ANID Chile: Basal Funds for Center of Excellence FB210005 (CMM), Millennium Science Initiative Program ICN2021_004 (iHealth), and Fondecyt grant 11201250. This research was partially supported by the supercomputing infrastructure of the NLHPC (ECM-02) and the Patagón supercomputer of Universidad Austral de Chile (FONDEQUIP EQM180042). We also acknowledge the help received from Kinan Martin and the reviewers. |