| Field | Value |
|---|---|
| DOI | 10.1016/J.IS.2020.101584 |
| Year | 2022 |
| Type | Research article |
Hate speech is a serious problem that affects the dynamics and usefulness of online social communities. Large-scale social platforms are currently investing significant resources in automatically detecting and classifying hateful content, without much success. On the other hand, results reported for state-of-the-art systems indicate that supervised approaches achieve almost perfect performance, but only within specific datasets, most of them in English. In this work, we analyze this apparent contradiction between existing literature and actual applications. We closely study the experimental methodology used in prior work and its generalizability to other datasets. Our findings reveal methodological issues, as well as an important dataset bias. As a consequence, performance claims of the current state of the art have been significantly overestimated. The problems that we have found are mostly related to data overfitting and sampling issues. We discuss the implications for current research and re-conduct experiments to give a more accurate picture of current state-of-the-art methods. Moreover, we design some baseline approaches to perform cross-lingual experiments, using English and Spanish datasets.
| No. | Author | Gender | Institution - Country |
|---|---|---|---|
| 1 | Arango, Aymé | - | Universidad de Chile - Chile; Instituto Milenio Fundamentos de los Datos - Chile |
| 2 | PEREZ-ROJAS, JORGE ADRIAN | Male | Universidad de Chile - Chile; Instituto Milenio Fundamentos de los Datos - Chile |
| 3 | POBLETE-LABRA, BARBARA JEANNETTE | Female | Universidad de Chile - Chile; Instituto Milenio Fundamentos de los Datos - Chile |
| Source |
|---|
| Fondo Nacional de Desarrollo Científico y Tecnológico |
| Fondecyt, Chile |
| Millennium Institute for Foundational Research on Data, Chile (IMFD) |
| Acknowledgment |
|---|
| We thank Thomas Davidson for providing all the information concerning the dataset described in Davidson et al. [17]. This work was supported by the Millennium Institute for Foundational Research on Data, Chile (IMFD). Poblete was also funded by Fondecyt, Chile grant 1191604, and Pérez by Fondecyt, Chile grant 1200967. |