Dataciencia

SciELO Chile Collection

Hate speech detection is not as easy as you may think: A closer look at model validation (extended version)

Indexed in

WoS	WOS:000740349400008
Scopus	SCOPUS_ID:85087929876

DOI

10.1016/J.IS.2020.101584

Year

2022

Type

Research article

Abstract

Hate speech is an important problem that is seriously affecting the dynamics and usefulness of online social communities. Large-scale social platforms are currently investing important resources into automatically detecting and classifying hateful content, without much success. On the other hand, the results reported by state-of-the-art systems indicate that supervised approaches achieve almost perfect performance, but only within specific datasets, most of them in English. In this work, we analyze this apparent contradiction between existing literature and actual applications. We closely study the experimental methodology used in prior work and its generalizability to other datasets. Our findings reveal methodological issues, as well as an important dataset bias. As a consequence, performance claims of the current state-of-the-art have become significantly overestimated. The problems that we have found are mostly related to data overfitting and sampling issues. We discuss the implications for current research and re-conduct experiments to give a more accurate picture of the current state-of-the-art methods. Moreover, we design some baseline approaches to perform cross-lingual experiments, using English and Spanish datasets.

Journal

Journal	ISSN
Information Systems	0306-4379

External Metrics

PlumX	Altmetric	Dimensions

Shows external impact metrics associated with the publication. For more details:

PlumX: https://plumanalytics.com/learn/about-metrics/
Altmetric: https://www.altmetric.com/about-altmetrics/what-are-altmetrics/
Dimensions: https://www.dimensions.ai/why-dimensions/

Research Disciplines

WOS
Computer Science, Information Systems

Scopus
Information Systems
Software
Hardware And Architecture

SciELO
No disciplines

Shows the distribution of disciplines for this publication.

WoS publications (editions: ISSHP, ISTP, AHCI, SSCI, SCI), Scopus, SciELO Chile.

Institutional Collaboration

Shows the distribution of collaboration, both national and international, generated by this publication.

Authors - Affiliation

No.	Author	Gender	Institution - Country
1	Arango, Aymé	-	Universidad de Chile - Chile; Instituto Milenio Fundamentos de los Datos - Chile
2	PEREZ-ROJAS, JORGE ADRIAN	Male	Universidad de Chile - Chile; Instituto Milenio Fundamentos de los Datos - Chile
3	POBLETE-LABRA, BARBARA JEANNETTE	Female	Universidad de Chile - Chile; Instituto Milenio Fundamentos de los Datos - Chile

Shows the affiliation and (detected) gender of the publication's co-authors.

Funding

Source
Fondo Nacional de Desarrollo Científico y Tecnológico
Fondecyt, Chile
Millennium Institute for Foundational Research on Data, Chile (IMFD)

Shows the funding source declared in the publication.

Acknowledgments

Acknowledgment
We thank Thomas Davidson for providing all the information concerning the dataset described in Davidson et al. [17]. This work was supported by the Millennium Institute for Foundational Research on Data, Chile (IMFD). Poblete was also funded by Fondecyt, Chile grant 1191604, and Pérez by Fondecyt, Chile grant 1200967.