Dataciencia

SciELO Chile Collection

Are Text Classifiers Xenophobic? A Country-Oriented Bias Detection Method With Least Confounding Variables

Indexed

Scopus

SCOPUS_ID:85195929889

Year

2024

Abstract

Classical bias detection methods used in machine learning are themselves biased because of the various confounding variables involved in assessing the initial biases. First, they use templates that are syntactically simple and distant from the target data to which the model will be applied. Second, current methods assess biases in pre-trained language models or in datasets, but not directly in the fine-tuned classifier that can actually cause harm. We propose a simple method to detect the biases of a specific fine-tuned classifier on any type of unlabeled data. The idea is to study the classifier's behavior by creating counterfactual examples directly on the target data distribution and quantifying the resulting changes in its predictions. In this work, we focus on named-entity perturbations: we apply Named Entity Recognition to target-domain data and replace the detected entities with the most common names or locations of a target group (gender and country), and we do this for several morphosyntactically different languages associated with the countries of the target groups. We applied our method to two open-source models that are likely to be deployed by industry, covering two tasks and domains. We first assess the bias of a multilingual sentiment analysis model trained on tweets in multiple languages, and then that of a multilingual stance recognition model trained on several languages and evaluated on English. Finally, we propose to link the perplexity of each example to the bias of the model, by looking at the change in label distribution with respect to the language of the target group. Our work offers a fine-grained analysis of the interactions between names and languages, revealing significant biases in multilingual models.
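The counterfactual procedure summarized in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: entity spans are assumed to be precomputed by some NER tagger (given here as character offsets), the per-group substitute names and locations are hypothetical, `toy_classifier` is a deliberately biased stand-in for a fine-tuned model, and the two summary measures (the fraction of predictions that flip, and the label distribution on the counterfactual data) are assumed ways to "quantify the changes", not necessarily the metrics used in the paper.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# (start, end, label) character offsets, as an NER tagger might return them.
Span = Tuple[int, int, str]


def perturb(text: str, spans: List[Span], replacements: Dict[str, str]) -> str:
    """Swap every entity whose label has a substitute (e.g. PER, LOC)."""
    out = text
    # Edit right-to-left so the offsets of earlier spans stay valid.
    for start, end, label in sorted(spans, key=lambda s: s[0], reverse=True):
        if label in replacements:
            out = out[:start] + replacements[label] + out[end:]
    return out


def bias_report(
    data: List[Tuple[str, List[Span]]],
    classify: Callable[[str], str],
    groups: Dict[str, Dict[str, str]],
) -> Dict[str, Dict[str, float]]:
    """Per target group: flip rate and label distribution on counterfactuals."""
    n = len(data)
    original = [classify(text) for text, _ in data]
    report: Dict[str, Dict[str, float]] = {}
    for group, repl in groups.items():
        counterfactual = [classify(perturb(text, spans, repl)) for text, spans in data]
        # Count examples whose predicted label changed after the entity swap.
        flips = sum(o != c for o, c in zip(original, counterfactual))
        stats = {"flip_rate": flips / n}
        for label, count in Counter(counterfactual).items():
            stats[f"p({label})"] = count / n
        report[group] = stats
    return report


# Hypothetical, deliberately biased stand-in for a fine-tuned classifier.
def toy_classifier(text: str) -> str:
    return "negative" if ("bad" in text or "Lagos" in text) else "positive"


# Hypothetical unlabeled target-domain examples with precomputed entity spans.
data = [
    ("Maria loved the service in Paris.", [(0, 5, "PER"), (27, 32, "LOC")]),
    ("John said the food was bad.", [(0, 4, "PER")]),
]
# Hypothetical per-country substitutes for person and location entities.
groups = {
    "FR": {"PER": "Pierre", "LOC": "Lyon"},
    "NG": {"PER": "Chidi", "LOC": "Lagos"},
}

if __name__ == "__main__":
    for group, stats in bias_report(data, toy_classifier, groups).items():
        print(group, stats)
```

With these toy inputs, swapping in the "NG" substitutes flips half of the predictions and pushes every counterfactual to "negative", while the "FR" substitutes flip none. A fuller setup could draw many substitutes per group from name and location frequency lists and compare the resulting label distributions across groups and languages.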

Research Disciplines

WOS
No disciplines

Scopus
No disciplines

SciELO
No disciplines

Publications: WoS (Editions: ISSHP, ISTP, AHCI, SSCI, SCI), Scopus, SciELO Chile.

Institutional Collaboration

Authors - Affiliation

No.	Author	Gender	Institution - Country
1	Barriere, Valentin	-	Universidad de Chile - Chile
2	Cifuentes, Sebastian	-	Universidad de Chile - Chile

Funding

Source
National Center for Artificial Intelligence CENIA

Acknowledgments

Acknowledgment
Valentin thanks both Alexandra Balahur and Felipe Bravo for the early discussions on this work. This research has been funded by National Center for Artificial Intelligence CENIA FB210017, Basal ANID.