Guide for the application of the data augmentation approach on sets of texts in Spanish for sentiment and emotion analysis
Indexed
WoS WOS:001320777500060
Scopus SCOPUS_ID:85205146749
DOI 10.1371/JOURNAL.PONE.0310707
Year 2024
Type research article


Abstract



Over the last ten years, social media has become a crucial data source for businesses and researchers, providing a space where people can express their opinions and emotions. To analyze this data and classify emotions and their polarity in texts, natural language processing (NLP) techniques such as emotion analysis (EA) and sentiment analysis (SA) are employed. However, the effectiveness of these tasks using machine learning (ML) and deep learning (DL) methods depends on large labeled datasets, which are scarce in languages like Spanish. To address this challenge, researchers use data augmentation (DA) techniques to artificially expand small datasets. This study aims to investigate whether DA techniques can improve classification results using ML and DL algorithms for sentiment and emotion analysis of Spanish texts. Various text manipulation techniques were applied, including transformations, paraphrasing (back-translation), and text generation using generative adversarial networks, to small datasets such as song lyrics, social media comments, headlines from national newspapers in Chile, and survey responses from higher education students. The findings show that the Convolutional Neural Network (CNN) classifier achieved the most significant improvement, with an 18% increase using Generative Adversarial Networks for Sentiment Text (SentiGAN) on the Aggressiveness (Seriousness) dataset. Additionally, the same classifier showed an 11% improvement using Easy Data Augmentation (EDA) on the Gender-Based Violence dataset. The performance of BETO, a Spanish Bidirectional Encoder Representations from Transformers (BERT) model, also improved by 10% on the back-translation augmented version of the October 18 dataset, and by 4% on the EDA augmented version of the Teaching survey dataset. These results suggest that data augmentation techniques enhance performance by transforming text and adapting it to the specific characteristics of the dataset.
Through experimentation with various augmentation techniques, this research provides valuable insights into the analysis of subjectivity in Spanish texts and offers guidance for selecting algorithms and techniques based on dataset features.
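
For readers unfamiliar with the transformation-style techniques the abstract mentions, the following is a minimal sketch of two of the four operations usually grouped under Easy Data Augmentation (random swap and random deletion). It is not the authors' implementation: the function names, the default parameters, and the Spanish example sentence are all assumptions made here for illustration, and the thesaurus-based operations (synonym replacement, random insertion) are left out because they need an external lexical resource.

```python
import random


def random_swap(words, n_swaps, rng):
    """Return a copy of `words` with two random positions swapped, n_swaps times."""
    words = list(words)
    if len(words) < 2:
        return words
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words


def random_deletion(words, p, rng):
    """Drop each word with probability p, always keeping at least one word."""
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]


def augment(sentence, n_aug=4, p_delete=0.1, n_swaps=1, seed=0):
    """Produce n_aug variants, alternating random swap and random deletion.

    All parameter names and defaults are hypothetical choices for this sketch.
    """
    rng = random.Random(seed)
    words = sentence.split()
    variants = []
    for k in range(n_aug):
        if k % 2 == 0:
            new_words = random_swap(words, n_swaps, rng)
        else:
            new_words = random_deletion(words, p_delete, rng)
        variants.append(" ".join(new_words))
    return variants


if __name__ == "__main__":
    # Hypothetical survey-style sentence, not taken from the paper's datasets.
    for variant in augment("la clase fue muy clara y el profesor explicó bien"):
        print(variant)
```

Each variant keeps the label of its source sentence, which is how a small labeled set is expanded. Whether such perturbations preserve sentiment in a given corpus is an empirical question of the kind the study evaluates.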

Journal



Journal ISSN
PLoS ONE 1932-6203


Research Disciplines

WoS: Biology; Multidisciplinary Sciences
Scopus: no disciplines listed
SciELO: no disciplines listed

Shows the distribution of disciplines for this publication.

Publications covered: WoS (editions: ISSHP, ISTP, AHCI, SSCI, SCI), Scopus, SciELO Chile.



Authors - Affiliation



No. | Author | Gender | Institution - Country
1 | Benítez, Rodrigo Gutiérrez | - | Universidad del Bío Bío - Chile
2 | Navarrete, Alejandra Segura | Female | Universidad del Bío Bío - Chile
3 | Vidal-Castro, Christian Lautaro | Male | Universidad del Bío Bío - Chile
4 | Martinez-Araneda, Claudia | Female | Universidad Católica de la Santísima Concepción - Chile

Shows the affiliation and gender (automatically detected) of the publication's co-authors.

Funding



Source
Universidad del Bío-Bío
Universidad Católica de la Santísima Concepción
InES de Género
Faculty of Business Sciences of the Universidad del Bío-Bío, Chile
Open Science
SOftware-MOdelling-Science

Shows the funding sources declared in the publication.

Acknowledgements



Acknowledgement
This research was conducted in alliance with the SoMos (SOftware-MOdelling-Science) research group, which has the support of the Research Directorate and the Faculty of Business Sciences of the Universidad del Bío-Bío, Chile. The authors thank the Engineering 2030 Project (ING222010004) in collaboration with the InES de Género (INGE220011) and Open Science (INCA210005) projects of Universidad Católica de la Santísima Concepción, Chile.
