SciELO Chile Collection

Department of Knowledge Management, Monitoring and Foresight
Questions or comments: productividad@anid.cl



Combining Regular Expressions and Supervised Algorithms for Clinical Text Classification
Indexed
Scopus SCOPUS_ID:85177809834
DOI 10.1007/978-3-031-48232-8_35
Year 2023
Type

Total Citations

Authors with Chilean Affiliation

Chilean Institutions

% International Participation

Authors with Foreign Affiliation

Foreign Institutions


Abstract



Clinical text classification assigns labels to content-based data using machine learning algorithms. However, unlike texts from other domains, clinical texts exhibit complex linguistic diversity, including abbreviations, typos, and numerical patterns that are difficult to represent with the most widely used classification algorithms. In this context, Regular Expressions (RegExs), sequences of characters and symbols, offer an alternative way to represent complex patterns in texts and could be used jointly with common classification algorithms for accurate text classification: a classification algorithm can then label the test texts for which the RegExs produce no matches. This work proposes a method that combines automatically generated RegExs and supervised algorithms for classifying clinical texts. The RegExs are generated automatically, in a supervised manner, using alignment algorithms; those that do not meet a minimum confidence threshold and do not contain specific keywords for the classification problem are filtered out. At prediction time, our method assigns the class of the most confident RegEx that matches a test text. When no RegEx matches a test text, a supervised algorithm assigns the class. Three clinical datasets with textual information on obesity and smoking habits were used to assess the performance of four classifiers based on Random Forest (RF), Support Vector Machine (SVM), Naive Bayes (NB), and Bidirectional Encoder Representations from Transformers (BERT). Classification results indicate that, on average, our method improved the classifiers' performance by up to 12% across all performance metrics. These results show the ability of our method to generate confident RegExs that capture representative patterns in the texts for use alongside supervised algorithms.
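The prediction-time rule described in the abstract (the most confident matching RegEx decides the class, otherwise a supervised model does) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the candidate RegExs are hand-written stand-ins for the alignment-generated ones, "confidence" is assumed to be the precision of a RegEx on the training texts it matches, the filter is read as keeping only RegExs that pass both the threshold and the keyword check, and the helper names (`regex_confidence`, `filter_regexes`, `predict`) and the toy data are hypothetical. The `fallback` callable stands in for any of the RF, SVM, NB, or BERT classifiers.

```python
import re
from typing import Callable, List, Sequence, Tuple

# A filtered rule: compiled pattern, the class it predicts, and its confidence.
Rule = Tuple[re.Pattern, str, float]


def regex_confidence(pattern: str, texts: Sequence[str],
                     labels: Sequence[str], target: str) -> float:
    """Assumed confidence: share of matched training texts labeled `target`."""
    compiled = re.compile(pattern, re.IGNORECASE)
    matched = [y for t, y in zip(texts, labels) if compiled.search(t)]
    if not matched:
        return 0.0
    return sum(y == target for y in matched) / len(matched)


def filter_regexes(candidates: Sequence[Tuple[str, str]], texts: Sequence[str],
                   labels: Sequence[str], min_conf: float,
                   keywords: Sequence[str]) -> List[Rule]:
    """Keep candidates that reach `min_conf` AND contain at least one keyword."""
    rules: List[Rule] = []
    for pattern, label in candidates:
        conf = regex_confidence(pattern, texts, labels, label)
        has_keyword = any(k.lower() in pattern.lower() for k in keywords)
        if conf >= min_conf and has_keyword:
            rules.append((re.compile(pattern, re.IGNORECASE), label, conf))
    # Most confident first, so the first match at prediction time wins.
    rules.sort(key=lambda rule: rule[2], reverse=True)
    return rules


def predict(text: str, rules: Sequence[Rule],
            fallback: Callable[[str], str]) -> str:
    """Return the class of the most confident matching rule, else defer."""
    for compiled, label, _conf in rules:
        if compiled.search(text):
            return label
    return fallback(text)


def always_unknown(text: str) -> str:
    """Hypothetical stand-in for a trained RF/SVM/NB/BERT classifier."""
    return "unknown"


if __name__ == "__main__":
    # Toy smoking-status notes, invented for illustration only.
    texts = ["patient smokes 1 pack/day", "never smoked", "smoker 20 cig/day",
             "denies tobacco use", "ex-smoker quit 2010"]
    labels = ["smoker", "non-smoker", "smoker", "non-smoker", "past smoker"]
    candidates = [(r"\d+\s*(pack|cig)", "smoker"), (r"never\s+smok", "non-smoker"),
                  (r"quit\s+\d{4}", "past smoker"), (r"\bpatient\b", "smoker"),
                  (r"smok", "smoker")]
    rules = filter_regexes(candidates, texts, labels, min_conf=0.8,
                           keywords=["pack", "cig", "smok", "quit"])
    for note in ["smokes 2 packs daily", "never smoked", "no relevant history"]:
        print(note, "->", predict(note, rules, always_unknown))
```

In this toy run, `\bpatient\b` is dropped by the keyword check and `smok` by the confidence threshold (it matches texts from three different classes), so the notes resolve to `smoker`, `non-smoker`, and the fallback's `unknown`. Sorting once at filtering time keeps prediction a simple first-match scan; rules with equal confidence keep their original order because Python's sort is stable.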

External Metrics



PlumX Altmetric Dimensions

Shows external impact metrics associated with the publication.

Research Disciplines



WOS
No disciplines
Scopus
Computer Science (All)
Theoretical Computer Science
SciELO
No disciplines

Shows the distribution of disciplines for this publication.

WoS publications (editions: ISSHP, ISTP, AHCI, SSCI, SCI), Scopus, SciELO Chile.

Institutional Collaboration



Shows the distribution of collaboration, both national and foreign, generated by this publication.


Authors - Affiliation



No. | Author | Gender | Institution - Country
1 | FLORES-JARA, CHRISTOPHER ALEJANDRO | Male | Universidad de O’Higgins - Chile
2 | VERSCHAE-TANNENBAUM, RODRIGO ANDRES | Male | Universidad de O’Higgins - Chile

Shows the affiliation and (detected) gender of the publication's co-authors.

Funding



Source
ANID Fondecyt
HGGB
ANID FONDEQUIP

Shows the funding sources declared in the publication.

Acknowledgments



Acknowledgment
This work was partially supported by ANID FONDECYT Postdoctorado 3220803 and ANID FONDEQUIP Mediano EQM170041. The authors thank the Informatics Unit of the HGGB, Concepción, Chile, for providing the datasets.
