Efficient Enumeration Algorithms for Regular Document Spanners
Indexed in
WoS WOS:000583687500004
Scopus SCOPUS_ID:85079807779
DOI 10.1145/3351451
Year 2020
Type research article


Abstract



Regular expressions and automata models with capture variables are core tools in rule-based information extraction. These formalisms, also called regular document spanners, use regular languages to locate the data that a user wants to extract from a text document and then store this data into variables. Since document spanners can easily generate large outputs, it is important to have efficient evaluation algorithms that can generate the extracted data in a quick succession, and with relatively little precomputation time. Toward this goal, we present a practical evaluation algorithm that allows output-linear delay enumeration of a spanner's result after a precomputation phase that is linear in the document. Although the algorithm assumes that the spanner is specified in a syntactic variant of variable-set automata, we also study how it can be applied when the spanner is specified by general variable-set automata, regex formulas, or spanner algebras. Finally, we study the related problem of counting the number of outputs of a document spanner and provide a fine-grained analysis of the classes of document spanners that support efficient enumeration of their results.
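To make the notion of a spanner's output concrete, the following is a minimal sketch that lazily enumerates every span of a document whose content matches a regular expression. It is a naive brute-force illustration, not the output-linear delay algorithm of the paper: it tests all O(n^2) candidate spans one by one. The function name `enumerate_spans`, the single implicit capture variable, and the 0-based half-open span convention are assumptions made for this example.

```python
import re
from typing import Iterator, Tuple

# A span is a half-open interval [start, end) of 0-based document positions
# (a convention assumed here for illustration).
Span = Tuple[int, int]


def enumerate_spans(document: str, pattern: str) -> Iterator[Span]:
    """Lazily yield every span whose content fully matches `pattern`.

    Brute force: every candidate span is tested, so the work between two
    consecutive outputs is not bounded by the size of an output.
    """
    compiled = re.compile(pattern)
    n = len(document)
    for start in range(n + 1):
        for end in range(start, n + 1):
            # Pattern.fullmatch(string, pos, endpos) anchors the match to
            # exactly the region document[start:end].
            if compiled.fullmatch(document, start, end):
                yield (start, end)


# One variable capturing any non-empty run of 'a' in the document "aab".
results = list(enumerate_spans("aab", r"a+"))

# On a document of n identical letters there are n(n+1)/2 matching spans,
# so the output can be quadratic in the document length.
count_on_aaaa = sum(1 for _ in enumerate_spans("aaaa", r"a+"))
```

Because the function is a generator, a caller can consume the first few results without materializing the rest, which is the usage pattern that enumeration algorithms with bounded delay are designed to serve. Counting the outputs, as in the last line, mirrors the counting problem mentioned in the abstract, although here it is done by exhaustive enumeration.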


Research Disciplines

WOS
Computer Science, Software Engineering
Computer Science, Information Systems
Scopus
Information Systems
SciELO
No disciplines

WoS publications (editions: ISSHP, ISTP, AHCI, SSCI, SCI), Scopus, SciELO Chile.



Authors - Affiliation

Ord.  Author                Gender  Institution - Country
1     Florenzano, Fernando  Male    Pontificia Universidad Católica de Chile - Chile
                                    Instituto Milenio Fundamentos de los Datos - Chile
2     Riveros, Cristian     Male    Pontificia Universidad Católica de Chile - Chile
3     Ugarte, Martin        Male    Instituto Milenio Fundamentos de los Datos - Chile
                                    Université libre de Bruxelles (ULB) - Belgium
4     Vansummeren, Stijn    Male    Université libre de Bruxelles (ULB) - Belgium
5     Vrgoc, Domagoj        Male    Pontificia Universidad Católica de Chile - Chile
                                    Instituto Milenio Fundamentos de los Datos - Chile

Shows the affiliation and (detected) gender of the publication's co-authors.

Funding

Source
FONDECYT (Fondo Nacional de Desarrollo Científico y Tecnológico)
Millennium Institute for Foundational Research on Data
Innoviris, the Brussels Institute for Research and Innovation (project SPICES)

Shows the funding sources declared in the publication.

Acknowledgments

F. Florenzano, C. Riveros, M. Ugarte, and D. Vrgoč were partially supported by the Millennium Institute for Foundational Research on Data. D. Vrgoč was also supported by FONDECYT project no. 11160383 and C. Riveros by FONDECYT project no. 11150653. M. Ugarte acknowledges support from Innoviris, the Brussels Institute for Research and Innovation (project SPICES).

Authors' addresses: F. Florenzano and C. Riveros, Pontificia Universidad Católica de Chile, Department of Computer Science, Vicuna Mackenna 4860, Edificio San Agustin, 4to piso, Macul, Santiago, 7820436, Chile; emails: {faflorenzano, cristian.riveros}@uc.cl; M. Ugarte, IMFD Chile, Vicuna Mackenna 4860, Edificio San Agustin, 4to piso, Macul, Santiago, 7820436, Chile; email: mugartec@ulb.ac.be; S. Vansummeren, Université Libre de Bruxelles 50, Av. F. Roosevelt, CP 165/15 B-1050 Brussels, Belgium; email: stijn.vansummeren@ulb.ac.be; D. Vrgoč, Pontificia Universidad Católica de Chile, Institute for Mathematical and Computational Engineering, Vicuna Mackenna 4860, Edificio Hernán Briones, 2do piso, Macul, Santiago, 7820436, Chile; email: dvrgoc@ing.puc.cl.

© 2020 Association for Computing Machinery. 0362-5915/2020/02-ART3. https://doi.org/10.1145/3351451