| Field | Value |
|---|---|
| DOI | 10.1145/3351451 |
| Year | 2020 |
| Type | Research article |
Regular expressions and automata models with capture variables are core tools in rule-based information extraction. These formalisms, also called regular document spanners, use regular languages to locate the data that a user wants to extract from a text document and then store this data into variables. Since document spanners can easily generate large outputs, it is important to have efficient evaluation algorithms that can generate the extracted data in quick succession, and with relatively little precomputation time. Toward this goal, we present a practical evaluation algorithm that allows output-linear delay enumeration of a spanner's result after a precomputation phase that is linear in the document. Although the algorithm assumes that the spanner is specified in a syntactic variant of variable-set automata, we also study how it can be applied when the spanner is specified by general variable-set automata, regex formulas, or spanner algebras. Finally, we study the related problem of counting the number of outputs of a document spanner and provide a fine-grained analysis of the classes of document spanners that support efficient enumeration of their results.
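To make the notion of a document spanner concrete, the following is a minimal Python sketch of a single-variable spanner evaluated by brute force. It is not the algorithm from the article: the function name `naive_spanner`, the half-open 0-based span convention, and the example pattern are assumptions made for illustration only. The sketch tests every candidate span, which takes at least quadratic time in the document length, whereas the article targets linear precomputation followed by output-linear delay enumeration.

```python
import re
from typing import Iterator, Tuple

# A span is a half-open interval [start, end) of character offsets
# (0-based convention assumed for this sketch).
Span = Tuple[int, int]


def naive_spanner(document: str, pattern: str) -> Iterator[Span]:
    """Yield every span of `document` whose content fully matches `pattern`.

    Hypothetical brute-force evaluator: it checks all O(n^2) candidate
    spans, so it illustrates the output of a spanner, not an efficient
    way to compute it.
    """
    compiled = re.compile(pattern)
    n = len(document)
    for start in range(n):
        for end in range(start + 1, n + 1):
            # Pattern.fullmatch(string, pos, endpos) restricts the match
            # to exactly the slice document[start:end].
            if compiled.fullmatch(document, start, end):
                yield (start, end)


spans = list(naive_spanner("ab c", r"[a-z]+"))
print(spans)  # [(0, 1), (0, 2), (1, 2), (3, 4)]
```

Even on this four-character document the spanner returns overlapping spans, and on a document such as `"aaa"` with pattern `a+` every non-empty span qualifies. This is the sense in which outputs can grow large relative to the input, which motivates enumerating results one at a time rather than materializing them all.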
| No. | Author | Gender | Institution - Country |
|---|---|---|---|
| 1 | Florenzano, Fernando | Male | Pontificia Universidad Católica de Chile - Chile; Instituto Milenio Fundamentos de los Datos - Chile |
| 2 | Riveros, Cristian | Male | Pontificia Universidad Católica de Chile - Chile |
| 3 | Ugarte, Martin | Male | Instituto Milenio Fundamentos de los Datos - Chile; Université libre de Bruxelles (ULB) - Belgium |
| 4 | Vansummeren, Stijn | Male | Université libre de Bruxelles (ULB) - Belgium |
| 5 | Vrgoč, Domagoj | Male | Pontificia Universidad Católica de Chile - Chile; Instituto Milenio Fundamentos de los Datos - Chile |
| Funding source |
|---|
| FONDECYT (Fondo Nacional de Desarrollo Científico y Tecnológico) |
| Millennium Institute for Foundational Research on Data |
| Innoviris, the Brussels Institute for Research and Innovation (project SPICES) |
| Acknowledgment |
|---|
| F. Florenzano, C. Riveros, M. Ugarte, and D. Vrgoč were partially supported by the Millennium Institute for Foundational Research on Data. D. Vrgoč was also supported by FONDECYT project no. 11160383 and C. Riveros by FONDECYT project no. 11150653. M. Ugarte acknowledges support from Innoviris, the Brussels Institute for Research and Innovation (project SPICES). Authors' addresses: F. Florenzano and C. Riveros, Pontificia Universidad Católica de Chile, Department of Computer Science, Vicuna Mackenna 4860, Edificio San Agustin, 4to piso, Macul, Santiago, 7820436, Chile; emails: {faflorenzano, cristian.riveros}@uc.cl; M. Ugarte, IMFD Chile, Vicuna Mackenna 4860, Edificio San Agustin, 4to piso, Macul, Santiago, 7820436, Chile; email: mugartec@ulb.ac.be; S. Vansummeren, Université Libre de Bruxelles, 50, Av. F. Roosevelt, CP 165/15, B-1050 Brussels, Belgium; email: stijn.vansummeren@ulb.ac.be; D. Vrgoč, Pontificia Universidad Católica de Chile, Institute for Mathematical and Computational Engineering, Vicuna Mackenna 4860, Edificio Hernán Briones, 2do piso, Macul, Santiago, 7820436, Chile; email: dvrgoc@ing.puc.cl. © 2020 Association for Computing Machinery. 0362-5915/2020/02-ART3. https://doi.org/10.1145/3351451 |