SpannerLib: Embedding Declarative Information Extraction in an Imperative Workflow
- Publication Year :
- 2024
Abstract
- Document spanners have been proposed as a formal framework for declarative Information Extraction (IE) from text, following IE products from industry and academia. Over the past decade, the framework has been studied thoroughly in terms of expressive power, complexity, and the ability to naturally combine text analysis with relational querying. This demonstration presents SpannerLib, a library for embedding document spanners in Python code. SpannerLib facilitates the development of IE programs by providing an implementation of Spannerlog (Datalog-based document spanners) that interacts with the Python code in two directions: rules can be embedded inside Python, and they can invoke custom Python code (e.g., calls to ML-based NLP models) via user-defined functions. The demonstration scenarios showcase IE programs, with increasing levels of complexity, within Jupyter Notebook.
Comment: 4 pages
- Subjects :
- Computer Science - Databases
Computer Science - Information Retrieval
H.4
Details
- Database :
- arXiv
- Publication Type :
- Report
- Accession number :
- edsarx.2409.01736
- Document Type :
- Working Paper
- Full Text :
- https://doi.org/10.14778/3685800.3685855