SpannerLib: Embedding Declarative Information Extraction in an Imperative Workflow

Authors :
Light, Dean
Aiashy, Ahmad
Diab, Mahmoud
Nachmias, Daniel
Vansummeren, Stijn
Kimelfeld, Benny
Publication Year :
2024

Abstract

Document spanners have been proposed as a formal framework for declarative Information Extraction (IE) from text, following IE products from industry and academia. Over the past decade, the framework has been studied thoroughly in terms of expressive power, complexity, and the ability to naturally combine text analysis with relational querying. This demonstration presents SpannerLib, a library for embedding document spanners in Python code. SpannerLib facilitates the development of IE programs by providing an implementation of Spannerlog (Datalog-based document spanners) that interacts with the Python code in two directions: rules can be embedded inside Python, and they can invoke custom Python code (e.g., calls to ML-based NLP models) via user-defined functions. The demonstration scenarios showcase IE programs of increasing complexity within Jupyter Notebook.

Comment: 4 pages
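To make the idea in the abstract concrete, the following is a minimal pure-Python sketch of a document spanner combined with a user-defined function. It does not use SpannerLib and does not reflect SpannerLib's or Spannerlog's actual API or syntax. The names `Span`, `regex_spanner`, `is_org_domain`, and `extract`, as well as the sample document and pattern, are hypothetical and chosen only for illustration. A spanner is modeled here as a function that maps a document to a relation of spans (character intervals), and the user-defined function acts as an extra predicate applied to each extracted row.

```python
import re
from typing import Callable, Iterator, List, NamedTuple, Tuple


class Span(NamedTuple):
    """Half-open character interval [start, end) within a document."""
    start: int
    end: int


def regex_spanner(pattern: str, doc: str) -> Iterator[Tuple[Span, ...]]:
    """Yield one row per regex match, with one span per capture group."""
    for m in re.finditer(pattern, doc):
        yield tuple(Span(*m.span(g)) for g in range(1, m.re.groups + 1))


def text_of(doc: str, span: Span) -> str:
    """Return the substring of the document that a span covers."""
    return doc[span.start:span.end]


def is_org_domain(domain: str) -> bool:
    """Hypothetical user-defined function; a real one might call an NLP model."""
    return domain.endswith(".org")


def extract(doc: str, pattern: str,
            udf: Callable[[str], bool]) -> List[Tuple[str, str]]:
    """Apply the spanner, then keep only rows whose second column passes the UDF."""
    rows = []
    for user_span, domain_span in regex_spanner(pattern, doc):
        domain = text_of(doc, domain_span)
        if udf(domain):
            rows.append((text_of(doc, user_span), domain))
    return rows


# Illustrative input (assumed, not taken from the paper).
doc = "Contact: alice@example.org, bob@test.com"
pattern = r"(\w+)@(\w+\.\w+)"

spans = list(regex_spanner(pattern, doc))
result = extract(doc, pattern, is_org_domain)
print(result)
```

In this sketch the filtering logic is written imperatively; in a declarative setting such as the one the abstract describes, the same combination of extraction and filtering would instead be stated as a rule, with the evaluation order left to the engine.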

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2409.01736
Document Type :
Working Paper
Full Text :
https://doi.org/10.14778/3685800.3685855