1. XLIndy
- Author
Maik Thiele, Nico Luettig, Dana Kuban, Dominik Olwig, Elvis Koci, Wolfgang Lehner, Oscar Romero, Julius Gonsior; Universitat Politècnica de Catalunya. Doctorat Erasmus Mundus en Tecnologies de la Informació per a la Intel·ligència Empresarial; Universitat Politècnica de Catalunya. Departament d'Enginyeria de Serveis i Sistemes d'Informació; Universitat Politècnica de Catalunya. IMP - Information Modeling and Processing
- Subjects
Computer science::Artificial intelligence::Machine learning [UPC subject areas]; Information extraction; Spreadsheets; Computer science; Layout inference; Annotation; Computer science::Information systems [UPC subject areas]; Inference; 02 engineering and technology; Add-in; Bottleneck; Interactive; 020204 information systems; Machine learning; 0202 electrical engineering, electronic engineering, information engineering; Information retrieval; Excel; Microsoft Excel; 020207 software engineering; Python (programming language); JSON; Electronic spreadsheets; Table recognition
- Abstract
Over the years, spreadsheets have established their presence in many domains, including business, government, and science. However, challenges arise because spreadsheets are partially structured and carry implicit (visual and textual) information. This creates a bottleneck for automatic analysis and extraction of information. Therefore, we present XLIndy, a Microsoft Excel add-in with a machine learning back-end, written in Python. It showcases our novel methods for layout inference and table recognition in spreadsheets. For a selected task and method, users can visually inspect the results, change configurations, and compare different runs. This enables iterative fine-tuning. Additionally, users can manually revise the predicted layout and tables, and subsequently save them as annotations. These annotations are used to measure performance and (re-)train classifiers. Finally, data in the recognized tables can be extracted for further processing. XLIndy supports several standard formats, such as CSV and JSON.
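As an illustration of the final extraction step only, the following is a minimal sketch of serializing an already-recognized table to CSV and JSON with the Python standard library. It is not XLIndy's code: the function names `table_to_csv` and `table_to_json`, the header/rows representation, and the sample data are all assumptions made for this example.

```python
import csv
import io
import json


def table_to_csv(header, rows):
    """Serialize a table (hypothetical header + rows layout) to CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def table_to_json(header, rows):
    """Serialize the same table to a JSON array with one object per data row."""
    records = [dict(zip(header, row)) for row in rows]
    return json.dumps(records, indent=2)


# Made-up sample standing in for a table recognized in a spreadsheet.
header = ["Region", "Year", "Revenue"]
rows = [["North", 2018, 1200.5], ["South", 2018, 980.0]]

csv_text = table_to_csv(header, rows)
json_text = table_to_json(header, rows)
```

Here `csv_text` holds one header line followed by one line per data row, and `json_text` holds a list of objects keyed by the header cells.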
- Published
2019