1. MammoTab: a giant and comprehensive dataset for Semantic Table Interpretation
- Author
-
Efthymiou, V, Jiménez-Ruiz, E, Chen, J, Cutrona, V, Hassanzadeh, O, Sequeda, J, Srinivas, K, Abdelmageed, N, Hulsebos, M, Marzocchi, M, Cremaschi, M, Pozzi, R, Avogadro, R, and Palmonari, M
- Abstract
In this paper, we present MammoTab, a dataset composed of 1M Wikipedia tables extracted from over 20M Wikipedia pages and annotated through Wikidata. The lack of datasets of this kind in the state of the art makes MammoTab a good resource for testing and training Semantic Table Interpretation approaches. The dataset has been designed to cover several key challenges, such as disambiguation, homonymy, and NIL-mentions. The dataset has been evaluated using MTab, one of the best-performing approaches in the SemTab challenge.
- Published
- 2023