Polish natural language inference and factivity: An expert-based dataset and benchmarks.

Authors :
Ziembicki, Daniel
Seweryn, Karolina
Wróblewska, Anna
Source :
Natural Language Engineering; Mar2024, Vol. 30 Issue 2, p385-416, 32p
Publication Year :
2024

Abstract

Despite recent breakthroughs in Machine Learning for Natural Language Processing, Natural Language Inference (NLI) problems still constitute a challenge. To this end, we contribute a new dataset that focuses exclusively on the factivity phenomenon; however, our task remains the same as other NLI tasks, that is, the prediction of entailment, contradiction, or neutral (ECN). In this paper, we describe the LingFeatured NLI corpus and present the results of analyses designed to characterize the factivity/non-factivity opposition in natural language. The dataset contains entirely natural language utterances in Polish and gathers 2432 verb-complement pairs and 309 unique verbs. The dataset is based on the National Corpus of Polish (NKJP) and is a representative subcorpus with regard to the syntactic construction [V][że][cc]. We also present an extended version of the set (3035 sentences) containing more sentences with internal negations. We prepared deep learning benchmarks for both sets. We found that transformer BERT-based models working on sentences obtained relatively good results ($\approx 89\%$ F1 score on the base dataset). Even though better results were achieved using linguistic features ($\approx 91\%$ F1 score on the base dataset), this model requires more human labor (humans in the loop) because the features were prepared manually by expert linguists. BERT-based models consuming only the input sentences show that they capture most of the complexity of NLI/factivity. Complex cases in the phenomenon, for example cases with entailment (E) and non-factive verbs, still remain an open issue for further research. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
1351-3249
Volume :
30
Issue :
2
Database :
Complementary Index
Journal :
Natural Language Engineering
Publication Type :
Academic Journal
Accession number :
176361465
Full Text :
https://doi.org/10.1017/S1351324923000220