1. Annotated Clause Boundaries’ Influence on Parsing Results
- Author
- Kaili Müürisep, Dage Särg, and Kadri Muischnek
- Subjects
060201 languages & linguistics, Parsing, Computer science, business.industry, 06 humanities and the arts, 02 engineering and technology, computer.software_genre, Estonian, language.human_language, Identification (information), TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES, Margin (machine learning), Dependency grammar, 0602 languages and literature, 0202 electrical engineering, electronic engineering, information engineering, language, 020201 artificial intelligence & image processing, Segmentation, Syntactic structure, Artificial intelligence, business, computer, Natural language processing
- Abstract
This paper studies the effect of pre-annotated clause boundaries on dependency parsing of Estonian new media texts. Our hypothesis is that correctly identified clause boundaries improve parsing: once the text is split into smaller, syntactically meaningful units, it should be easier for the parser to determine the syntactic structure of each unit. To test the hypothesis, we performed two experiments on a 14,000-word corpus of Estonian web texts whose morphological analysis had been manually validated. In the first experiment, the corpus with gold-standard morphological tags was parsed with MaltParser both with and without the manually annotated clause boundaries. In the second experiment, only the segmentation of the text was preserved, and the morphological analysis was done automatically before parsing. The experiments confirmed our hypothesis about the influence of correct clause boundaries by a small margin: in both experiments, the labeled attachment score (LAS) improved by 0.6%.
- Published
- 2018