Influence of Treebank Design on Representation of Multiword Expressions
- Authors
Daniel Zeman, Pavel Straňák, and Eduard Bejček
- Subjects
Annotation, Dependency (UML), Computer science, business.industry, Treebank, Artificial intelligence, business, computer.software_genre, Representation (mathematics), computer, Natural language processing, Sentence
- Abstract
Multiword Expressions (MWEs) are important linguistic units that require special treatment in many NLP applications. It is thus desirable to be able to recognize them automatically. Semantically annotated corpora should mark MWEs in a clear way that facilitates the development of automatic recognition tools. In the present paper we discuss various corpus design decisions from this perspective. We propose guidelines that should lead to MWE-friendly annotation and evaluate them on numerous sentence examples. Our experience of identifying MWEs in the Prague Dependency Treebank provides the basis for the discussion, and examples from other languages are added whenever appropriate.
- Published
- 2011