1. Improving Statistical Word Alignments with Morpho-syntactic Transformations.
- Author
Salakoski, Tapio, Ginter, Filip, Pyysalo, Sampo, Pahikkala, Tapio, Gispert, Adrià, Gupta, Deepa, Popović, Maja, Lambert, Patrik, Mariño, Jose B., Federico, Marcello, Ney, Hermann, and Banchs, Rafael
- Abstract
This paper presents a wide range of statistical word alignment experiments that incorporate morphosyntactic information. By transforming the parallel corpus according to POS tags, lemmas or stems, we explore which linguistic information helps reduce alignment error rates. To this end, alignments are evaluated against a human word alignment reference, with the aim of obtaining an improved machine translation training scheme that eventually leads to improved SMT performance. Experiments are carried out on a Spanish-English European Parliament Proceedings parallel corpus, in both a large and a small data track. As expected, the improvements from introducing morphosyntactic information are larger when data are scarce, but a significant improvement is also achieved in the large data task, showing that certain linguistic knowledge remains relevant even when large amounts of data are available. [ABSTRACT FROM AUTHOR]
- Published
- 2006
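The abstract reports results as alignment error rates measured against a human reference but does not define the metric. The usual definition (Och and Ney) compares a hypothesis link set A with a reference split into sure links S and possible links P, where S ⊆ P: AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|). The following is a minimal Python sketch, assuming alignments are represented as sets of (source index, target index) pairs; the function name and the toy link sets are hypothetical illustrations, not the authors' code or data.

```python
from typing import Set, Tuple

# A word alignment link: (source word index, target word index).
Link = Tuple[int, int]


def alignment_error_rate(hyp: Set[Link], sure: Set[Link], possible: Set[Link]) -> float:
    """Standard AER: 1 - (|A & S| + |A & P|) / (|A| + |S|).

    Assumption: sure links also count as possible, so P is taken as P | S
    in case the caller passes only the extra "possible-only" links.
    """
    possible = possible | sure
    denom = len(hyp) + len(sure)
    if denom == 0:
        raise ValueError("AER is undefined when both hypothesis and sure sets are empty")
    return 1.0 - (len(hyp & sure) + len(hyp & possible)) / denom


if __name__ == "__main__":
    # Hypothetical reference: three sure links plus one possible-only link.
    sure = {(0, 0), (1, 1), (2, 3)}
    possible = sure | {(2, 2)}
    # Hypothetical aligner output: two sure hits and one possible-only hit.
    hyp = {(0, 0), (1, 1), (2, 2)}
    print(round(alignment_error_rate(hyp, sure, possible), 4))  # 0.1667
```

Lower is better: a hypothesis that reproduces exactly the sure links scores 0, and links that only match the "possible" set are not penalized as errors but do not count toward the sure-link recall term either.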