Extracting Multilingual Lexicons from Parallel Corpora.

Authors :: Tufis, Dan; Barbu, Ana Maria; Ion, Radu
Source :: Computers & the Humanities. May 2004, Vol. 38 Issue 2, p163-189. 27p. 5 Diagrams, 14 Charts, 1 Graph.
Publication Year :: 2004
Abstract: The paper describes our recent developments in the automatic extraction of translation equivalents from parallel corpora. We describe three increasingly complex algorithms: a simple iterative baseline method and two more elaborate non-iterative versions. While the baseline algorithm is described mainly for illustrative purposes, the non-iterative algorithms rest on different working hypotheses, which may be motivated by different kinds of applications and, to some extent, by the languages concerned. The first two algorithms rely on cross-lingual POS preservation, whereas the third does not require POS invariance as an extraction condition. The algorithms were evaluated on three different corpora and several language pairs. [ABSTRACT FROM AUTHOR]
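
Illustrative sketch (not taken from the paper): the abstract does not spell out the algorithms, so the Python below only shows, under stated assumptions, what an iterative extraction loop with a POS-preservation condition might look like. The function name extract_equivalents, the Dice association score, the min_score threshold, and the toy Romanian-English data are all hypothetical choices made for this example, not the authors' method.

```python
from collections import Counter


def extract_equivalents(bitext, min_score=0.5, require_same_pos=True):
    """Greedy iterative extraction of translation-equivalent candidates.

    bitext: list of (source_tokens, target_tokens) for aligned sentences;
            each token is a (lemma, pos) tuple.
    Returns a list of (source_token, target_token, score) in extraction order.
    Assumption: Dice over sentence-level co-occurrence is the association score.
    """
    # Work on mutable copies so the caller's data is left untouched.
    pairs = [(list(src), list(tgt)) for src, tgt in bitext]
    lexicon = []
    while True:
        src_freq, tgt_freq, joint = Counter(), Counter(), Counter()
        for src, tgt in pairs:
            src_set, tgt_set = set(src), set(tgt)
            src_freq.update(src_set)
            tgt_freq.update(tgt_set)
            for s in src_set:
                for t in tgt_set:
                    # POS preservation: only same-POS pairs are candidates.
                    if require_same_pos and s[1] != t[1]:
                        continue
                    joint[(s, t)] += 1
        if not joint:
            break
        # Rank by score, then by co-occurrence count, then alphabetically,
        # so that ties are broken deterministically.
        candidates = [
            (-2.0 * n / (src_freq[s] + tgt_freq[t]), -n, s, t)
            for (s, t), n in joint.items()
        ]
        neg_score, _, s, t = min(candidates)
        score = -neg_score
        if score < min_score:
            break
        lexicon.append((s, t, score))
        # Remove the accepted pair wherever both members co-occur,
        # then re-estimate the counts on the next pass.
        for src, tgt in pairs:
            if s in src and t in tgt:
                src[:] = [tok for tok in src if tok != s]
                tgt[:] = [tok for tok in tgt if tok != t]
    return lexicon


toy_bitext = [
    ([("house", "N"), ("be", "V")], [("casa", "N"), ("fi", "V")]),
    ([("house", "N"), ("big", "A")], [("casa", "N"), ("mare", "A")]),
    ([("dog", "N"), ("be", "V")], [("caine", "N"), ("fi", "V")]),
]
for s, t, score in extract_equivalents(toy_bitext):
    print(s[0], t[0], round(score, 2))
```

Calling the function with require_same_pos=False drops the POS condition, which loosely mirrors the contrast the abstract draws between the first two algorithms and the third.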