TExSIS: Bilingual terminology extraction from parallel corpora using chunk-based alignment.

Authors :
Macken, Lieve
Lefever, Els
Hoste, Veronique
Source :
Terminology. 2013, Vol. 19 Issue 1, p1-30. 30p. 2 Diagrams, 9 Charts, 4 Graphs.
Publication Year :
2013

Abstract

We report on TExSIS, a flexible bilingual terminology extraction system that uses a sophisticated chunk-based alignment method for the generation of candidate terms, after which the specificity of the candidate terms is determined by combining several statistical filters. Although the set-up of the architecture is largely language-independent, we present terminology extraction results for four different languages and three language pairs. Gold standard data sets were created for French-Italian, French-English and French-Dutch, which allowed us not only to evaluate precision, which is common practice, but also recall. We compared the TExSIS approach, which takes a multilingual perspective from the start, with the more commonly used approach of first identifying term candidates monolingually and then aligning the source and target terms. A comparison of our system with the LUIZ approach described by Vintar (2010) reveals that TExSIS outperforms LUIZ both for monolingual and bilingual terminology extraction. Our results also clearly show that the precision of the alignment is crucial for the success of the terminology extraction. Furthermore, based on the observation that the precision scores for bilingual terminology extraction outperform those of the monolingual systems, we conclude that multilingual evidence helps to determine unithood in less related languages. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
0929-9971
Volume :
19
Issue :
1
Database :
Academic Search Index
Journal :
Terminology
Publication Type :
Academic Journal
Accession number :
87902739
Full Text :
https://doi.org/10.1075/term.19.1.01mac