TExSIS: Bilingual terminology extraction from parallel corpora using chunk-based alignment.

Authors :: Macken, Lieve; Lefever, Els; Hoste, Veronique
Source :: Terminology. 2013, Vol. 19 Issue 1, p1-30. 30p. 2 Diagrams, 9 Charts, 4 Graphs.
Publication Year :: 2013
Abstract :: We report on TExSIS, a flexible bilingual terminology extraction system that uses a sophisticated chunk-based alignment method for the generation of candidate terms, after which the specificity of the candidate terms is determined by combining several statistical filters. Although the set-up of the architecture is largely language-independent, we present terminology extraction results for four different languages and three language pairs. Gold standard data sets were created for French-Italian, French-English and French-Dutch, which allowed us not only to evaluate precision, which is common practice, but also recall. We compared the TExSIS approach, which takes a multilingual perspective from the start, with the more commonly used approach of first identifying term candidates monolingually and then aligning the source and target terms. A comparison of our system with the LUIZ approach described by Vintar (2010) reveals that TExSIS outperforms LUIZ both for monolingual and bilingual terminology extraction. Our results also clearly show that the precision of the alignment is crucial for the success of the terminology extraction. Furthermore, based on the observation that the precision scores for bilingual terminology extraction outperform those of the monolingual systems, we conclude that multilingual evidence helps to determine unithood in less related languages. [ABSTRACT FROM AUTHOR]

Subjects :: *BILINGUALISM; *TERMS & phrases; *EXTRACTION (Linguistics); *FRENCH language; *ITALIAN language; *ENGLISH language; *DUTCH language

Details

Language :: English
ISSN :: 0929-9971
Volume :: 19
Issue :: 1
Database :: Academic Search Index
Journal :: Terminology
Publication Type :: Academic Journal
Accession number :: 87902739
Full Text :: https://doi.org/10.1075/term.19.1.01mac
