
Detecting the same text in different languages

Authors :: Kostadin Koroutchev
Manuel Cebrián
UAM. Departamento de Ingeniería Informática
Aprendizaje Automático (ING EPS-001)
Neurocomputación Biológica (ING EPS-005)
Source :: Biblos-e Archivo. Repositorio Institucional de la UAM, instname
Publication Year :: 2006
Publisher :: IEEE, 2006.
Abstract: Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. K. Koroutchev and M. Cebrian, “Detecting the same text in different languages”, in Information Theory Workshop, 2006. ITW '06 Punta del Este. IEEE, Punta del Este, Uruguay, 2006, pp. 337-341.

Compression-based similarity distances have one main drawback: the objects being compared must share the same coding scheme. When a text is translated, the translation is significantly similar to the original even though there is no literal coincidence between them. In this article, we present an algorithm that compares the redundancy structure of the data as extracted by a Lempel-Ziv compression scheme. Each text is represented as a graph, and two texts are considered similar under our measure if they have the same referential topology when compressed. We give empirical evidence that this measure detects similarity between data coded in different languages.

This work was partially supported by grant TIN 2004-07676-G01 and by grant TSI 2005-08255-C07-06 of the Spanish Ministry of Education and Culture.
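The abstract describes the approach only at a high level. The following is a minimal Python sketch of the general idea, not the authors' algorithm. It assumes word-level tokens, a greedy LZ77-style parse with an unbounded window (each copied phrase becomes an edge from its start to the earlier position it copies from), and a hypothetical comparison that bins each edge by its relative position in the text and takes the Jaccard overlap of the occupied cells. The function names `lz_reference_edges`, `topology_signature`, `topology_similarity` and the `bins` parameter are illustrative assumptions.

```python
from typing import List, Set, Tuple


def lz_reference_edges(tokens: List[str]) -> List[Tuple[int, int]]:
    """Greedy LZ77-style parse (unbounded window, brute force).

    Returns one (phrase_start, source_start) edge per copied phrase;
    tokens with no earlier match are emitted as literals (no edge).
    """
    edges = []
    n = len(tokens)
    i = 0
    while i < n:
        best_len, best_src = 0, -1
        for j in range(i):  # every earlier start position is a candidate source
            k = 0
            # Overlapping copies are allowed, as in LZ77.
            while i + k < n and tokens[j + k] == tokens[i + k]:
                k += 1
            if k > best_len:
                best_len, best_src = k, j
        if best_len == 0:
            i += 1  # literal token
        else:
            edges.append((i, best_src))
            i += best_len  # skip over the copied phrase
    return edges


def topology_signature(tokens: List[str], bins: int = 10) -> Set[Tuple[int, int]]:
    """Hypothetical signature: edges mapped to a bins x bins grid of relative positions."""
    n = len(tokens)
    return {(int(bins * i / n), int(bins * j / n)) for i, j in lz_reference_edges(tokens)}


def topology_similarity(a: List[str], b: List[str], bins: int = 10) -> float:
    """Jaccard overlap of the two signatures (1.0 when both texts have no back-references)."""
    sa, sb = topology_signature(a, bins), topology_signature(b, bins)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    en = "the cat saw the dog and the cat ran".split()
    # Toy one-to-one word mapping, standing in for a translation.
    to_es = {"the": "el", "cat": "gato", "saw": "vio", "dog": "perro",
             "and": "y", "ran": "corrió"}
    es = [to_es[w] for w in en]
    print(lz_reference_edges(en))       # [(3, 0), (6, 0)]
    print(topology_similarity(en, es))  # 1.0
```

Because the comparison only looks at which positions refer back to which earlier positions, the toy English-to-Spanish renaming produces an identical signature even though the two texts share no token. Real translations reorder, merge, and split words, which is what the coarse positional binning is meant to tolerate in this sketch; the measure in the paper may differ.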

Details

Language :: English
Database :: OpenAIRE
Journal :: Biblos-e Archivo. Repositorio Institucional de la UAM, instname
Accession number :: edsair.doi.dedup.....63cc43f4850e15739bca0e7c01ba2c0d