Detecting the same text in different languages
- Source :
- Biblos-e Archivo. Repositorio Institucional de la UAM, instname
- Publication Year :
- 2006
- Publisher :
- IEEE, 2006.
Abstract
- Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. K. Koroutchev and M. Cebrian, "Detecting the same text in different languages", in Information Theory Workshop, 2006. ITW '06 Punta del Este. IEEE, Punta del Este, Uruguay, 2006, pp. 337-341.
- Compression-based similarity distances have the main drawback of needing the same coding scheme for the objects to be compared. When two texts are translated, there exists significant similarity with no literal coincidence. In this article, we present an algorithm that compares the redundancy structure of the data extracted by means of a Lempel-Ziv compression scheme. Each text is presented as a graph and two texts are considered similar with our measure if they have the same referential topology when compressed. We give empirical evidence that this measure detects similarity between data coded in different languages.
- This work was partially supported by grants TIN 2004-07676-G01 and TSI 2005-08255-C07-06 of the Spanish Ministry of Education and Culture.
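The idea in the abstract, comparing where a Lempel-Ziv parse points back to rather than what symbols it copies, can be illustrated with a minimal sketch. This is not the authors' algorithm or their graph measure: the greedy parse, the binned (target, source) signature, the cosine score, and the names `lz_references`, `reference_signature`, and `structure_similarity` are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def lz_references(text, min_len=3):
    """Greedy LZ77-style parse (illustrative, not the paper's parser).

    Returns a list of (target, source, length) back-references, where the
    phrase starting at `target` repeats text that starts earlier at `source`.
    """
    refs = []
    i, n = 0, len(text)
    while i < n:
        best_len, best_src = 0, -1
        length = min_len
        # Extend the phrase while it still occurs at some start position < i.
        while i + length <= n:
            src = text.find(text[i:i + length], 0, i + length - 1)
            if src < 0:
                break
            best_len, best_src = length, src
            length += 1
        if best_len:
            refs.append((i, best_src, best_len))
            i += best_len
        else:
            i += 1  # literal symbol, no back-reference
    return refs


def reference_signature(text, bins=8, min_len=3):
    """Histogram of back-references over a bins x bins grid of relative positions."""
    n = max(len(text), 1)
    sig = Counter()
    for target, source, length in lz_references(text, min_len):
        cell = (target * bins // n, source * bins // n)
        sig[cell] += length  # weight each reference by the copied length
    return sig


def structure_similarity(a, b, bins=8):
    """Cosine similarity of two reference signatures (assumed measure)."""
    sa, sb = reference_signature(a, bins), reference_signature(b, bins)
    dot = sum(sa[k] * sb[k] for k in sa)
    na = sqrt(sum(v * v for v in sa.values()))
    nb = sqrt(sum(v * v for v in sb.values()))
    return dot / (na * nb) if na and nb else 0.0


if __name__ == "__main__":
    plain = "the cat sat on the mat and the cat sat again"
    # A symbol-by-symbol recoding: no literal coincidence, same redundancy.
    table = str.maketrans("abcdefghijklmnopqrstuvwxyz ",
                          "zyxwvutsrqponmlkjihgfedcba_")
    print(structure_similarity(plain, plain.translate(table)))
```

A one-to-one recoding of the alphabet leaves every back-reference in place, so the two signatures coincide even though the strings share no characters. A real translation shifts phrase positions and lengths, which is why a coarse binned signature like this one is only a rough stand-in for the referential topology the abstract describes.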
- Subjects :
- Compression algorithms
Theoretical computer science
Computer science
Existential quantification
Entropy
Testing
Data_CODINGANDINFORMATIONTHEORY
computer.software_genre
Information theory
Topology
Coincidence
Length measurement
Computer science education
H infinity control
Entropy (information theory)
Humans
Data mining
Informática
business.industry
Object detection
Tin
Artificial intelligence
business
computer
Natural language processing
Data compression
Details
- Language :
- English
- Database :
- OpenAIRE
- Journal :
- Biblos-e Archivo. Repositorio Institucional de la UAM, instname
- Accession number :
- edsair.doi.dedup.....63cc43f4850e15739bca0e7c01ba2c0d