Multilingual Relevant Sentence Detection Using Reference Corpus.

Authors :
Sung Hyon Myaeng
Ming Zhou
Kam-Fai Wong
Hong-Jiang Zhang
Ming-Hung Hsu
Ming-Feng Tsai
Hsin-Hsi Chen
Source :
Information Retrieval Technology; 2005, p165-177, 13p
Publication Year :
2005

Abstract

IR with a reference corpus is one approach to relevant sentence detection: it takes the results of IR as the representation of a query (sentence). Lack of information and language differences are two major issues in relevance detection among multilingual sentences. This paper uses a parallel corpus for information expansion and translation, and introduces different representations, i.e., sentence-vector, document-vector, and term-vector. Both sentence-aligned and document-aligned corpora, i.e., the Sinorama corpus and the HKSAR corpus, are used. The factors of alignment granularity, corpus domain, corpus size, language basis, and term selection strategy are addressed. The experimental results show that an MRR of 0.839 is achieved for similarity computation between multilingual sentences when a larger, finer-grained parallel corpus from the same domain as the test data is adopted. Generally speaking, the sentence-vector approach is superior to the term-vector approach when a sentence-aligned corpus is employed, and the document-vector approach is better than the term-vector approach when a document-aligned corpus is used. Regarding the language issue, the Chinese basis is more suitable than the English basis in our experiments. We also employ the translated TREC novelty test bed to evaluate overall performance. The experimental results show that multilingual relevance detection achieves 80% of the performance of monolingual relevance detection, which indicates the feasibility of the IR-with-reference-corpus approach to relevant sentence detection. [ABSTRACT FROM AUTHOR]
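
The sentence-vector idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy parallel corpus, the whitespace tokenization (the Chinese side is assumed to be pre-segmented), the cosine weighting, and all function names are assumptions made for illustration. Each sentence is represented by its similarity to the reference sentences in its own language; because the reference corpus is sentence-aligned, vectors built from the two sides share the same dimensions and can be compared directly. Mean reciprocal rank (MRR) is computed from the rank of the correct match for each query.

```python
import math
from collections import Counter

# Hypothetical sentence-aligned reference corpus: REF_EN[i] is parallel to REF_ZH[i].
REF_EN = [
    "the cat sat on the mat",
    "stock prices fell sharply today",
    "the team won the football match",
]
REF_ZH = [
    "猫 坐 在 垫子 上",
    "股票 价格 今天 大幅 下跌",
    "球队 赢 了 足球 比赛",
]

def bow(text):
    """Bag-of-words term counts using whitespace tokenization (an assumption)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def sentence_vector(sentence, reference_side):
    """Map a sentence to {reference index: similarity} over same-language references."""
    query = bow(sentence)
    scores = {i: cosine(query, bow(ref)) for i, ref in enumerate(reference_side)}
    return {i: s for i, s in scores.items() if s > 0.0}

def rank_candidates(query_en, candidates_zh):
    """Rank Chinese candidates against an English query via aligned reference indices."""
    q_vec = sentence_vector(query_en, REF_EN)
    scores = [cosine(q_vec, sentence_vector(c, REF_ZH)) for c in candidates_zh]
    return sorted(range(len(candidates_zh)), key=lambda i: -scores[i])

def mean_reciprocal_rank(ranks):
    """MRR over 1-based ranks of the correct answer, one rank per query."""
    return sum(1.0 / r for r in ranks) / len(ranks)

# Toy usage: candidate 0 is the intended match for the English query.
ranking = rank_candidates("stock prices fell", ["股票 价格 下跌", "球队 赢 了 比赛"])
```

A document-vector variant under the same assumptions would index aligned documents instead of aligned sentences; the comparison step stays unchanged.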

Details

Language :
English
ISBNs :
9783540250654
Database :
Supplemental Index
Journal :
Information Retrieval Technology
Publication Type :
Book
Accession number :
32701392