1. Combining Contextualized and Non-contextualized Query Translations to Improve CLIR
- Author
- Douglas W. Oard, Suraj Nair, and Petra Galuščáková
- Subjects
Vocabulary, Basis (linear algebra), Machine translation, business.industry, Computer science, media_common.quotation_subject, InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL, Probabilistic logic, Contrast (statistics), Context (language use), 02 engineering and technology, Translation (geometry), computer.software_genre, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, Word (computer architecture), Natural language processing, media_common
- Abstract
In cross-language information retrieval using probabilistic structured queries (PSQ), translation probabilities from statistical machine translation act as a bridge between the query and document vocabularies. These translation probabilities are typically estimated from a sentence-aligned corpus on a word-to-word basis, without taking context into account. Neural methods, by contrast, can learn to translate using the context around the words, and this can be used as a basis for estimating context-dependent translation probabilities. However, sparsity limits the accuracy of context-specific translation probabilities for rare words, which can be important in retrieval applications. This paper presents evidence that combining such context-dependent translation probabilities with context-independent translation probabilities learned from the same parallel corpus can yield improvements in the effectiveness of cross-language ranked retrieval.
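As a rough illustration of the idea in the abstract, the sketch below mixes a context-dependent and a context-independent translation table for one query term, then uses the mixture to compute a PSQ-style expected term frequency in a document. The abstract does not say how the paper combines the two sets of probabilities, so the linear interpolation, the weight `lam`, the function names, and the toy probabilities are all assumptions made for illustration, not the authors' method or data.

```python
from collections import Counter


def combine(p_context, p_static, lam=0.5):
    """Linearly interpolate two translation distributions for one query term.

    Assumed scheme: lam weights the context-dependent table and (1 - lam)
    the context-independent one. The result is renormalized to sum to 1.
    """
    terms = set(p_context) | set(p_static)
    mixed = {
        t: lam * p_context.get(t, 0.0) + (1 - lam) * p_static.get(t, 0.0)
        for t in terms
    }
    total = sum(mixed.values())
    return {t: p / total for t, p in mixed.items()} if total else {}


def expected_term_frequency(doc_tokens, translations):
    """PSQ-style expected frequency: sum of probability * count over translations."""
    tf = Counter(doc_tokens)
    return sum(p * tf[f] for f, p in translations.items())


# Toy example (made-up numbers): the English query term "bank" and French documents.
p_context = {"banque": 0.9, "rive": 0.1}  # hypothetical context-dependent estimate
p_static = {"banque": 0.5, "rive": 0.5}   # hypothetical context-independent estimate

mixed = combine(p_context, p_static, lam=0.5)
score = expected_term_frequency(["la", "banque", "banque", "rive"], mixed)
```

With `lam` near 1 the mixture trusts the context-dependent estimates. Lowering `lam` falls back toward the context-independent table, which is one plausible way to soften the sparsity problem for rare words that the abstract describes.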
- Published
- 2020