1. Multilingual Semantic Relatedness Using Lightweight Machine Translation
- Author
- Siegfried Handschuh, Siamak Barzegar, Brian Davis, and André Freitas
- Subjects
- Information Systems and Management, Commonsense knowledge, Machine translation, Computer Networks and Communications, Computer science, Semantic data model, Semantics, Semantic similarity, Electronic mail, Multilingual Distributional Semantic Models, Artificial intelligence, Distributional semantic models, Human-Computer Interaction, Semantic relatedness, Task analysis, Natural language processing, Word (computer architecture)
- Abstract
- Distributional semantic models depend strongly on the size and quality of their reference corpora, which embed the commonsense knowledge needed to build comprehensive models. While high-quality texts containing large-scale commonsense information, such as Wikipedia, are available in English, other languages may lack sufficient textual support to build distributional models. This paper proposes combining a lightweight (sloppy) machine translation (MT) model with an English Distributional Semantic Model (DSM) to provide higher-quality word vectors for languages other than English. Results show that the lightweight MT model yields significant improvements over language-specific distributional models. Additionally, the lightweight MT model outperforms more complex MT methods on the task of word-pair translation.
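The pipeline the abstract describes (translate each word of a pair into English, then score the pair in an English DSM) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: the tiny German-to-English lexicon stands in for the lightweight MT step, the three-dimensional vectors stand in for an English DSM, and the names `LEXICON`, `ENGLISH_DSM`, `translate`, and `relatedness` are hypothetical.

```python
import math

# Hypothetical toy lexicon (German -> English); a stand-in for the lightweight MT step.
LEXICON = {"hund": "dog", "katze": "cat", "auto": "car"}

# Hypothetical toy English word vectors; a stand-in for an English DSM.
ENGLISH_DSM = {
    "dog": [0.9, 0.8, 0.1],
    "cat": [0.8, 0.9, 0.1],
    "car": [0.1, 0.1, 0.9],
}


def translate(word):
    """Look the word up in the toy lexicon; return None if it is unknown."""
    return LEXICON.get(word.lower())


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def relatedness(word1, word2):
    """Translate both words to English, then compare their English vectors.

    Returns None when either word cannot be translated or has no vector.
    """
    en1, en2 = translate(word1), translate(word2)
    if en1 not in ENGLISH_DSM or en2 not in ENGLISH_DSM:
        return None
    return cosine(ENGLISH_DSM[en1], ENGLISH_DSM[en2])


if __name__ == "__main__":
    print(relatedness("Hund", "Katze"))  # related pair, higher score
    print(relatedness("Hund", "Auto"))   # less related pair, lower score
```

In this sketch the source-language pair never needs its own distributional model; only the translation step is language-specific, which is the trade-off the abstract argues for.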
- Published
- 2018