51. Semantic Recommendation System for Bilingual Corpus of Academic Papers
- Author
Irina Nikishina, Anna Safaryan, Petr Filchenkov, Andrey Kutuzov, and Weijia Yan
- Subjects
Word embedding, Computer science, Bilingual dictionary, Cosine similarity, Semantic search, Recommender system, Task (project management), Semantic similarity, Relevance (information retrieval), Artificial intelligence, Natural language processing
- Abstract
We tested four methods of making document representations cross-lingual for the task of semantic search for similar papers, using a corpus of papers from three Russian conferences on NLP: Dialogue, AIST and AINL. The pipeline consisted of three stages: preprocessing; word-by-word vectorisation using models obtained with various methods of mapping vectors from two independent vector spaces into a common one; and search for the most similar papers based on the cosine similarity of text vectors. The four methods can be grouped into two approaches: 1) aligning two pretrained monolingual word embedding models ourselves with a bilingual dictionary (for example, with the VecMap algorithm) and 2) using pre-aligned cross-lingual word embedding models (MUSE). To find out which approach benefits the task more, we manually evaluated the results and calculated the average precision of recommendations for each method. MUSE showed the highest search relevance, but the other methods produced more recommendations in a language other than that of the target paper.
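The three-stage pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the alignment step uses plain orthogonal Procrustes as a simple stand-in for dictionary-based mapping (it is neither VecMap nor MUSE itself), and the two-dimensional embeddings, the bilingual dictionary, the paper names, and all function names are hypothetical toy values chosen for illustration.

```python
import numpy as np

def procrustes_align(src, tgt):
    """Orthogonal W minimising ||src @ W - tgt||_F over dictionary pairs."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt

def doc_vector(tokens, emb):
    """Average the vectors of known tokens; zero vector if none are known."""
    dim = len(next(iter(emb.values())))
    vecs = [emb[t] for t in tokens if t in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def recommend(target_vec, doc_vecs, k=2):
    """Names of the k documents most similar to the target vector."""
    scores = {name: cosine(target_vec, v) for name, v in doc_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy monolingual embeddings (assumption): the Russian space is a rotated
# copy of the English one, so an orthogonal map can align them exactly.
en = {"parser": np.array([1.0, 0.0]), "corpus": np.array([0.0, 1.0])}
ru = {"парсер": np.array([0.0, 1.0]), "корпус": np.array([-1.0, 0.0])}

# Hypothetical bilingual dictionary used as alignment supervision.
pairs = [("парсер", "parser"), ("корпус", "corpus")]
src = np.stack([ru[r] for r, _ in pairs])
tgt = np.stack([en[e] for _, e in pairs])
W = procrustes_align(src, tgt)

# Map Russian vectors into the English space to obtain one common space.
common = dict(en)
common.update({word: vec @ W for word, vec in ru.items()})

# Preprocessed (tokenised) papers, vectorised word by word and averaged.
papers = {
    "en_parsing": ["parser", "parser", "corpus"],
    "en_corpora": ["corpus"],
    "ru_parsing": ["парсер"],
}
paper_vecs = {name: doc_vector(toks, common) for name, toks in papers.items()}

# Recommend papers for a Russian target paper.
target = doc_vector(["парсер", "парсер", "корпус"], common)
top = recommend(target, paper_vecs, k=2)
print(top)
```

On this toy data the Russian target paper is matched first with the English parsing paper and then with the Russian one, which shows how a shared vector space lets recommendations cross the language boundary.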
- Published
- 2021