Applying artificial intelligence methods for solving problems of searching for semantic associates: case of toponym Moskva
- Source :
- Vestnik of Astrakhan State Technical University. Series: Management, computer science and informatics. 2022:41-51
- Publication Year :
- 2022
- Publisher :
- Astrakhan State Technical University, 2022.
Abstract
- Current problems in toponymy involve studying individual words in order to restore the lost conceptual meaning of geographical names and to find out how these names reflected the characteristic features of the terrain, the type of activity of the people inhabiting it, etc. The purpose of the study is to determine the origin of the toponym Moskva by using artificial intelligence methods. Semantic similarity between words is calculated with the GeoWAC fastText embedding model, trained on a corpus of Russian-language texts and provided by the RusVectōrēs service. The model identifies the semantic associates of a toponym by representing words as vectors in a semantic space and finding the word vectors located closest to the vector of the original word. To analyze the toponym, the study applies the method of semantic associates, cluster analysis, and a combined method that transforms a word whose meaning has been lost and then analyzes the semantic associates of the resulting set of word transformants. The method is formalized as a model that determines the similarity between the studied word and its associates using different versions of the embedding model for one or more text corpora. The associates obtained by the artificial intelligence methods are treated as a semantic cluster, and the cosine similarity between vectors serves as the measure of similarity between elements of the cluster. To identify different hypotheses about the origin of the toponym Moskva, a cluster analysis was carried out on the combined set of the first ten vector associates of all transformants of this word. As a result, four hypotheses were advanced: “a famous man”, “firearms”, “beekeeping”, and “blood-sucking insects”. The probabilities of these hypotheses are estimated from word frequencies in the language corpus. The main hypothesis is “a famous man”.
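The pipeline summarized in the abstract (nearest associates by cosine similarity, pooling the top associates of every transformant, clustering the pool, and weighting clusters by corpus frequency) can be sketched roughly as follows. This is a toy illustration, not the authors' implementation: the vectors, the transformant names `t1`/`t2`, the associate names, and the frequencies are all invented placeholders rather than GeoWAC output; `k=2` stands in for the paper's ten associates because the toy vocabulary is tiny; the single-linkage threshold clustering and the frequency-share "probability" are assumptions, since the abstract does not name a specific clustering algorithm or probability formula.

```python
import math

# Invented 3-dimensional toy vectors (NOT GeoWAC output); real fastText
# vectors have hundreds of dimensions.
VOCAB = {
    "t1": (1.0, 0.2, 0.0),  # hypothetical transformant 1
    "t2": (0.2, 1.0, 0.0),  # hypothetical transformant 2
    "x1": (1.0, 0.0, 0.1),
    "x2": (0.9, 0.1, 0.0),
    "x3": (0.8, 0.0, 0.3),
    "y1": (0.0, 1.0, 0.1),
    "y2": (0.1, 0.9, 0.0),
    "z1": (0.0, 0.0, 1.0),
}

# Hypothetical corpus frequencies used to weight the clusters.
FREQ = {"x1": 30, "x2": 10, "x3": 4, "y1": 5, "y2": 5, "z1": 50}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def nearest_associates(vec, vocab, k, exclude=()):
    """Return the k (word, similarity) pairs closest to vec, best first."""
    scored = [(w, cosine(vec, v)) for w, v in vocab.items() if w not in exclude]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def pool_associates(transformants, vocab, k):
    """Union of the top-k associates of every transformant present in vocab."""
    skip = set(transformants)
    pooled = set()
    for t in transformants:
        if t in vocab:
            pooled.update(w for w, _ in nearest_associates(vocab[t], vocab, k, skip))
    return pooled


def threshold_clusters(words, vocab, threshold):
    """Single-linkage clusters: connected components of the graph whose
    edges join word pairs with cosine similarity >= threshold."""
    parent = {w: w for w in words}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    ordered = sorted(words)
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if cosine(vocab[a], vocab[b]) >= threshold:
                parent[find(a)] = find(b)
    groups = {}
    for w in ordered:
        groups.setdefault(find(w), []).append(w)
    return sorted(groups.values())


def hypothesis_probabilities(clusters, freq):
    """Share of the total corpus frequency that falls in each cluster."""
    mass = [sum(freq.get(w, 0) for w in c) for c in clusters]
    total = sum(mass)
    return [m / total if total else 0.0 for m in mass]


if __name__ == "__main__":
    associates = pool_associates(["t1", "t2"], VOCAB, k=2)
    clusters = threshold_clusters(associates, VOCAB, threshold=0.8)
    for cluster, p in zip(clusters, hypothesis_probabilities(clusters, FREQ)):
        print(cluster, round(p, 2))
```

With real data, the toy dictionary would be replaced by vectors loaded from the RusVectōrēs model (for example through a word-vector library such as gensim), and `k` would be set to 10 to match the study. Single linkage is used here only because it is the simplest clustering that groups associates by pairwise cosine similarity.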
Details
- ISSN :
- 2224-9761 and 2072-9502
- Volume :
- 2022
- Database :
- OpenAIRE
- Journal :
- Vestnik of Astrakhan State Technical University. Series: Management, computer science and informatics
- Accession number :
- edsair.doi...........9913baaca167945ee99bd41b5882cfb3