23 results for "WSD"
Search Results
2. Assamese Word Sense Disambiguation using Cuckoo Search Algorithm.
- Author
-
Gogoi, Arjun, Baruah, Nomi, and Nath, Lakhya Jyoti
- Subjects
SEARCH algorithms, MACHINE translating, ALGORITHMS, NATURAL languages, TABU search algorithm, PROBLEM solving, NATURAL language processing - Abstract
Natural language processing is associated with human-computer interaction, where several challenges require natural language understanding. The word sense disambiguation problem comprises the computational assignment of meaning to a word according to the specific context in which it occurs. Numerous natural language processing applications, such as machine translation, information retrieval, and information extraction, require this task, which takes place at the semantic level. Unsupervised computational approaches can be effective for this problem, since they have been used successfully for many real-world optimization problems. In this paper, we propose to solve the word sense disambiguation problem in the Assamese language using the cuckoo search algorithm. We illustrate the performance of our algorithm by carrying out experiments on an Assamese corpus and comparing it against an unsupervised genetic algorithm implemented for Assamese. The experimental results show that the cuckoo search algorithm achieves higher precision, recall, and F-measure, attaining 87.5%, 84%, and 85.71%, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
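The record above applies cuckoo search to WSD by searching over sense assignments. Below is a minimal sketch of a discrete cuckoo-search loop, assuming a user-supplied coherence `fitness` function over sense assignments (e.g., gloss overlap between the chosen senses); the Lévy flight is approximated by random reassignment, and nothing here is the authors' implementation.

```python
# Minimal sketch (not the authors' code): discrete cuckoo search over
# sense assignments. `fitness` is an assumed coherence measure.
import random

def cuckoo_search(n_words, n_senses, fitness, n_nests=15, pa=0.25, iters=200):
    # Each nest assigns one sense index to every ambiguous word.
    nests = [[random.randrange(n_senses) for _ in range(n_words)]
             for _ in range(n_nests)]
    for _ in range(iters):
        # New candidate: mutate a random nest (a crude stand-in for a
        # Levy flight in the discrete search space).
        cand = list(random.choice(nests))
        cand[random.randrange(n_words)] = random.randrange(n_senses)
        worst = min(range(n_nests), key=lambda i: fitness(nests[i]))
        if fitness(cand) > fitness(nests[worst]):
            nests[worst] = cand
        # Abandon a fraction `pa` of the worst nests each generation.
        nests.sort(key=fitness)
        for i in range(int(pa * n_nests)):
            nests[i] = [random.randrange(n_senses) for _ in range(n_words)]
    return max(nests, key=fitness)
```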
3. Word sense disambiguation based on stretchable matching of the semantic template.
- Author
-
Wang, Wei, Huang, Degen, and Yu, Haitao
- Subjects
NATURAL language processing, VARIATION in language, NATURAL languages, VOCABULARY - Abstract
It is evident that traditional hard matching of a fixed-length template cannot accommodate the nearly indefinite variations of natural language. This issue stems from three major problems of the traditional matching mode: 1) with a short template, the context of natural language cannot be effectively captured; 2) with a long template, serious data sparsity leads to a low success rate of template matching (i.e., low recall); and 3) lacking flexible matching ability, traditional hard matching is more prone to failure. This paper therefore proposes a novel method of stretchable matching of the semantic template (SMOST) to deal with the above problems. We have applied this method to word sense disambiguation in the natural language processing field. Using only the SemCor corpus, the result of our system is very close to the best result of existing systems, which shows the effectiveness of the newly proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
4. An Integration Model of Semantic Annotation Based on Synergetic Neural Network.
- Author
-
Huang, Zhehuang and Chen, Yidong
- Subjects
SEMANTIC computing, ANNOTATIONS, SYNERGETICS, NEURAL circuitry, NATURAL language processing - Abstract
Correct and automatic semantic analysis has always been one of the major goals in natural language understanding. However, due to the difficulties of deep semantic analysis, mainstream studies of semantic analysis currently focus on semantic role labeling (SRL) and word sense disambiguation (WSD). These two issues are mostly treated as separate tasks, an approach that ignores possible dependencies between them. To address this issue, an integrative semantic analysis model based on a synergetic neural network (SNN) is proposed in this paper, which can easily express useful logic constraints between SRL and WSD. The semantic analysis process can be viewed as a competition among semantic order parameters: the strongest order parameter wins the competition and the desired semantic pattern is recognized. There are three main innovations in this paper. First, an integrative semantic analysis model is proposed that jointly models word sense disambiguation and semantic role labeling. Second, an integrative order parameter is reconstructed to reflect the relations among semantic patterns. Finally, integrative network parameters and an integrative evolution equation are reconstructed, reflecting how word senses and semantic roles guide and drive each other. Experimental results on the OntoNotes 2.0 corpus show that the integrative method achieves higher performance on both semantic role labeling and word sense disambiguation, and offers good practicability and a promising outlook for other natural language processing tasks. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
5. Arabic Gloss WSD Using BERT
- Author
-
Fahima A. Maghraby, Mohamed Waleed Fakhr, and Mohammed El-Razzaz
- Subjects
Computer science, Context (language use), Semantic similarity, WSD, Arabic, context gloss, Gloss, Benchmark, Encoder, Artificial intelligence, Natural language processing, Word, Test data, BERT - Abstract
Word Sense Disambiguation (WSD) aims to predict the correct sense of a word given its context. This problem is of extreme importance in Arabic, as written words can be highly ambiguous: 43% of diacritized words have multiple interpretations, and the percentage rises to 72% for non-diacritized words; nevertheless, most written Arabic text does not carry diacritical marks. Gloss-based WSD methods measure the semantic similarity or overlap between the context of a target word that needs to be disambiguated and the dictionary definition of that word (its gloss). Arabic gloss WSD suffers from a lack of context-gloss datasets. In this paper, we present an Arabic gloss-based WSD technique. We utilize the celebrated Bidirectional Encoder Representations from Transformers (BERT) to build two models that can efficiently perform Arabic WSD. These models can be trained with few training samples, since they utilize BERT models pretrained on a large Arabic corpus. Our experimental results show that our models outperform two of the most recent gloss-based WSD methods when tested on the same data used to evaluate our model. Additionally, our model achieves an F1-score of 89%, compared to the best-reported F1-score of 85% for knowledge-based Arabic WSD. A further contribution of this paper is a context-gloss benchmark that may help to overcome the lack of a standardized benchmark for Arabic gloss-based WSD.
- Published
- 2021
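The gloss-based technique in the record above scores a context against each candidate gloss. A minimal sketch using contextual-embedding cosine similarity follows, assuming the `transformers` and `torch` packages are installed; the checkpoint name is a placeholder, not the authors' fine-tuned models.

```python
# Hedged sketch: gloss-based WSD via contextual embedding similarity.
# The checkpoint below is a placeholder, not the paper's models.
import torch
from transformers import AutoTokenizer, AutoModel

MODEL_NAME = "bert-base-multilingual-cased"  # placeholder checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)

def embed(text: str) -> torch.Tensor:
    """Mean-pool the last hidden states into one sentence vector."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq, dim)
    return hidden.mean(dim=1).squeeze(0)

def disambiguate(context: str, glosses: dict) -> str:
    """Pick the sense whose gloss is closest to the target context."""
    ctx_vec = embed(context)
    scores = {
        sense: torch.cosine_similarity(ctx_vec, embed(gloss), dim=0).item()
        for sense, gloss in glosses.items()
    }
    return max(scores, key=scores.get)
```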
6. Exemplification Modeling: Can You Give Me an Example, Please?
- Author
-
Caterina Lacerra, Edoardo Barba, Roberto Navigli, Tommaso Pasini, and Luigi Procopio
- Subjects
Exemplification ,Computer science ,sequence-to-sequence ,BART ,WSD ,Word Sense Disambiguation ,NLP ,Natural Language Processing ,Epistemology - Abstract
Recently, generative approaches have been used effectively to provide definitions of words in their context. However, the opposite, i.e., generating a usage example given one or more words along with their definitions, has not yet been investigated. In this work, we introduce the novel task of Exemplification Modeling (ExMod), along with a sequence-to-sequence architecture and a training procedure for it. Starting from a set of (word, definition) pairs, our approach is capable of automatically generating high-quality sentences which express the requested semantics. As a result, we can drive the creation of sense-tagged data which cover the full range of meanings in any inventory of interest, and their interactions within sentences. Human annotators agree that the sentences generated are as fluent and semantically-coherent with the input definitions as the sentences in manually-annotated corpora. Indeed, when employed as training data for Word Sense Disambiguation, our examples enable the current state of the art to be outperformed, and higher results to be achieved than when using gold-standard datasets only. We release the pretrained model, the dataset and the software at https://github.com/SapienzaNLP/exmod.
- Published
- 2021
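The record above generates usage examples from (word, definition) pairs with a sequence-to-sequence model. Below is a minimal sketch of such generation with a generic seq2seq checkpoint; `facebook/bart-base` is only a placeholder and is not trained for exemplification (the released ExMod model lives at the repository linked above).

```python
# Hedged sketch: exemplification-style generation with a generic
# seq2seq checkpoint. Placeholder weights; untrained for this task.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL = "facebook/bart-base"  # placeholder, not the ExMod weights
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL)

# Input format assumed for illustration: "word: definition".
prompt = "bank: a financial institution that accepts deposits"
ids = tokenizer(prompt, return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=30, num_beams=4)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```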
7. ConSeC: Word Sense Disambiguation as Continuous Sense Comprehension
- Author
-
Barba, Edoardo, Procopio, Luigi, and Navigli, Roberto
- Subjects
WSD, Word Sense Disambiguation, NLP, Natural Language Processing - Published
- 2021
- Full Text
- View/download PDF
8. Determining the difficulty of Word Sense Disambiguation.
- Author
-
McInnes, Bridget T. and Stevenson, Mark
- Abstract
Highlights: [•] We explore estimating WSD performance on a range of ambiguous biomedical terms. [•] We evaluate the difficulty predictions against the output of two WSD systems. [•] Supervised methods are the best predictors but limited by labeled training data. [•] Unsupervised methods all perform well and can be applied more widely. [•] Best performance was obtained using the relatedness measure proposed by Lesk. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
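The last highlight above refers to the relatedness measure proposed by Lesk. Below is a minimal sketch of the simplified Lesk idea, scoring each sense by the overlap between the context and the sense's gloss; NLTK's WordNet stands in for the biomedical resources used in the study.

```python
# Hedged sketch of simplified Lesk: pick the sense whose gloss shares
# the most words with the context. Requires `nltk` plus its WordNet
# data (nltk.download("wordnet")).
from nltk.corpus import wordnet as wn

def simplified_lesk(word, context):
    context_words = set(context.lower().split())
    best, best_overlap = None, -1
    for synset in wn.synsets(word):
        gloss_words = set(synset.definition().lower().split())
        overlap = len(context_words & gloss_words)
        if overlap > best_overlap:
            best, best_overlap = synset, overlap
    return best

print(simplified_lesk("bank", "I deposited money at the bank yesterday"))
```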
9. Evaluating measures of semantic similarity and relatedness to disambiguate terms in biomedical text.
- Author
-
McInnes, Bridget T. and Pedersen, Ted
- Abstract
Introduction: In this article, we evaluate a knowledge-based word sense disambiguation method that determines the intended concept associated with an ambiguous word in biomedical text using semantic similarity and relatedness measures. These measures quantify the degree of similarity or relatedness between concepts in the Unified Medical Language System (UMLS). The objective of this work is to develop a method that can disambiguate terms in biomedical text by exploiting similarity and relatedness information extracted from biomedical resources and to evaluate the efficacy of these measures on WSD. Method: We evaluate our method on a biomedical dataset (MSH-WSD) that contains 203 ambiguous terms and acronyms. Results: We show that information content-based measures derived from either a corpus or taxonomy obtain a higher disambiguation accuracy than path-based measures or relatedness measures on the MSH-WSD dataset. Availability: The WSD system is open source and freely available from http://search.cpan.org/dist/UMLS-SenseRelate/. The MSH-WSD dataset is available from the National Library of Medicine at http://wsd.nlm.nih.gov. [ABSTRACT FROM AUTHOR]
- Published
- 2013
- Full Text
- View/download PDF
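The study above contrasts path-based measures with information-content (IC) based ones. Below is a minimal sketch of that distinction using NLTK's WordNet instead of the UMLS; the synset pair is illustrative only.

```python
# Hedged sketch: path-based vs information-content (IC) similarity in
# NLTK's WordNet (the study uses UMLS measures; this only illustrates
# the distinction). Requires nltk.download("wordnet") and
# nltk.download("wordnet_ic").
from nltk.corpus import wordnet as wn, wordnet_ic

brown_ic = wordnet_ic.ic("ic-brown.dat")  # corpus-derived IC counts
dog, cat = wn.synset("dog.n.01"), wn.synset("cat.n.01")

print(dog.path_similarity(cat))           # path-based measure
print(dog.res_similarity(cat, brown_ic))  # Resnik, IC-based measure
```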
10. An Affix Based Word Classification Method of Assamese Text.
- Author
-
Sarma, Bhairab and Purkayastha, Bipul Shyam
- Subjects
NATURAL language processing, COMPUTATIONAL linguistics, SEMANTIC computing, AFFIXES (Grammar), ASSAMESE language - Abstract
Classification of words is an important activity in natural language processing (NLP) analysis. Word classification in linguistics is not the same as in NLP: in NLP, the main objective is part-of-speech tagging (POST), which is essential for machine translation and language interpretation, whereas in linguistics words are classified by their application and the meaning they represent in a real-world context. Retrieving contextual meaning in language processing is a very challenging job; because of sense disambiguation, representational ambiguity, and words with multiple meanings, the POST task becomes very difficult. Assamese is a highly inflected and morphologically rich Indian language. In this study, we attempt to classify words based on their morphological structure, presenting a method for classifying Assamese words based on their inflectional features. The classes used here may not align with POS classification; however, the method could serve for word clustering during POS tagging in combination with smoothing algorithms such as HMM and EM. We believe that this method can further be implemented for any other inflectional Indian language. [ABSTRACT FROM AUTHOR]
- Published
- 2013
11. Disambiguation of ambiguous biomedical terms using examples generated from the UMLS Metathesaurus.
- Author
-
Stevenson, Mark and Guo, Yikun
- Abstract
Abstract: Researchers have access to a vast amount of information stored in textual documents and there is a pressing need for the development of automated methods to enable and improve access to this resource. Lexical ambiguity, the phenomenon in which a word or phrase has more than one possible meaning, presents a significant obstacle to automated text processing. Word Sense Disambiguation (WSD) is a technology that resolves these ambiguities automatically and is an important stage in text understanding. The most accurate approaches to WSD rely on manually labeled examples, but these are usually not available and are prohibitively expensive to create. This paper offers a solution to that problem by using information in the UMLS Metathesaurus to automatically generate labeled examples. Two approaches are presented. The first is an extension of existing work (Liu et al., 2002) and the second a novel approach that exploits information in the UMLS that has not been used for this purpose. The automatically generated examples are evaluated by comparing them against the manually labeled ones in the NLM-WSD data set and are found to outperform the baseline. The examples generated using the novel approach produce an improvement in WSD performance when combined with manually labeled examples. [Copyright © Elsevier]
- Published
- 2010
- Full Text
- View/download PDF
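The record above generates labeled examples automatically from a terminology resource. Below is a minimal sketch of the underlying "monosemous relatives" idea, with WordNet standing in for the UMLS Metathesaurus: collect unambiguous synonyms of each sense, whose corpus occurrences can then serve as pseudo-labeled examples.

```python
# Hedged sketch of the "monosemous relatives" idea: for each sense of
# an ambiguous word, collect synonyms that have only one sense. Their
# corpus occurrences can serve as pseudo-labeled training examples.
# WordNet stands in for the UMLS Metathesaurus here.
from nltk.corpus import wordnet as wn

def monosemous_relatives(word):
    relatives = {}
    for synset in wn.synsets(word):
        mono = [lemma.name() for lemma in synset.lemmas()
                if len(wn.synsets(lemma.name())) == 1]
        relatives[synset.name()] = mono
    return relatives

print(monosemous_relatives("cold"))
```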
12. Word Sense Disambiguation Based on Large Scale Polish CLARIN Heterogeneous Lexical Resources
- Author
-
Marlena Orlinska, Maciej Piasecki, and Paweł Kędzia
- Subjects
Linguistics and Language, Computer science, WordNet, Ontology (information science), Lexicon, WSD, page rank, graphs, lexical resources, Part of speech, plWordNet, word sense disambiguation, SUMO, Artificial intelligence, Natural language processing, Natural language, Word - Abstract
Lexical resources can be applied in many different natural language engineering tasks, but the most fundamental task is the recognition of word senses used in text contexts. The problem is difficult, not yet fully solved, and different lexical resources provide varied support for it. Polish CLARIN lexical semantic resources are based on plWordNet — a very large wordnet for Polish — as a central structure and a basis for linking together several resources of different types. In this paper, several Word Sense Disambiguation (henceforth WSD) methods developed for Polish that utilise plWordNet are discussed. Textual sense descriptions in the traditional lexicon can be compared with text contexts using Lesk's algorithm in order to find the best matching senses. In the case of a wordnet, lexico-semantic relations provide the main description of word senses. Thus, we first adapted and applied to Polish a WSD method based on PageRank: text words are mapped onto their senses in the plWordNet graph and the PageRank algorithm is run to find the senses with the highest scores. The method yields results lower than, but comparable to, those reported for English. Error analysis showed that the main problems are the fine-grained sense distinctions in plWordNet and the limited number of connections between words of different parts of speech. In the second approach, plWordNet expanded with a mapping onto SUMO ontology concepts was used. Two scenarios for WSD were investigated: two-step disambiguation, and disambiguation based on the combined networks of plWordNet and SUMO. In the former, words are first assigned SUMO concepts and plWordNet senses are disambiguated next; in the latter, plWordNet and SUMO are combined into one large network which is then used for the disambiguation of senses. The additional knowledge sources used in WSD improved performance. The obtained results and potential further lines of development are discussed.
- Published
- 2015
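The first method in the record above runs PageRank over the wordnet graph, with text words mapped to their sense nodes. Below is a minimal sketch using personalized PageRank from `networkx` on a toy graph; the nodes and edges are illustrative, not plWordNet data.

```python
# Hedged sketch of PageRank-style WSD on a toy wordnet-like graph,
# loosely following the method described above; uses `networkx`.
import networkx as nx

G = nx.Graph()
# Sense nodes for the ambiguous word "bank", plus related concepts.
G.add_edges_from([
    ("bank#1", "money"), ("bank#1", "deposit"),
    ("bank#2", "river"), ("bank#2", "shore"),
    ("money", "deposit"), ("river", "shore"),
])

# Personalize the walk toward context words seen around the target.
context = {"money": 1.0, "deposit": 1.0}
scores = nx.pagerank(G, personalization=context)

best_sense = max(("bank#1", "bank#2"), key=scores.get)
print(best_sense)  # expected: bank#1
```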
13. All-words word sense disambiguation for Turkish
- Author
-
Begüm Avar, Ali Buğra Kanburoğlu, Olcay Taner Yıldız, Berke Özenç, Ozan Topsakal, Ali Tunca Gürkan, İlker Çam, Gökhan Ercan, Burak Ertopçu, Onur Açıkgöz, and Işık University, Faculty of Engineering, Department of Computer Engineering
- Subjects
Distributed databases, Computer science, Turkish, KNN, Türkçe Penn-Treebank corpus, Feature extraction, Morphologically rich language, Treebank, Word sense disambiguation, Context (language use), Feature extraction methods, Turkish language, Semantics, Predictive models, C4.5, Naive Bayes classifier, Multilayer perceptron, WSD, Rocchio classification, Pragmatics, Natural language processing, Pattern classification, Random forests, SemEval, English language, Assignment problem, Penn Treebank corpus, Artificial intelligence, Syntactics, Word - Abstract
Identifying the sense of a word within a context is a challenging problem and has many applications in natural language processing. This assignment problem is called word sense disambiguation (WSD). Many papers in the literature focus on the English language and English data. Our dataset consists of 1400 sentences translated to Turkish from the Penn Treebank Corpus. This paper addresses and discusses 6 different feature extraction methods and their classification performance using C4.5, Random Forests, Rocchio, Naive Bayes, KNN, and linear and multilayer perceptrons. It examines how the described features perform on a morphologically rich language (Turkish) with several classifiers.
- Published
- 2017
- Full Text
- View/download PDF
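The record above compares several classifiers on extracted features. Below is a minimal sketch of such a comparison with scikit-learn stand-ins (DecisionTree approximating C4.5, NearestCentroid approximating Rocchio); the feature matrix `X` and sense labels `y` are assumed to come from a feature extraction step like the paper's.

```python
# Hedged sketch: cross-validated comparison of WSD classifiers using
# scikit-learn stand-ins. X (features) and y (sense labels) are
# placeholders supplied by the caller.
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Perceptron
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier, NearestCentroid
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

classifiers = {
    "C4.5 (approx.)": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier(),
    "Rocchio (approx.)": NearestCentroid(),
    "Naive Bayes": GaussianNB(),
    "KNN": KNeighborsClassifier(),
    "Perceptron": Perceptron(),
    "MLP": MLPClassifier(),
}

def compare(X, y):
    # Report mean 5-fold cross-validation accuracy per classifier.
    for name, clf in classifiers.items():
        scores = cross_val_score(clf, X, y, cv=5)
        print(f"{name}: {scores.mean():.3f}")
```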
14. Train-O-Matic: large-scale supervised Word Sense Disambiguation in multiple languages without manual training data
- Author
-
Roberto Navigli and Tommaso Pasini
- Subjects
Vocabulary, Training set, Word-sense disambiguation, Computer science, WSD, Word Sense Disambiguation, Knowledge Acquisition Bottleneck, SemEval, Resource (project management), Artificial intelligence, Natural language processing - Abstract
Annotating large numbers of sentences with senses is the heaviest requirement of current Word Sense Disambiguation. We present Train-O-Matic, a language-independent method for generating millions of sense-annotated training instances for virtually all meanings of words in a language’s vocabulary. The approach is fully automatic: no human intervention is required and the only type of human knowledge used is a WordNet-like resource. Train-O-Matic achieves consistently state-of-the-art performance across gold standard datasets and languages, while at the same time removing the burden of manual annotation. All the training data is available for research purposes at http://trainomatic.org.
- Published
- 2017
15. Evaluating measures of semantic similarity and relatedness to disambiguate terms in biomedical text
- Author
-
Bridget T. McInnes and Ted Pedersen
- Subjects
Computer science, Word sense disambiguation, Semantic similarity and relatedness, Health Informatics, NLP, Article, Semantic similarity, WSD, Information retrieval, Unified Medical Language System, Biomedical documents, Semantics, Computer Science Applications, Evaluation Studies as Topic, Biomedical text, Artificial intelligence, Natural language processing - Abstract
Highlights: [•] Objective: to develop a method that can disambiguate terms in biomedical text. [•] The WSD method exploits similarity and relatedness information extracted from the UMLS. [•] Evaluates the efficacy of similarity and relatedness measures on WSD. [•] IC-based measures obtain higher accuracy than path-based and relatedness measures. Introduction: In this article, we evaluate a knowledge-based word sense disambiguation method that determines the intended concept associated with an ambiguous word in biomedical text using semantic similarity and relatedness measures. These measures quantify the degree of similarity or relatedness between concepts in the Unified Medical Language System (UMLS). The objective of this work is to develop a method that can disambiguate terms in biomedical text by exploiting similarity and relatedness information extracted from biomedical resources and to evaluate the efficacy of these measures on WSD. Method: We evaluate our method on a biomedical dataset (MSH-WSD) that contains 203 ambiguous terms and acronyms. Results: We show that information content-based measures derived from either a corpus or taxonomy obtain a higher disambiguation accuracy than path-based measures or relatedness measures on the MSH-WSD dataset. Availability: The WSD system is open source and freely available from http://search.cpan.org/dist/UMLS-SenseRelate/. The MSH-WSD dataset is available from the National Library of Medicine at http://wsd.nlm.nih.gov.
- Published
- 2013
- Full Text
- View/download PDF
16. Word Sense Disambiguation using Aggregated Similarity based on WordNet Graph Representation
- Author
-
Mădălina Zurini
- Subjects
Computer science, WordNet, WSD, Similarity Measure, Ontology (information science), Synset, Taxonomy (general), Similarity (psychology), Polysemy, Knowledge base, Graph (abstract data type), Artificial intelligence, Natural language processing - Abstract
The term word sense disambiguation, WSD, is introduced in the context of text document processing. A knowledge-based approach is conducted using the WordNet lexical ontology, describing its structure and the components used in the process of identifying the context-related senses of each polysemous word. The principal distance measures using the graph associated with WordNet are presented, analyzing their advantages and disadvantages. A general model for the aggregation of distances and probabilities is proposed and implemented in an application in order to detect the contextual sense of each word. For words missing from WordNet, a similarity measure based on probabilities of co-occurrences is used. The WSD module is proposed for integration into the document processing step of supervised and unsupervised classification in order to maximize the correctness of the classification. Future work concerns the implementation of different domain-oriented ontologies. Keywords: WSD, Similarity Measure, WordNet, Ontology, Synset.

1 Introduction. For the acquisition of knowledge in artificial intelligence, two approaches defined in [1] are used: a transfer process from human to knowledge base, a process with the major disadvantage that the one who holds the knowledge cannot easily identify it; and a conceptual modeling process that builds models into which new knowledge is placed as it is acquired, a process that led to the emergence of the ontology as a systematic organization of knowledge and data about reality, enabling the construction of theories upon what exists. An essential role of an ontology is to be reused in multiple applications. Mapping two or more ontologies is called alignment; this task is particularly difficult and is the main cause of limitations in extending existing ontologies [1]. The direction that ontologies follow is supported by the introduction of artificial intelligence techniques that emulate the mental representation of the concepts used and the interpenetration of the links between them.

The kernel of an ontology is defined as a system O = (£, T, C*, H, ROOT), where: £ is the lexicon, formed from the terms of the natural language; C* is a set of concepts; T is the reference function that maps the terms of the lexicon to the set of concepts; H is the hierarchy of the taxonomy, given by a direct, acyclic, transitive and reflexive relation; and ROOT is the starting point upon which the hierarchy is built. There are two types of ontologies, as defined in [1], depending on the area in which they are used: ontologies for knowledge-based systems, characterized by a relatively small number of concepts linked by a large variety of relationships, in which concepts are grouped into complex conceptual schemes or scenarios and each concept can have one or more customizations; and lexicalized ontologies, which include a large number of concepts linked by a small number of relationships, such as the WordNet ontology, whose concepts are represented by sets of synonymous words; these ontologies are used in human language processing systems.

The concept of the ontology as a knowledge base is introduced into the classification of documents, in order to analyze documents semantically by resolving the ambiguity of terms. This integration results in an improvement of the objective function defined for the classification techniques used. The main components of an ontology, the concepts and the relations between them, are described and analyzed, identifying methods of extracting knowledge from them. With the relationships defined between concepts, the graph representation is created, seen as a taxonomy of membership (such as "is-a") linking concepts to more general ones. The senses of a concept are defined, along with the possibility of a graph representation of each sense. …
- Published
- 2013
- Full Text
- View/download PDF
17. Word sense disambiguation
- Author
-
Roberto Navigli
- Subjects
General Computer Science, algorithms, experimentation, lexical ambiguity, lexical semantics, measurement, performance, semantic annotation, sense annotation, word sense disambiguation, word sense discrimination, wsd, Context (language use), Semantics, Polysemy, Ambiguity, SemEval, Knowledge base, Artificial intelligence, Natural language processing, Meaning (linguistics) - Abstract
Word sense disambiguation (WSD) is the ability to identify the meaning of words in context in a computational manner. WSD is considered an AI-complete problem, that is, a task whose solution is at least as hard as the most difficult problems in artificial intelligence. We introduce the reader to the motivations for solving the ambiguity of words and provide a description of the task. We overview supervised, unsupervised, and knowledge-based approaches. The assessment of WSD systems is discussed in the context of the Senseval/Semeval campaigns, aiming at the objective evaluation of systems participating in several different disambiguation tasks. Finally, applications, open problems, and future directions are discussed.
- Published
- 2009
- Full Text
- View/download PDF
18. WSD-games: a Game-Theoretic Algorithm for Unsupervised Word Sense Disambiguation
- Author
-
Rocco Tripodi and Marcello Pelillo
- Subjects
Word-sense disambiguation, Settore INF/01 - Informatica, Game theory, Evolutionary game theory, Computer science, Graph (abstract data type), Multiplayer game, Artificial intelligence, WSD, NLP, Natural language processing - Abstract
In this paper we present an unsupervised approach to word sense disambiguation based on evolutionary game theory. In our algorithm, each word to be disambiguated is represented as a node on a graph and each sense as a class. The algorithm performs a consistent class assignment of senses according to the similarity information of each word with the others, so that similar words are constrained to take similar classes. The dynamics of the system are formulated in terms of a non-cooperative multiplayer game, where the players are the data points deciding their class memberships, and the equilibria correspond to consistent labelings of the data.
- Published
- 2015
- Full Text
- View/download PDF
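The game-theoretic formulation above reaches equilibria through evolutionary dynamics. Below is a minimal sketch of the replicator dynamics for a single word's sense distribution; the toy payoff matrix stands in for the similarity-derived payoffs of the actual system.

```python
# Hedged sketch of replicator dynamics: a word's sense distribution
# evolves under payoffs that reward senses consistent with neighbors.
import numpy as np

def replicator(payoff: np.ndarray, x: np.ndarray, steps: int = 100):
    """Iterate x_i <- x_i * (Ax)_i / (x^T A x) toward an equilibrium."""
    for _ in range(steps):
        fitness = payoff @ x
        x = x * fitness / (x @ fitness)
    return x

# Two candidate senses; payoffs encode coherence with the context.
A = np.array([[1.0, 0.2],
              [0.2, 0.6]])
x0 = np.array([0.5, 0.5])  # uniform prior over senses
print(replicator(A, x0))   # mass concentrates on the fitter sense
```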
19. Word sense disambiguation with context models and its application to Information Retrieval [Désambiguisation de sens par modèles de contextes et son application à la Recherche d'Information]
- Author
-
Brosseau-Villeneuve, Bernard and Nie, Jian-Yun
- Subjects
Word context, Context models, Computer Science, Information Retrieval, IR, Natural Language Processing, NLP, Word Sense Disambiguation, WSD - Abstract
It is known that the ambiguity present in natural language has a negative effect on the effectiveness of Information Retrieval (IR) systems. However, up to now, efforts to integrate Word Sense Disambiguation (WSD) techniques into IR systems have not been successful: past studies end up with either poor or unconvincing results, and investigations based on the addition of artificial ambiguity show that a very high disambiguation accuracy would be needed in order to observe gains. The objective of this thesis is to develop efficient and effective approaches to WSD, using co-occurrence statistics to build context models. Such models can then be used to perform word sense discrimination between a query and the documents of a collection. 
In this two-part thesis, we start by investigating the strength of the relation between a word and the words present in its context, proposing an approach to learn a function mapping word distance to count weights. This method is based on the idea that context models built from random samples of words in context should be similar. Experiments in English and Japanese show that the strength of relation roughly follows a negative power law. The weights resulting from these experiments are then used in the construction of Naïve Bayes WSD systems. Evaluations of these systems in English on the Semeval-2007 English Lexical Sample (ELS) task, and in Japanese on the Semeval-2010 Japanese WSD (JWSD) task, show that the systems achieve state-of-the-art accuracy even though they are much lighter and do not rely on linguistic tools or resources. The second part of this thesis adapts the new methods to IR applications, which put heavy constraints on performance and available resources. We thus propose corpus-based latent context models based on Latent Dirichlet Allocation (LDA). The models are combined with the query-likelihood language model (LM) approach to IR. Evaluating the systems on three collections from the Text REtrieval Conference (TREC), we observe an average proportional improvement of 12% in MAP and 23% in GMAP. We further observe that the gains are mostly made on hard queries, increasing the robustness of the results. To our knowledge, these experiments are the first positive application of WSD techniques to standard IR tasks.
- Published
- 2011
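The thesis above learns distance-dependent weights for context words and plugs them into a Naïve Bayes disambiguator. Below is a minimal sketch of the scoring rule under a negative-power-law weight; the weight exponent and probability table are illustrative assumptions, not the thesis code.

```python
# Hedged sketch: Naive Bayes sense scoring with distance-weighted
# context counts, using a negative power-law weight as the thesis's
# experiments suggest. All parameters below are illustrative.
import math

def weight(distance, alpha=1.0):
    # Closer context words contribute more: w(d) = d^-alpha.
    return distance ** -alpha

def sense_score(prior, word_probs, context, position):
    # log P(sense) + sum_i w(|i - pos|) * log P(word_i | sense)
    score = math.log(prior)
    for i, word in enumerate(context):
        if i == position:
            continue
        p = word_probs.get(word, 1e-6)  # smoothed lexical probability
        score += weight(abs(i - position)) * math.log(p)
    return score
```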
20. On the portability and tuning of supervised word sense disambiguation systems
- Author
-
Escudero Bakx, Gerard (ORCID: 0000-0002-4914-1686), Màrquez Villodre, Lluís, Rigau Claramunt, German, Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, and Universitat Politècnica de Catalunya. GPLN - Grup de Processament del Llenguatge Natural
- Subjects
Computer science [UPC subject areas], LazyBoosting algorithm, Natural language processing, Machine learning, Word sense disambiguation, Portability and tuning of NLP systems, WSD
This report describes a set of experiments carried out to explore the portability of alternative supervised Word Sense Disambiguation algorithms. The aim of the work is threefold: firstly, studying the performance of these algorithms when tested on a different corpus from that they were trained on; secondly, exploring their ability to tune to new domains, and thirdly, demonstrating empirically that the LazyBoosting algorithm outperforms state-of-the-art supervised WSD algorithms in both previous situations.
21. Machine learning and natural language processing
- Author
-
Màrquez Villodre, Lluís, Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, and Universitat Politècnica de Catalunya. GPLN - Grup de Processament del Llenguatge Natural
- Subjects
Computer science [UPC subject areas], Natural language processing, Machine learning, Word sense disambiguation, WSD, ML, NLP, Supervised learning
In this report, some collaborative work between the fields of Machine Learning (ML) and Natural Language Processing (NLP) is presented. The document is structured in two parts. The first part includes a superficial but comprehensive survey covering the state-of-the-art of machine learning techniques applied to natural language learning tasks. In the second part, a particular problem, namely Word Sense Disambiguation (WSD), is studied in more detail. In doing so, four algorithms for supervised learning, which belong to different families, are compared in a benchmark corpus for the WSD task. Both qualitative and quantitative conclusions are drawn.
22. IR like a SIR: Sense-enhanced Information Retrieval for Multiple Languages
- Author
-
Rexhina Blloshmi, Tommaso Pasini, Niccolò Campolungo, Somnath Banerjee, Roberto Navigli, and Gabriella Pasi
- Subjects
sense-enhanced, nlp, natural language processing, information retrieval, ir, word sense disambiguation, wsd, multilinguality, strategies, tools, standards for lexicographic resources (objective 3), WP3 - Abstract
With the advent of contextualized embeddings, attention towards neural ranking approaches for Information Retrieval has increased considerably. However, two aspects have remained largely neglected: i) queries usually consist of only a few keywords, which increases ambiguity and makes their contextualization harder, and ii) performing neural ranking on non-English documents is still cumbersome due to a shortage of labeled datasets. In this paper we present SIR (Sense-enhanced Information Retrieval) to mitigate both problems by leveraging word sense information. At the core of our approach lies a novel multilingual query expansion mechanism based on Word Sense Disambiguation that provides sense definitions as additional semantic information for the query. Importantly, we use senses as a bridge across languages, thus allowing our model to perform considerably better than its supervised and unsupervised alternatives across the French, German, Italian and Spanish languages on several CLEF benchmarks, while being trained on English Robust04 data only. We release SIR at https://github.com/SapienzaNLP/sir.
- Full Text
- View/download PDF
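SIR, in the record above, expands queries with the definitions of disambiguated senses. Below is a minimal sketch of gloss-based query expansion with WordNet; the first-sense choice is a naive stand-in for the paper's disambiguation step, and all names are illustrative.

```python
# Hedged sketch: sense-based query expansion in the spirit of SIR,
# appending the gloss of each query word's (naively chosen) sense.
from nltk.corpus import wordnet as wn

def expand_query(query):
    expanded = [query]
    for token in query.split():
        synsets = wn.synsets(token)
        if synsets:  # naively take the first sense as a stand-in for WSD
            expanded.append(synsets[0].definition())
    return " ".join(expanded)

print(expand_query("bank deposit"))
```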
23. Integrating Linguistic Resources in TC through WSD
- Author
-
Ureña-López, L. Alfonso, Buenaga, Manuel, and Gómez, José M.
- Published
- 2001