1,835 results on '"Word sense disambiguation"'
Search Results
2. Improving clinical abbreviation sense disambiguation using attention‐based Bi‐LSTM and hybrid balancing techniques in imbalanced datasets.
- Author
Hosseini, Manda, Rasekh, Amir Hossein, and Keshavarzi, Amin
- Subjects
*DATA mining, *STRUCTURAL models, *CLINICAL decision support systems, *NATURAL language processing, *UNCERTAINTY, *DESCRIPTIVE statistics, *DEEP learning, *ELECTRONIC health records, *ABBREVIATIONS, *COMPARATIVE studies, *ALGORITHMS
- Abstract
Rationale: Clinical abbreviations pose a challenge for clinical decision support systems due to their ambiguity. Additionally, clinical datasets often suffer from class imbalance, hindering the classification of such data. This imbalance leads to classifiers with low accuracy and high error rates. Traditional feature‐engineered models struggle with this task, and class imbalance is a known factor that reduces the performance of neural network techniques. Aims and Objectives: This study proposes an attention‐based bidirectional long short‐term memory (Bi‐LSTM) model to improve clinical abbreviation disambiguation in clinical documents. We aim to address the challenges of limited training data and class imbalance by employing data generation techniques like reverse substitution and data augmentation with synonym substitution. Method: We utilise a Bi‐LSTM classification model with an attention mechanism to disambiguate each abbreviation. The model's performance is evaluated based on accuracy for each abbreviation. To address the limitations of imbalanced data, we employ data generation techniques to create a more balanced dataset. Results: The evaluation results demonstrate that our data balancing technique significantly improves the model's accuracy by 2.08%. Furthermore, the proposed attention‐based Bi‐LSTM model achieves an accuracy of 96.09% on the UMN dataset, outperforming state‐of‐the‐art results. Conclusion: Deep neural network methods, particularly Bi‐LSTM, offer promising alternatives to traditional feature‐engineered models for clinical abbreviation disambiguation. By employing data generation techniques, we can address the challenges posed by limited‐resource and imbalanced clinical datasets. This approach leads to a significant improvement in model accuracy for clinical abbreviation disambiguation tasks. [ABSTRACT FROM AUTHOR]
- Published
- 2024
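The abstract names synonym substitution as one of its balancing techniques but does not give the procedure. Below is a minimal sketch of the general idea. It assumes a hypothetical synonym lexicon and a simple policy of oversampling every minority sense up to the size of the largest one. It is not the authors' pipeline and leaves out their reverse-substitution step. The function name `balance_with_synonyms`, the lexicon, and the toy clinical snippets are all illustrative.

```python
import random
from collections import Counter


def balance_with_synonyms(examples, synonyms, seed=0):
    """Oversample minority senses with synonym-substituted copies.

    examples: list of (tokens, sense) pairs.
    synonyms: dict mapping a word to replacement words (hypothetical lexicon).
    Returns a new list in which every sense has as many examples as the
    most frequent sense. The original examples are kept unchanged.
    """
    rng = random.Random(seed)
    counts = Counter(sense for _, sense in examples)
    target = max(counts.values())
    pools = {}
    for tokens, sense in examples:
        pools.setdefault(sense, []).append(tokens)

    balanced = list(examples)
    for sense, pool in pools.items():
        for _ in range(target - counts[sense]):
            tokens = list(rng.choice(pool))  # copy so the source is untouched
            # Positions holding a word that has at least one listed synonym.
            swappable = [i for i, t in enumerate(tokens) if synonyms.get(t)]
            if swappable:
                i = rng.choice(swappable)
                tokens[i] = rng.choice(synonyms[tokens[i]])
            # With no swappable word, the copy is a plain duplicate.
            balanced.append((tokens, sense))
    return balanced


if __name__ == "__main__":
    data = [
        (["pt", "admitted", "with", "chest", "pain"], "patient"),
        (["pt", "reports", "mild", "pain"], "patient"),
        (["pt", "denies", "fever"], "patient"),
        (["referred", "to", "pt", "for", "knee", "exercises"], "physical_therapy"),
    ]
    lexicon = {"knee": ["leg"], "exercises": ["training", "rehab"]}
    out = balance_with_synonyms(data, lexicon)
    print(Counter(sense for _, sense in out))
```

Each generated copy changes only one word, so it stays close to its original context. A real lexicon would need clinically vetted synonyms, so that a substitution does not change which sense the abbreviation takes.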
3. Enhanced unsupervised neural machine translation by cross lingual sense embedding and filtered back-translation for morphological and endangered Indic languages.
- Author
Chauhan, Shweta, Saxena, Shefali, and Daniel, Philemon
- Subjects
*MACHINE translating, *TELECOMMUNICATION, *COMMUNICATION of technical information, *LANGUAGE & languages, *PROBLEM solving
- Abstract
The rapid growth of communication technology has brought nations and their cultures closer together, and the demand for cross-language communication has risen tremendously. Several learning methods exist for connecting a source language to a target language, and unsupervised learning is particularly valuable for low-resource languages. However, unsupervised machine translation remains problematic for languages that are both morphologically rich and low-resource, and results are especially poor when translating from a morphologically less complex language into a more complex one. In this paper, we improve unsupervised neural machine translation for morphologically rich languages by tackling two problems: lexical ambiguity, and the quality of the pseudo-parallel sentence pairs generated through back-translation. The ambiguity problem is addressed by using cross-lingual sense embeddings on the source side instead of cross-lingual word embeddings. The quality of the pseudo-parallel sentences is increased by giving more weight to better pseudo-parallel sentence pairs in the back-translation step. Several evaluation metrics are used to check the robustness of the model, which is compared with different baseline models. Experiments are performed on the morphologically rich language pairs English-Hindi, English-Tamil, and English-Telugu, and on one low-resource endangered language, Kangri. [ABSTRACT FROM AUTHOR]
- Published
- 2024
4. Word Sense Disambiguation for Morphologically Rich Low-Resourced Languages: A Systematic Literature Review and Meta-Analysis.
- Author
Masethe, Hlaudi Daniel, Masethe, Mosima Anna, Ojo, Sunday Olusegun, Giunchiglia, Fausto, and Owolawi, Pius Adewale
- Subjects
*NATURAL language processing, *DATABASE searching, *DATA modeling, *HETEROGENEITY, *META-analysis
- Abstract
In natural language processing, word sense disambiguation (WSD) remains a major difficulty, especially for low-resource languages, where linguistic variation and a lack of data make model training and evaluation harder. This systematic review and meta-analysis summarizes the body of knowledge on WSD techniques for low-resource languages, emphasizing the advantages and disadvantages of different strategies. A search of several databases produced articles assessing WSD methods in low-resource languages, and effect sizes and performance measures were extracted from a subset of these studies. Pooled effect estimates were computed by meta-analysis, and heterogeneity between studies was assessed. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were used to select the papers for extraction. The meta-analysis included 32 studies, covering a range of WSD methods and low-resourced languages. The overall pooled effect size indicated moderate effectiveness of WSD techniques. Heterogeneity among studies was high, with an I² value of 82.29%, suggesting substantial variability in WSD performance across studies; the τ² (tau-squared) value of 5.819 further reflects the extent of between-study variance. This variability underscores the difficulty of generalizing findings and highlights the influence of diverse factors such as language-specific characteristics, dataset quality, and methodological differences. The p-values from the meta-regression (0.454) and the meta-analysis (0.440) indicate that the variability in WSD performance is not significantly associated with the investigated moderators, so the performance differences may be driven by factors not captured in the current analysis. The absence of significant p-values also raises the possibility that the models and techniques in use do not yet adequately address the problems posed by low-resource settings. [ABSTRACT FROM AUTHOR]
- Published
- 2024
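The heterogeneity figures quoted in the meta-analysis abstract (I² = 82.29%, τ² = 5.819) are standard meta-analysis statistics. As a sketch, the function below computes Cochran's Q, I² = max(0, (Q - df)/Q) × 100, and τ² from per-study effect sizes and variances. It assumes inverse-variance weights and the DerSimonian-Laird estimator for τ²; the review does not say which estimator it used. The inputs are made-up numbers, not data from the review, and the function name `heterogeneity` is illustrative.

```python
def heterogeneity(effects, variances):
    """Return (Q, I2 in percent, tau2) for a set of study effect sizes.

    Uses inverse-variance weights; tau2 is the DerSimonian-Laird estimate.
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    weights = [1.0 / v for v in variances]
    w_sum = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / w_sum
    # Cochran's Q: weighted squared deviations from the pooled (fixed-effect) mean.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I2: share of total variation attributed to between-study heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    c = w_sum - sum(w * w for w in weights) / w_sum
    tau2 = max(0.0, (q - df) / c)
    return q, i2, tau2


if __name__ == "__main__":
    print(heterogeneity([0.0, 4.0], [1.0, 1.0]))  # (8.0, 87.5, 7.0)
```

Both I² and τ² are truncated at zero. When Q does not exceed its degrees of freedom, the observed spread is consistent with sampling error alone.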
5. Naïve Bayes classifier for Kashmiri word sense disambiguation.
- Author
Mir, Tawseef Ahmad and Lawaye, Aadil Ahmad
- Abstract
Many Natural Language Processing (NLP) applications, such as machine translation, document clustering, and information retrieval, make use of Word Sense Disambiguation (WSD). WSD automatically predicts the sense of an ambiguous word that best fits the given context. While interpreting the meaning of natural language may seem easy for humans, machines require huge amounts of data for similar tasks. In this paper, we propose an automatic WSD system for the Kashmiri language based on the Naive Bayes classifier. To the best of our knowledge, this work is the first attempt towards developing a WSD system for Kashmiri. Bag-of-Words (BoW) and Part-of-Speech (PoS) features are used to develop the WSD system. Experiments are carried out on a manually crafted sense-tagged dataset for 60 ambiguous Kashmiri words, selected based on their frequency in the collected raw corpus. The senses used to annotate these ambiguous words are taken from Kashmiri WordNet. The performance of the proposed system is measured using accuracy, precision, recall, and F1-measure. The proposed WSD model reported its best performance (accuracy = 89.92%, precision = 0.84, recall = 0.89, F1-measure = 0.86) when PoS and BoW features were used together. [ABSTRACT FROM AUTHOR]
- Published
- 2024
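A Naive Bayes sense classifier over bag-of-words context features, of the kind named in the Kashmiri WSD abstract, can be sketched as follows. This is the generic multinomial formulation with add-one smoothing, not the authors' implementation. The PoS features are omitted, and the English training contexts for "bank" are hypothetical stand-ins for the Kashmiri sense-tagged data.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """examples: list of (context_words, sense). Returns the counts the model needs."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)  # sense -> context-word frequencies
    vocab = set()
    for words, sense in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(words)
        vocab.update(words)
    return sense_counts, word_counts, vocab


def predict_sense(model, context_words):
    """Return argmax over senses of log P(sense) + sum of log P(word | sense)."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, float("-inf")
    for sense, count in sense_counts.items():
        score = math.log(count / total)
        denom = sum(word_counts[sense].values()) + len(vocab)
        for word in context_words:
            # Add-one (Laplace) smoothing keeps unseen words from zeroing a sense.
            score += math.log((word_counts[sense][word] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


if __name__ == "__main__":
    train = [
        (["deposit", "money", "account"], "finance"),
        (["loan", "money", "interest"], "finance"),
        (["river", "water", "fishing"], "river"),
        (["muddy", "river", "shore"], "river"),
    ]
    model = train_naive_bayes(train)
    print(predict_sense(model, ["money", "interest"]))  # finance
    print(predict_sense(model, ["water", "shore"]))  # river
```

Log probabilities are summed instead of multiplying raw probabilities, which avoids numerical underflow on long contexts.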
6. System Fusion Based on WordNet Word Sense Disambiguation.
- Author
Duan, Mengtao and Luan, Tingyan
- Subjects
EXPERIMENTAL groups, CONTROL groups
- Abstract
In natural language processing (NLP), Word Sense Disambiguation (WSD) is a crucial task, and WSD systems are used in many NLP applications. Systems built on WordNet (e.g., Lesk) have shown encouraging progress in word sense disambiguation, yet the performance of WordNet-based WSD systems may be limited when disambiguating polysemous words. The purpose of this research was to compare a systematic fusion of WordNet-based WSD systems with the single best-performing system. In the experimental test, the fusion approach was used to disambiguate; in the control test, the single best disambiguation system was used. After both groups were evaluated on the same test dataset, their accuracy, recall, and disambiguation time were compared. The experimental group achieved higher accuracy and recall than the control group: fusing the decisions of multiple systems strengthened the accuracy and comprehensiveness of the system. In terms of disambiguation time, the experimental group showed an acceptable disambiguation rate. [ABSTRACT FROM AUTHOR]
- Published
- 2024
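The system-fusion abstract does not specify how the decisions of the individual WordNet-based systems are combined. One common option is a (weighted) majority vote per target word, sketched below under that assumption. The sense labels are hypothetical WordNet-style identifiers. The sketch lets a system abstain with `None`, which is one way fusion can improve coverage when a single system returns no answer.

```python
from collections import Counter


def fuse_by_vote(predictions, weights=None):
    """Fuse the sense labels proposed by several WSD systems for one word.

    predictions: one label per system, or None if that system abstained.
    weights: optional per-system weights (default 1.0 each).
    Ties go to the label first proposed by the earliest-listed system.
    Returns None only if every system abstained.
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    scores = Counter()
    first_seen = {}
    for position, (label, weight) in enumerate(zip(predictions, weights)):
        if label is None:
            continue
        scores[label] += weight
        first_seen.setdefault(label, position)
    if not scores:
        return None
    return max(scores, key=lambda lab: (scores[lab], -first_seen[lab]))


if __name__ == "__main__":
    # Three hypothetical systems disagree on one occurrence of "bank".
    print(fuse_by_vote(["bank.n.01", "bank.n.02", "bank.n.01"]))  # bank.n.01
    # The first system abstains; the remaining two still yield an answer.
    print(fuse_by_vote([None, "bank.n.02", "bank.n.02"]))  # bank.n.02
```

Per-system weights could, for example, be set from each system's accuracy on held-out data, so that stronger systems carry more of the vote.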
7. SinaTools: Open Source Toolkit for Arabic Natural Language Processing.
- Author
Hammouda, Tymaa, Jarrar, Mustafa, and Khalilia, Mohammed
- Subjects
NATURAL language processing, PYTHON programming language, ARABIC language, WORKFLOW, MORPHOLOGY
- Abstract
We introduce SinaTools, an open-source Python package for Arabic natural language processing and understanding. SinaTools is a unified package allowing people to integrate it into their system workflow, offering solutions for various tasks such as flat and nested Named Entity Recognition (NER), fully-flagged Word Sense Disambiguation (WSD), Semantic Relatedness, Synonymy Extractions and Evaluation, Lemmatization, Part-of-speech Tagging, Root Tagging, and additional helper utilities such as corpus processing, text stripping methods, and diacritic-aware word matching. This paper presents SinaTools and its benchmarking results, demonstrating that SinaTools outperforms all similar tools on the aforementioned tasks, such as Flat NER (87.33%), Nested NER (89.42%), WSD (82.63%), Semantic Relatedness (0.49 Spearman rank), Lemmatization (90.5%), POS tagging (93.8%), among others. SinaTools can be downloaded from (https://sina.birzeit.edu/sinatools). [ABSTRACT FROM AUTHOR]
- Published
- 2024
8. ‘What’s the word? That’s the word!’: linguistic features of Filipino queer language
- Author
Mark Bedoya Ulla, Jonathan Marcos Macaraeg, and Renz E. Ferrera
- Subjects
Filipino queer individuals, LGBTQ+ community, queer language, semantic properties, word sense disambiguation, Jeroen van de Weijer, College of International Studies, Shenzhen University, China, Fine Arts, Arts in general, NX1-820, General Works, History of scholarship and learning. The humanities, AZ20-999
- Abstract
Understanding queer language, including its meanings and applications, is essential not only for understanding sexuality and fostering language inclusivity but also for acknowledging the multifaceted uses of language across all societal groups. This study explored the semantic nuances and underlying patterns of Filipino queer language using Word Sense Disambiguation (WSD). Through content analysis of publicly available posts and comments on Facebook and Twitter, the study uncovered distinct characteristics of Filipino queer language, encompassing affixation, appropriation, clipping, association, mutation, neologism, recontextualization, and stylized reversal with affixation. The findings underscore the vibrancy and adaptability of Filipino queer language, reflecting the LGBTQIA+ community’s ability to use language as a tool for self-expression and cultural relevance. They also highlight the significance of understanding queer language not only as an identity expression for LGBTQIA+ individuals but also as a means to promote linguistic inclusivity for all groups in society. We discuss the implications and offer recommendations.
- Published
- 2024
9. Models and Strategies for Russian Word Sense Disambiguation: A Comparative Analysis
- Author
Aleksandrova, Anastasiia, Nivre, Joakim, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Nöth, Elmar, editor, Horák, Aleš, editor, and Sojka, Petr, editor
- Published
- 2024
10. WSD-Based Bangla Cyberbullying Detection Using Transform Learning
- Author
Majumder, Amit, Kumar, Amit, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Hassanien, Aboul Ella, editor, Anand, Sameer, editor, Jaiswal, Ajay, editor, and Kumar, Prabhat, editor
- Published
- 2024
11. Enhancing Word Sense Disambiguation Performance on WiC-TSV Dataset Using BERT-LSTM Model
- Author
Jain, Priya, Saritha, Sri Khetwat, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Pant, Millie, editor, Deep, Kusum, editor, and Nagar, Atulya, editor
- Published
- 2024
12. State-of-the-Art Approaches to Word Sense Disambiguation: A Multilingual Investigation
- Author
Habtamu, Robbel, Gizachew, Beakal, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Debelee, Taye Girma, editor, Ibenthal, Achim, editor, Schwenker, Friedhelm, editor, and Megersa Ayano, Yehualashet, editor
- Published
- 2024
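13. Revisión de métodos para la desambiguación léxica automática: aprendizaje automático y medidas de relación y similitud semánticas [Review of methods for automatic lexical disambiguation: machine learning and semantic relatedness and similarity measures].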
- Author
Núñez Torres, Fredy and Pérez Cabello de Alba, María Beatriz
- Subjects
*NATURAL language processing, *MACHINE learning, *COMPUTATIONAL linguistics, *AXIOMS
- Abstract
The possible solutions for automatic lexical disambiguation in natural language processing tasks include methods based on machine learning algorithms, on semantic relatedness measures, and on semantic similarity measures. Machine learning methods use endogenous sources of knowledge. Semantic relatedness and similarity measures, by contrast, draw on exogenous sources of knowledge, such as definitions from lexicographic resources or lexical meaning relations from ontologies or thesauri, which offer a conceptual hierarchy. In this work, we present and analyze the different types of methods for automatic lexical disambiguation, divided into four groups: those based on machine learning algorithms, on semantic relatedness measures, on semantic similarity measures, and on hybrid measures. We postulate that the advantage of methods based on relatedness and similarity measures is that their results derive both from statistical efficiency and from the linguistic knowledge encoded in the parameters of each measure. [ABSTRACT FROM AUTHOR]
- Published
- 2024
14. Modified lesk algorithm for word sense disambiguation in Bengali.
- Author
Das, Ratul, Pal, Alok Ranjan, and Saha, Diganta
- Abstract
This article presents a novel approach to Word Sense Disambiguation (WSD) for Bengali text. The algorithm used in this work is a modification of the Lesk algorithm. In the original algorithm, the overlap between the “context bag” and the “sense bag” items from the lexical resource (WordNet) is calculated by word-pair matching. In the current approach, the overlap is instead calculated with a semantic similarity measure based on fastText subword embeddings. This lets the approach handle unknown word forms efficiently and discover the latent semantics of words. Significant progress has been made in WSD for English and other European languages, but Indian languages like Bengali still pose a formidable challenge. The data used for this work consist of individual sentences from Bengali Wikipedia, a large collection of Bengali text (96K web pages with 1,700K sentences), together with the Indo WordNet for Bengali and a Bengali online dictionary. The experimental results are promising: target words whose synsets in WordNet are semantically distinct yield a high F1 score. The overall F1 score of 80% is well above the baseline and shows significant improvement over other knowledge-based approaches tried on low-resource Indian languages. [ABSTRACT FROM AUTHOR]
- Published
- 2024
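The modified-Lesk abstract contrasts the original Lesk overlap with an embedding-based variant. The difference can be sketched with a single scoring function whose word-to-word similarity is pluggable. Exact matching reproduces word-pair overlap counting, while cosine similarity over word vectors approximates the modified approach. The two-dimensional vectors and English sense bags are hypothetical toy values standing in for fastText subword embeddings and Bengali WordNet glosses. This is not the authors' code.

```python
import math


def exact_match(a, b):
    """Original Lesk: two words count as overlapping only if identical."""
    return 1.0 if a == b else 0.0


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def lesk_score(context_bag, sense_bag, similarity=exact_match):
    """Sum over context words of their best similarity to any sense-bag word."""
    return sum(max((similarity(c, s) for s in sense_bag), default=0.0)
               for c in context_bag)


def lesk_disambiguate(context_bag, sense_bags, similarity=exact_match):
    """sense_bags: dict mapping a sense id to its gloss/example words."""
    return max(sense_bags,
               key=lambda sense: lesk_score(context_bag, sense_bags[sense], similarity))


if __name__ == "__main__":
    bags = {
        "bank_finance": ["money", "deposit", "institution"],
        "bank_river": ["sloping", "land", "river", "water"],
    }
    # Hypothetical 2-d vectors standing in for fastText subword embeddings.
    vecs = {
        "money": [0.9, 0.2], "deposit": [0.8, 0.3], "institution": [0.7, 0.2],
        "sloping": [0.3, 0.6], "land": [0.4, 0.5], "river": [0.2, 0.9],
        "water": [0.1, 0.8], "stream": [0.1, 1.0],
    }

    def embed_sim(a, b):
        return cosine(vecs[a], vecs[b])

    print(lesk_disambiguate(["deposit", "money", "today"], bags))  # bank_finance
    print(lesk_score(["stream"], bags["bank_river"]))  # 0.0: no exact overlap
    print(lesk_disambiguate(["stream"], bags, embed_sim))  # bank_river
```

With exact matching, an unseen context word such as "stream" contributes nothing to either sense. The vector similarity still links it to the river sense, which mirrors the abstract's point about handling unknown word forms.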
15. A comprehensive dataset for Arabic word sense disambiguation
- Author
Sanaa Kaddoura and Reem Nassar
- Subjects
Labelled data, Word sense disambiguation, Machine learning, Deep learning, GPT3.5, Natural language processing, Computer applications to medicine. Medical informatics, R858-859.7, Science (General), Q1-390
- Abstract
This data paper introduces a comprehensive dataset for word sense disambiguation tasks, focusing explicitly on a hundred polysemous words frequently used in Modern Standard Arabic. Each word has between 3 and 8 senses, giving 367 unique senses in total. Each sense is accompanied by ten example sentences featuring the polysemous word in various contexts, resulting in a dataset of 3670 samples. Significantly, the dataset is in Arabic, which is known for its rich morphology, complex syntax, and extensive polysemy. The data were collected from various web sources spanning news, medicine, finance, and other domains. This breadth ensures the dataset's applicability across diverse fields, positioning it as a pivotal resource for Arabic Natural Language Processing (NLP) applications. The data collection period ran from 1 April 2023 to 1 May 2023. The dataset supports comprehensive model learning by including all senses of each frequently used polysemous term, even rare senses that are infrequently used in real-world contexts, thereby mitigating bias. Where rare senses lacked sufficient real-world data, the dataset includes synthetic sentences generated by GPT3.5-turbo. The collection process began with web scraping, followed by manual sorting to distinguish word senses and thorough searches by a human expert to fill in missing contextual sentences. Finally, synthetic samples were generated where online data for rare word senses was lacking or insufficient. Beyond its primary use in word sense disambiguation, the dataset holds considerable value for researchers across various domains, including sentiment analysis applications.
- Published
- 2024
16. Application of the transformer model algorithm in chinese word sense disambiguation: a case study in chinese language
- Author
Linlin Li, Juxing Li, Hongli Wang, and Jianing Nie
- Subjects
Transformer model algorithm, Chinese language, BiLSTM, Word sense disambiguation, Root mean squared error, Medicine, Science
- Abstract
This study explores applying the Transformer model algorithm to Chinese word sense disambiguation, with the aim of resolving word sense ambiguity in the Chinese language. It introduces deep learning and designs a Chinese word sense disambiguation model that fuses the Transformer with the Bi-directional Long Short-Term Memory (BiLSTM) algorithm. By combining the self-attention mechanism of the Transformer with the sequence modeling capability of BiLSTM, the model efficiently captures semantic information and contextual relationships in Chinese sentences, leading to accurate word sense disambiguation. The model is evaluated on the PKU Paraphrase Bank, a Chinese text paraphrase dataset. The results show that the model achieves a precision of 83.71% in Chinese word sense disambiguation, significantly outperforming the Long Short-Term Memory algorithm. Additionally, the root mean squared error of the algorithm is less than 17, and the loss function value remains around 0.14. The study thus validates that the Transformer-fused, BiLSTM-based Chinese word sense disambiguation model exhibits both high accuracy and robustness in identifying word senses in Chinese. The findings provide valuable insights for advancing intelligent word sense processing in Chinese language applications.
- Published
- 2024
17. Sense Unveiled: Enhancing Urdu Corpus for Nuanced Word Sense Disambiguation
- Author
Sarfraz Bibi, Sohail Asghar, and Muhammad Zubair
- Subjects
Word sense disambiguation, natural language processing, machine learning, sense tagged Urdu corpus, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Ambiguity in word meanings presents a significant challenge in natural language processing, necessitating robust techniques for Word Sense Disambiguation (WSD). While WSD research has predominantly focused on widely spoken languages like English and Spanish, languages such as Urdu have received less attention. This paper addresses this gap by examining existing WSD corpora for Urdu and presenting the Enhanced Urdu (EU) corpus, created specifically for WSD tasks. The analysis includes a critical evaluation of the limitations of the ULS-WSD-18 Corpus and justifies the need for a more comprehensive resource. The EU corpus is carefully curated and comprises 960 words, categorized by their frequency in the corpus into most frequent, moderately frequent, and infrequent words. These words form the basis of the sentences used for model training and testing. Various similarity coefficients are used to compare the EU corpus with the ULS-WSD-18 Corpus, revealing notable patterns in word occurrences, sense structures, and sentence compositions. The findings underscore the potential of the EU corpus to advance WSD research in Urdu language processing. By providing a comprehensive resource for model development and evaluation, this work contributes to the broader goal of improving language processing tools for Urdu and other underrepresented languages.
- Published
- 2024
18. A Word Sense Disambiguation Method Applied to Natural Language Processing for the Portuguese Language
- Author
Clovis Holanda do Nascimento, Vinicius Cardoso Garcia, and Ricardo de Andrade Araujo
- Subjects
Artificial intelligence, language models, natural language processing, word sense disambiguation, Electronic computers. Computer science, QA75.5-76.95, Information technology, T58.5-58.64
- Abstract
Natural language processing (NLP) and artificial intelligence (AI) have advanced significantly in recent years, enabling progress on tasks such as machine translation, text summarization, sentiment analysis, and speech analysis. However, challenges remain, such as the ambiguity of natural language. One problem caused by ambiguity is the difficulty of determining the proper meaning of a word in a specific context. For example, the word “mouse” can mean a computer peripheral or an animal, depending on the context. This limitation can lead to an incorrect semantic interpretation of the processed sentence. In recent years, language models (LMs), which learn to generate text by training on large amounts of data, have given new impetus to NLP and AI, including the task of word sense disambiguation (WSD). However, there are still few studies on WSD using LMs for Portuguese. Given this scenario, this article presents a WSD method for the Portuguese language that uses BERTimbau, a language model specific to Portuguese. The results will be evaluated using the metrics established in the literature.
- Published
- 2024
19. Sense through time: diachronic word sense annotations for word sense induction and Lexical Semantic Change Detection
- Author
Schlechtweg, Dominik, Zamora-Reina, Frank D., Bravo-Marquez, Felipe, and Arefyev, Nikolay
- Published
- 2024
20. Application of the transformer model algorithm in chinese word sense disambiguation: a case study in chinese language
- Author
Li, Linlin, Li, Juxing, Wang, Hongli, and Nie, Jianing
- Published
- 2024
21. Word sense disambiguation of acronyms in clinical narratives.
- Author
Chopard, Daphné, Corcoran, Padraig, and Spasić, Irena
- Subjects
MEDICAL information storage & retrieval systems, DATA mining, ACRONYMS, NATURAL language processing, MEDLINE, INFORMATION retrieval, MEDICAL subject headings, ARTIFICIAL neural networks, INFORMATION science, SEMANTICS, MACHINE learning, ONLINE information services, ALGORITHMS
- Abstract
Clinical narratives commonly use acronyms without explicitly defining their long forms. Because acronyms tend to be highly ambiguous, this makes their sense difficult to interpret automatically. Supervised learning approaches to acronym disambiguation in the clinical domain are hindered by issues of patient privacy and manual annotation, which limit the size and diversity of training data. In this study, we demonstrate how scientific abstracts can be used to overcome these issues by creating a large, automatically annotated dataset of artificially simulated global acronyms. A neural network trained on this dataset achieved an F1-score of 95% on disambiguating acronym mentions in scientific abstracts. The network was then integrated with multiword term recognition to extract a sense inventory of acronyms from a corpus of clinical narratives on the fly. Acronym sense extraction achieved an F1-score of 74% on a corpus of radiology reports. In clinical practice, the suggested approach can be used to facilitate the development of institution-specific inventories. [ABSTRACT FROM AUTHOR]
- Published
- 2024
22. Discrete Student Psychology Optimization Algorithm for the Word Sense Disambiguation Problem.
- Author
Haouassi, Hichem, Bekhouche, Abdelaali, Rahab, Hichem, Mahdaoui, Rafik, and Chouhal, Ouahiba
- Subjects
*OPTIMIZATION algorithms, *METAHEURISTIC algorithms, *NATURAL language processing, *PSYCHOLOGY students, *COMBINATORIAL optimization, *AMBIGUITY
- Abstract
Word Sense Disambiguation (WSD) is a key step in many natural language processing tasks, such as information search, automatic translation, and sentiment analysis. WSD is the process of identifying the appropriate senses of ambiguous words in a text. As the number of words to be disambiguated in large amounts of text grows, WSD becomes very challenging, and an exhaustive search for the best set of senses may be impractical. Recently, several metaheuristic approaches have been proposed for complex optimization problems and have achieved good results. Therefore, to improve the WSD process, this paper models WSD as a combinatorial optimization problem and proposes the Discrete Student Psychology-Based Optimization (DSPBO) metaheuristic to select appropriate senses. The DSPBO-based WSD approach disambiguates multiple ambiguous words jointly according to their contexts in the target text, and a Lesk-based fitness function guides the DSPBO metaheuristic to optimize the overall semantic similarity of the selected senses. The proposed approach is evaluated against several recent WSD approaches on the well-known SensEval-2, SensEval-3, SemEval-2007, SemEval-13, and SemEval-15 corpora, in terms of F-measure, precision, and recall. Experiments show a significant improvement over both existing lexical knowledge-based approaches and metaheuristic-based approaches, with F-measures of 84.21%, 83.33%, 87.5%, 77.58%, and 81.08% on SensEval-2, SensEval-3, SemEval-2007, SemEval-13, and SemEval-15, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
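The DSPBO abstract treats WSD as combinatorial optimization. In that framing, a whole assignment of senses (one per ambiguous word) is scored, and the solver searches over assignments. The sketch below assumes a simple Lesk-style fitness: the number of gloss words shared by each pair of chosen senses. It uses plain hill climbing as a stand-in search; it does not implement DSPBO, whose update rules are not given in the abstract. The glosses are hypothetical. The exhaustive search is included only to show why it becomes impractical: it evaluates every combination, that is, the product of the words' sense counts.

```python
import random
from itertools import product


def overlap(bag_a, bag_b):
    """Number of distinct words shared by two glosses."""
    return len(set(bag_a) & set(bag_b))


def fitness(assignment, glosses):
    """Lesk-style coherence: total gloss overlap over all pairs of chosen senses.

    assignment: one sense index per ambiguous word.
    glosses: glosses[i][k] is the gloss word list for sense k of word i.
    """
    chosen = [glosses[i][k] for i, k in enumerate(assignment)]
    return sum(
        overlap(chosen[i], chosen[j])
        for i in range(len(chosen))
        for j in range(i + 1, len(chosen))
    )


def exhaustive_best(glosses):
    """Score every joint assignment; cost is the product of sense counts."""
    best = max(product(*(range(len(s)) for s in glosses)),
               key=lambda a: fitness(a, glosses))
    return list(best), fitness(best, glosses)


def hill_climb(glosses, iterations=200, seed=0):
    """Local search over joint assignments (a simple stand-in for DSPBO)."""
    rng = random.Random(seed)
    current = [rng.randrange(len(senses)) for senses in glosses]
    current_fit = fitness(current, glosses)
    for _ in range(iterations):
        i = rng.randrange(len(glosses))  # pick one ambiguous word
        candidate = list(current)
        candidate[i] = rng.randrange(len(glosses[i]))  # try another sense for it
        cand_fit = fitness(candidate, glosses)
        if cand_fit >= current_fit:  # accepting sideways moves helps cross plateaus
            current, current_fit = candidate, cand_fit
    return current, current_fit


if __name__ == "__main__":
    toy = [
        [["money", "finance", "deposit"], ["river", "water", "slope"]],  # "bank"
        [["money", "finance", "loan"], ["hobby", "curiosity", "attention"]],  # "interest"
    ]
    print(exhaustive_best(toy))  # ([0, 0], 2)
    print(hill_climb(toy))
```

With n words of k senses each, the exhaustive loop runs k^n fitness evaluations, while the local search runs a fixed budget of `iterations`. That trade-off motivates metaheuristics like the one proposed in the paper.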
23. Lexeme connexion measure of cohesive lexical ambiguity revealing factor: a robust approach for word sense disambiguation of Bengali text.
- Author
Das Dawn, Debapratim, Khan, Abhinandan, Shaikh, Soharab Hossain, and Pal, Rajat Kumar
- Abstract
Word sense disambiguation (WSD) is the process of finding the appropriate meaning of a polysemous word in a given context. The Bengali language inherently contains a large number of polysemous words. Researchers in linguistics have recently been drawn to the problem of WSD in Bengali text because of its many interesting applications, such as machine translation, opinion polarity identification, and question-answering systems. In this paper, a lexeme connexion measure based on a cohesive lexical ambiguity revealing factor is proposed to decide the sense of a Bengali polysemous word. All polysemous words are treated as target words, and context windows of three different sizes (five, seven, and ten) are considered around these target words. The paper defines a lexeme harmony measure that heuristically quantifies how strongly a collection of lexemes in Bengali text belong together syntactically. The proposed methodology extracts a feature vector based on the cohesive lexical ambiguity revealing factor (CLARF). CLARF depends on the frame lexeme harmony (FLH), sense lexeme harmony (SLH), polysemy singularity coherence (PSC), polysemy distribution factor (PDF), and relative polysemy singularity coherence (RPSC) of a lexeme. For sense recognition, the technique applies a max rule to the integrated lexeme connexion measure (LCM) scores of each lexeme in both the test and training cases. The proposed algorithm overcomes a drawback of earlier Bengali WSD approaches, as it considers both the lexical and the semantic relationships between words. Its performance was evaluated on a dataset of 100 polysemous words with three or four senses each, using various evaluation metrics, and the results indicate that the algorithm is robust. [ABSTRACT FROM AUTHOR]
- Published
- 2024
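The Bengali abstract above builds context windows of sizes five, seven and ten around each target word and applies a max-rule to per-sense scores. The abstract does not give the LCM or CLARF formulas, so the Python sketch below only illustrates that surrounding plumbing under stated assumptions: a window of `size` tokens includes the target and is centred on it (the extra slot goes to the right for even sizes), and `overlap_score` is a hypothetical stand-in for the LCM score, not the authors' measure. The English tokens are placeholders.

```python
from collections import Counter


def context_window(tokens, target_index, size):
    """Return up to `size` tokens, target included, centred on the target.

    Assumption: for even sizes the extra slot goes to the right; the window
    is clipped at the text boundaries, so it may be shorter than `size`.
    """
    left = (size - 1) // 2
    right = size - 1 - left
    start = max(0, target_index - left)
    end = min(len(tokens), target_index + right + 1)
    return tokens[start:end]


def overlap_score(window, sense_windows, target):
    """Hypothetical stand-in for LCM: mean token overlap with a sense's training windows."""
    if not sense_windows:
        return 0.0
    context = Counter(t for t in window if t != target)
    total = 0
    for train in sense_windows:
        train_counts = Counter(t for t in train if t != target)
        total += sum((context & train_counts).values())
    return total / len(sense_windows)


def max_rule(window, training, target):
    """Assign the sense with the maximum score (ties go to the first sense listed)."""
    return max(training, key=lambda sense: overlap_score(window, training[sense], target))


if __name__ == "__main__":
    tokens = "he took a loan from the bank to buy a house".split()
    target_index = tokens.index("bank")
    training = {
        "FINANCE": [["deposit", "money", "in", "the", "bank"], ["bank", "loan", "interest"]],
        "RIVER": [["river", "bank", "was", "muddy"]],
    }
    for size in (5, 7, 10):
        window = context_window(tokens, target_index, size)
        print(size, window, max_rule(window, training, "bank"))
```

Swapping `overlap_score` for a real implementation of the paper's measure would leave the windowing and max-rule steps unchanged.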
24. Word Sense Disambiguation applied to Assamese-Hindi Bilingual Statistical Machine Translation.
- Author
Barman, Anup Kumar, Sarmah, Jumi, Basumatary, Subungshri, and Nag, Amitava
- Subjects
MACHINE translating, NATURAL language processing, NOUNS, AMBIGUITY, MACHINE learning, DECISION trees
- Abstract
Word Sense Disambiguation (WSD) is concerned with automatically assigning the appropriate sense to an ambiguous word. WSD is an important task and plays a crucial role in many Natural Language Processing (NLP) applications. A Statistical Machine Translation (SMT) system translates a source language into a target language using phrase-based statistical translation. WSD plays a crucial role in an MT system, as a source-language word may be associated with multiple translations in the target language. This study applies WSD to the input of the MT system to improve disambiguation in the translation output. Hindi WordNet was used to select the most frequent synonym in order to obtain the most accurate translation. This study also compared Naïve Bayes (NB) and Decision Tree (DT) classifiers for building and testing a WSD model; NB was more appropriate for the WSD task than DT when evaluated in the Weka machine learning toolkit. To the best of our knowledge, no such work has yet been carried out for Assamese, an Indo-Aryan language. The system with WSD achieved better results than the baseline MT system without the WSD module, and the results were analyzed by linguists. Furthermore, an Assamese-Hindi transliteration system was merged with the baseline MT system to translate proper nouns. This study marks a remarkable contribution to Assamese NLP, as Assamese is a computationally low-resource Indian language. [ABSTRACT FROM AUTHOR]
- Published
- 2024
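The Assamese-Hindi study above reports that Naïve Bayes outperformed a decision tree for WSD when evaluated in Weka. As an illustration of the general technique only, not the authors' Weka configuration or feature set, a multinomial Naïve Bayes sense classifier over bag-of-words contexts with add-one smoothing can be sketched in plain Python. The English tokens and sense labels in the usage example are hypothetical placeholders.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesWSD:
    """Multinomial Naive Bayes sense classifier over bag-of-words contexts."""

    def __init__(self):
        self.sense_counts = Counter()            # number of training examples per sense
        self.word_counts = defaultdict(Counter)  # token frequencies per sense
        self.vocab = set()

    def fit(self, examples):
        """Train on an iterable of (context_tokens, sense) pairs."""
        for tokens, sense in examples:
            self.sense_counts[sense] += 1
            self.word_counts[sense].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_posterior(self, tokens, sense):
        """Unnormalised log P(sense | context) with add-one (Laplace) smoothing."""
        total = sum(self.sense_counts.values())
        score = math.log(self.sense_counts[sense] / total)
        denom = sum(self.word_counts[sense].values()) + len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # tokens never seen in training are ignored
                score += math.log((self.word_counts[sense][tok] + 1) / denom)
        return score

    def predict(self, tokens):
        """Return the sense with the highest posterior score."""
        return max(self.sense_counts, key=lambda s: self.log_posterior(tokens, s))


if __name__ == "__main__":
    training = [
        ("deposit money account".split(), "bank/finance"),
        ("loan interest money".split(), "bank/finance"),
        ("river water fishing".split(), "bank/river"),
    ]
    model = NaiveBayesWSD().fit(training)
    print(model.predict("fishing near the river".split()))
```

Working in log space avoids underflow when contexts are long, and ignoring out-of-vocabulary tokens keeps every sense's score comparable.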
25. Word Sense Disambiguation for Morphologically Rich Low-Resourced Languages: A Systematic Literature Review and Meta-Analysis
- Author
Hlaudi Daniel Masethe, Mosima Anna Masethe, Sunday Olusegun Ojo, Fausto Giunchiglia, and Pius Adewale Owolawi
- Subjects
word sense disambiguation, natural language processing, low-resourced languages, morphologically rich, Information technology, T58.5-58.64
- Abstract
In natural language processing, word sense disambiguation (WSD) remains a major difficulty, especially for low-resource languages, where linguistic variation and a lack of data make model training and evaluation harder. This systematic review and meta-analysis summarizes the body of knowledge on WSD techniques for low-resource languages, emphasizing the advantages and disadvantages of different strategies. A thorough search of several databases produced articles assessing WSD methods in low-resource languages, and effect sizes and performance measures were extracted from a subset of these studies. Pooled estimates were computed by meta-analysis, and heterogeneity around the pooled effect was evaluated. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were used to select the papers for extraction. The meta-analysis included 32 studies covering a range of WSD methods and low-resourced languages. The overall pooled effect size indicated moderate effectiveness of WSD techniques. Heterogeneity among studies was high, with an I² of 82.29%, suggesting substantial variability in WSD performance across studies. The between-study variance (τ²) of 5.819 further reflects this spread. This variability makes the findings difficult to generalize and highlights the influence of factors such as language-specific characteristics, dataset quality, and methodological differences. The p-values from the meta-regression (0.454) and the meta-analysis (0.440) suggest that the variability in WSD performance is not statistically significantly associated with the investigated moderators, indicating that the performance differences may be influenced by factors not captured in the current analysis.
The absence of significant p-values raises the possibility that the models and techniques in use do not yet adequately address the problems of low-resource settings.
- Published
- 2024
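The I² and τ² values quoted in the meta-analysis above are standard heterogeneity statistics. The Python sketch below shows how they are commonly computed from per-study effect sizes y_i and variances v_i, assuming inverse-variance weights w_i = 1/v_i, Cochran's Q = Σ w_i (y_i − ȳ)², I² = max(0, (Q − df)/Q) × 100%, and the DerSimonian-Laird estimator τ² = max(0, (Q − df)/C) with C = Σw − Σw²/Σw. The abstract does not state which estimator the review used, so treat this as an illustration of the statistics, not a reproduction of its analysis.

```python
import math


def heterogeneity(effects, variances):
    """Return (Q, I2 in percent, tau2) for a set of study effect sizes.

    Uses inverse-variance weights, Cochran's Q, I2 = max(0, (Q - df) / Q) * 100,
    and the DerSimonian-Laird moment estimator for the between-study variance tau2.
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with one variance each")
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed_mean = sum(w * y for w, y in zip(weights, effects)) / sum_w
    q = sum(w * (y - fixed_mean) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2, tau2


def random_effects_mean(effects, variances):
    """Pooled random-effects estimate using weights 1 / (v_i + tau2)."""
    _, _, tau2 = heterogeneity(effects, variances)
    weights = [1.0 / (v + tau2) for v in variances]
    return sum(w * y for w, y in zip(weights, effects)) / sum(weights)


if __name__ == "__main__":
    q, i2, tau2 = heterogeneity([0.0, 10.0], [1.0, 1.0])
    print(f"Q={q:.2f}  I2={i2:.2f}%  tau2={tau2:.2f}")
```

Because I² only compares Q with its degrees of freedom, it describes the proportion of variation due to heterogeneity, while τ² is on the scale of the effect sizes themselves; a review typically reports both, as this one does.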
26. PKUSenseCor: A Large-Scale Word Sense Annotated Chinese Corpus
- Author
Jin, Peng, Wu, Yunfang, Zhu, Xuefeng, McCarthy, Diana, Qu, Weiguang, Yu, Shiwen, Ide, Nancy, Series Editor, Huang, Chu-Ren, editor, Hsieh, Shu-Kai, editor, and Jin, Peng, editor
- Published
- 2023
27. WSDTN a Novel Dataset for Arabic Word Sense Disambiguation
- Author
Saidi, Rakia, Jarray, Fethi, Akacha, Asma, Aribi, Wissem, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Nguyen, Ngoc Thanh, editor, Botzheim, János, editor, Gulyás, László, editor, Nunez, Manuel, editor, Treur, Jan, editor, Vossen, Gottfried, editor, and Kozierkiewicz, Adrianna, editor
- Published
- 2023
28. Removing Ambiguity in Natural Language for Generating Self-Join Queries
- Author
Sawant, Pradnya, Sonawane, Kavita, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Rutkowski, Leszek, editor, Scherer, Rafał, editor, Korytkowski, Marcin, editor, Pedrycz, Witold, editor, Tadeusiewicz, Ryszard, editor, and Zurada, Jacek M., editor
- Published
- 2023
29. A Computational Approach for the Tonal Identification in Bodo Language
- Author
Narzary, Mwnthai, Brahma, Maharaj, Narzary, Sanjib, Senapati, Apurbalal, Singh, Pranav Kumar, Bhattacharjee, Ratnajit, editor, Neog, Debanga Raj, editor, Mopuri, Konda Reddy, editor, and Vipparthi, Santosh Kumar, editor
- Published
- 2023
30. Abbreviation Disambiguation: A Review of Modern Techniques to Improve Machine Reading Comprehension
- Author
Choi, Vince Sing, Taghva, Kazem, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, and Arai, Kohei, editor
- Published
- 2023
31. HSRG-WSD: A Novel Unsupervised Chinese Word Sense Disambiguation Method Based on Heterogeneous Sememe-Relation Graph
- Author
Lyu, Meng, Mo, Shasha, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Huang, De-Shuang, editor, Premaratne, Prashan, editor, Jin, Baohua, editor, Qu, Boyang, editor, Jo, Kang-Hyun, editor, and Hussain, Abir, editor
- Published
- 2023
32. User Similarity Computation Strategy for Collaborative Filtering Using Word Sense Disambiguation Technique
- Author
Samsuddoha, Md., Biswas, Dipto, Erfan, Md., Angrisani, Leopoldo, Series Editor, Arteaga, Marco, Series Editor, Panigrahi, Bijaya Ketan, Series Editor, Chakraborty, Samarjit, Series Editor, Chen, Jiming, Series Editor, Chen, Shanben, Series Editor, Chen, Tan Kay, Series Editor, Dillmann, Rüdiger, Series Editor, Duan, Haibin, Series Editor, Ferrari, Gianluigi, Series Editor, Ferre, Manuel, Series Editor, Hirche, Sandra, Series Editor, Jabbari, Faryar, Series Editor, Jia, Limin, Series Editor, Kacprzyk, Janusz, Series Editor, Khamis, Alaa, Series Editor, Kroeger, Torsten, Series Editor, Li, Yong, Series Editor, Liang, Qilian, Series Editor, Martín, Ferran, Series Editor, Ming, Tan Cher, Series Editor, Minker, Wolfgang, Series Editor, Misra, Pradeep, Series Editor, Möller, Sebastian, Series Editor, Mukhopadhyay, Subhas, Series Editor, Ning, Cun-Zheng, Series Editor, Nishida, Toyoaki, Series Editor, Oneto, Luca, Series Editor, Pascucci, Federica, Series Editor, Qin, Yong, Series Editor, Seng, Gan Woon, Series Editor, Speidel, Joachim, Series Editor, Veiga, Germano, Series Editor, Wu, Haitao, Series Editor, Zamboni, Walter, Series Editor, Zhang, Junjie James, Series Editor, Hossain, Md. Sazzad, editor, Majumder, Satya Prasad, editor, Siddique, Nazmul, editor, and Hossain, Md. Shahadat, editor
- Published
- 2023
33. Word Sense Disambiguation in the Biomedical Domain: Short Literature Review
- Author
El Hannaoui, Oumayma, Nfaoui, El Habib, El Haoussi, Fatima, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Ezziyyani, Mostafa, editor, and Balas, Valentina Emilia, editor
- Published
- 2023
34. An Analysis of Word Sense Disambiguation (WSD)
- Author
Nanjundan, Preethi, Mathews, Eappen Zachariah, Angrisani, Leopoldo, Series Editor, Arteaga, Marco, Series Editor, Panigrahi, Bijaya Ketan, Series Editor, Chakraborty, Samarjit, Series Editor, Chen, Jiming, Series Editor, Chen, Shanben, Series Editor, Chen, Tan Kay, Series Editor, Dillmann, Rüdiger, Series Editor, Duan, Haibin, Series Editor, Ferrari, Gianluigi, Series Editor, Ferre, Manuel, Series Editor, Hirche, Sandra, Series Editor, Jabbari, Faryar, Series Editor, Jia, Limin, Series Editor, Kacprzyk, Janusz, Series Editor, Khamis, Alaa, Series Editor, Kroeger, Torsten, Series Editor, Li, Yong, Series Editor, Liang, Qilian, Series Editor, Martín, Ferran, Series Editor, Ming, Tan Cher, Series Editor, Minker, Wolfgang, Series Editor, Misra, Pradeep, Series Editor, Möller, Sebastian, Series Editor, Mukhopadhyay, Subhas, Series Editor, Ning, Cun-Zheng, Series Editor, Nishida, Toyoaki, Series Editor, Oneto, Luca, Series Editor, Pascucci, Federica, Series Editor, Qin, Yong, Series Editor, Seng, Gan Woon, Series Editor, Speidel, Joachim, Series Editor, Veiga, Germano, Series Editor, Wu, Haitao, Series Editor, Zamboni, Walter, Series Editor, Zhang, Junjie James, Series Editor, Jain, Sarika, editor, Groppe, Sven, editor, and Mihindukulasooriya, Nandana, editor
- Published
- 2023
35. Building a Semantically Annotated Corpus of Chinese Directional Complements
- Author
Kang, Byeongkwu, Yu, Sukyong, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Su, Qi, editor, Xu, Ge, editor, and Yang, Xiaoyan, editor
- Published
- 2023
36. An Adaptive Algorithm for Polysemous Words in Natural Language Processing
- Author
Kokane, Chandrakant, Babar, Sachin, Mahalle, Parikshit, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Reddy, A. Brahmananda, editor, Nagini, S., editor, Balas, Valentina E., editor, and Raju, K. Srujan, editor
- Published
- 2023
37. Amharic Sentence-Level Word Sense Disambiguation Using Transfer Learning
- Author
Mossa, Neima, Meshesha, Million, Akan, Ozgur, Editorial Board Member, Bellavista, Paolo, Editorial Board Member, Cao, Jiannong, Editorial Board Member, Coulson, Geoffrey, Editorial Board Member, Dressler, Falko, Editorial Board Member, Ferrari, Domenico, Editorial Board Member, Gerla, Mario, Editorial Board Member, Kobayashi, Hisashi, Editorial Board Member, Palazzo, Sergio, Editorial Board Member, Sahni, Sartaj, Editorial Board Member, Shen, Xuemin, Editorial Board Member, Stan, Mircea, Editorial Board Member, Jia, Xiaohua, Editorial Board Member, Zomaya, Albert Y., Editorial Board Member, Woldegiorgis, Bereket H., editor, Mequanint, Kibret, editor, Bitew, Mekuanint A., editor, Beza, Teketay B., editor, and Yibre, Abdulkerim M., editor
- Published
- 2023
38. Bi-matching Mechanism to Combat Long-tail Senses of Word Sense Disambiguation
- Author
Zhang, Junwei, He, Ruifang, Guo, Fengyu, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Amini, Massih-Reza, editor, Canu, Stéphane, editor, Fischer, Asja, editor, Guns, Tias, editor, Kralj Novak, Petra, editor, and Tsoumakas, Grigorios, editor
- Published
- 2023
39. Word Sense Disambiguation from English to Indic Language: Approaches and Opportunities
- Author
Mishra, Binod Kumar, Jain, Suresh, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Patel, Kanubhai K., editor, Santosh, K. C., editor, and Patel, Atul, editor
- Published
- 2023
40. Enrichment of OntoSenseNet: Adding a Sense-annotated Telugu Lexicon
- Author
Parupalli, Sreekavitha, Singh, Navjyoti, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, and Gelbukh, Alexander, editor
- Published
- 2023
41. Homograph Language Identification Using Machine Learning Techniques
- Author
Ansari, Mohd Zeeshan, Ahmad, Tanvir, Khan, Sunubia, Mabood, Faria, Faizan, Mohd, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Saraswat, Mukesh, editor, Chowdhury, Chandreyee, editor, Kumar Mandal, Chintan, editor, and Gandomi, Amir H., editor
- Published
- 2023
42. Approach Toward Word Sense Disambiguation for the English-To-Sanskrit Language Using Naïve Bayesian Classification
- Author
Maurya, Archana Sachindeo, Bahadur, Promila, Garg, Srishti, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Khanna, Ashish, editor, Gupta, Deepak, editor, Kansal, Vineet, editor, Fortino, Giancarlo, editor, and Hassanien, Aboul Ella, editor
- Published
- 2023
43. Resolving Lexical Level Ambiguity: Word Sense Disambiguation for Telugu Language by Exploiting IndicBERT Embeddings
- Author
Durgaprasad, Palanati, Sunitha, K. V. N., Padmajarani, B., Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Bhateja, Vikrant, editor, Mohanty, Jnyana Ranjan, editor, Flores Fuentes, Wendy, editor, and Maharatna, Koushik, editor
- Published
- 2023
44. Word sense disambiguation of acronyms in clinical narratives
- Author
Daphné Chopard, Padraig Corcoran, and Irena Spasić
- Subjects
natural language processing, word sense disambiguation, acronym disambiguation, machine learning, deep learning, silver standard, Medicine, Public aspects of medicine, RA1-1270, Electronic computers. Computer science, QA75.5-76.95
- Abstract
Clinical narratives commonly use acronyms without explicitly defining their long forms. Because acronyms tend to be highly ambiguous, this makes their sense difficult to interpret automatically. Supervised approaches to acronym disambiguation in the clinical domain are hindered by patient privacy concerns and the cost of manual annotation, which limit the size and diversity of training data. In this study, we demonstrate how scientific abstracts can be used to overcome these issues by creating a large, automatically annotated dataset of artificially simulated global acronyms. A neural network trained on such a dataset achieved an F1-score of 95% on disambiguating acronym mentions in scientific abstracts. This network was integrated with multi-word term recognition to extract a sense inventory of acronyms from a corpus of clinical narratives on the fly. Acronym sense extraction achieved an F1-score of 74% on a corpus of radiology reports. In clinical practice, the suggested approach can be used to facilitate the development of institution-specific sense inventories.
- Published
- 2024
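The clinical acronym study above builds an automatically annotated (silver-standard) dataset by simulating acronyms in scientific abstracts. One common way to do this, assumed here rather than taken from the paper, is reverse substitution: find a known long form in the text, replace it with its acronym, and keep the long form as the sense label. The function name, inventory and sentence below are hypothetical.

```python
import re


def simulate_acronym_examples(text, inventory):
    """Turn long-form mentions into labelled acronym examples.

    inventory: hypothetical mapping of long form -> acronym.
    Returns (rewritten_text, acronym, sense) tuples, one per long form found;
    the long form that was replaced serves as the automatic sense label.
    """
    examples = []
    for long_form, acronym in inventory.items():
        pattern = re.compile(r"\b" + re.escape(long_form) + r"\b", re.IGNORECASE)
        if pattern.search(text):
            # A function replacement avoids backslash escapes in the acronym.
            rewritten = pattern.sub(lambda _match: acronym, text)
            examples.append((rewritten, acronym, long_form))
    return examples


if __name__ == "__main__":
    inventory = {
        "magnetic resonance imaging": "MRI",
        "mitral regurgitation": "MR",
    }
    sentence = "Patients underwent magnetic resonance imaging before surgery."
    for example in simulate_acronym_examples(sentence, inventory):
        print(example)
```

Each tuple pairs an acronym in context with its known expansion, which is the shape of training data a supervised disambiguation model needs; no manual annotation is involved.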
45. A Study on Lexical Disambiguation in English Translation Based on Twin Neural Networks
- Author
Cui Cui
- Subjects
twin neural network, bilstm+attention, corpus alignment, stacked-lstm, word sense disambiguation, 05c82, Mathematics, QA1-939
- Abstract
To address the lack of effective algorithmic models for improving the accuracy of lexical disambiguation in English translation, this paper constructs a lexical disambiguation model based on twin (Siamese) neural networks and describes how input sample pairs are built from the original corpus. A Stacked-LSTM model is used to align the input Chinese and English corpora and to expand the dataset. The twin network extracts corpus features with a BiLSTM-plus-attention encoder; after training, disambiguation is achieved by computing the similarity of the input samples. In disambiguation experiments comparing several algorithms, the model computes input-sample similarity effectively and achieves a disambiguation accuracy of 68.23% for English vocabulary translation and 87.0% for vocabulary segmentation of complex English sentences or articles. These results indicate that the model performs well at disambiguating English translations.
- Published
- 2024
46. EnhancedBERT: A feature-rich ensemble model for Arabic word sense disambiguation with statistical analysis and optimized data collection
- Author
Sanaa Kaddoura and Reem Nassar
- Subjects
Arabic natural language processing, Word sense disambiguation, Machine learning, Knowledge-based, BERT, Performance evaluation, Electronic computers. Computer science, QA75.5-76.95
- Abstract
Accurately assigning meaning to a word based on its context, known as Word Sense Disambiguation (WSD), remains challenging across languages. Extensive research aims to develop automated methods for determining word senses in different contexts. However, the literature lacks datasets built for Arabic WSD. This paper presents a dataset of one hundred polysemous Arabic words. Each word has 3–8 distinct senses, with ten example sentences per sense. Statistical analyses of the dataset shed light on its characteristics and properties. A novel WSD approach is then proposed that uses similarity measures to find the overlap between contextual information and dictionary definitions. The method uses BERT, a pre-trained language model, to enable effective Arabic word sense disambiguation. During training, new features are integrated to improve the model's ability to distinguish between the senses of a word. The proposed BERT models are combined into an ensemble architecture to improve classification performance. The WSD system outperforms state-of-the-art systems, achieving an F1-score of approximately 96%. Statistical analyses are performed to evaluate the overall performance of the WSD approach and to provide additional information on model predictions. A case study tests the effectiveness of WSD in sentiment analysis as a downstream task.
- Published
- 2024
47. Improved Unsupervised Statistical Machine Translation via Unsupervised Word Sense Disambiguation for a Low-Resource and Indic Languages.
- Author
Saxena, Shefali, Chaurasia, Uttkarsh, Bansal, Nitin, and Daniel, Philemon
- Subjects
*MACHINE translating, *WORD order (Grammar), *LANGUAGE ability testing, *LANGUAGE & languages, *VOCABULARY, *SOLAR stills, *TANTALUM
- Abstract
Besides word order, word choice is a key stumbling block for machine translation (MT) in morphologically rich languages because of homonymy and polysemy. In addition, untranslated or improperly translated words are a severe issue for Statistical Machine Translation (SMT) models. The limited quantity of parallel training corpora has constrained SMT systems; still, recent research has successfully trained SMT systems in an unsupervised manner (USMT) using monolingual data alone. However, the translation quality of MT output still needs improvement because of unaligned and wrongly sensed words. This problem is addressed by incorporating unsupervised Word Sense Disambiguation (WSD) into the decoding phase of USMT. The work presents SMT systems for five translation tasks from English into Indic languages on the WMT test dataset, evaluated with the BLEU and METEOR metrics. On the En→Hi, En→Kn, En→Ta, En→Te, and En→Be tasks, the approach improved BLEU by 2.3, 2.68, 0.78, 2.32, and 1.79 points, respectively, and METEOR by 1.07, 1.34, 0.72, 0.693, and 1.191 points, respectively, over the baseline model. [ABSTRACT FROM AUTHOR]
- Published
- 2023
48. Short Text Classification Based on Hybrid Semantic Expansion and Bidirectional GRU (BiGRU) Based Method to Improve Hate Speech Detection.
- Author
Muzakir, Ari, Adi, Kusworo, and Kusumaningrum, Retno
- Subjects
HATE speech, SOCIAL media, AUTOMATIC speech recognition, SEMANTICS, KNOWLEDGE base, DEEP learning, DATA distribution
- Abstract
The persistent prevalence of hate speech on contemporary social media platforms demands advanced detection methods that address specific categories and levels of offense. This research enhances hate speech detection by refining text representation through a semantic expansion approach that overcomes the limitations of conventional methods. Back-translation is employed to enhance sentence structure. First, the Lesk algorithm is used for word sense disambiguation in the semantic expansion process, identifying word meanings within relevant contexts. Next, the WordNet and Kateglo knowledge bases are used to enrich contextual information. Finally, cosine similarity is used to select the most appropriate words based on the highest scores. The combined semantic expansion technique significantly improves classification performance compared with conventional methods. Data with and without semantic expansion are vectorized into the BERT embedding space and classified with deep learning models such as CNN, BiGRU, and BiLSTM. The proposed approach achieves consistently high accuracy across all model types: CNN (88%), BiGRU (88.3%), and BiLSTM (87.3%). In contrast, models without semantic expansion yield lower accuracy: CNN (83.6%), BiGRU (83.3%), and BiLSTM (83.1%). This underscores the effectiveness of the semantic expansion approach in overcoming challenges related to data distribution and semantic feature scarcity, ultimately resulting in improved classification performance. [ABSTRACT FROM AUTHOR]
- Published
- 2023
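The hate speech study above uses the Lesk algorithm for the disambiguation step of its semantic expansion. A minimal Python sketch of the simplified Lesk variant, which picks the sense whose dictionary gloss shares the most content words with the context, is shown below. The glosses and stopword list are hypothetical placeholders; the paper's use of WordNet and Kateglo glosses and its cosine-similarity selection step are not reproduced. NLTK also ships a Lesk implementation as `nltk.wsd.lesk`.

```python
import re

# Hypothetical minimal stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "as", "by", "in", "is", "of", "on", "or", "such", "that", "the", "to"}


def content_words(text):
    """Lower-case word tokens with stopwords removed."""
    return {w for w in re.findall(r"\w+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(context, glosses):
    """Return the sense whose gloss overlaps most with the context.

    glosses: mapping of sense label -> gloss text. Ties go to the first
    sense in insertion order, because max() keeps the first maximum.
    """
    ctx = content_words(context)
    return max(glosses, key=lambda sense: len(ctx & content_words(glosses[sense])))


if __name__ == "__main__":
    glosses = {
        "bank_finance": "an institution that accepts deposits and lends money",
        "bank_river": "sloping land beside a body of water such as a river",
    }
    print(simplified_lesk("she sat on the bank of the river and watched the water", glosses))
```

Exact word matching makes simplified Lesk sensitive to inflection ("deposit" does not match "deposits"), which is one reason systems like the one above add further similarity-based steps.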
49. Word Sense Disambiguation for Indic Language using Bi-LSTM
- Author
Mishra, Binod Kumar and Jain, Suresh
- Published
- 2024
50. Reversal of the Word Sense Disambiguation Task Using a Deep Learning Model
- Author
Algirdas Laukaitis
- Subjects
word sense disambiguation, natural language processing, WordNet, Technology, Engineering (General). Civil engineering (General), TA1-2040, Biology (General), QH301-705.5, Physics, QC1-999, Chemistry, QD1-999
- Abstract
Word sense disambiguation (WSD) remains a persistent challenge in the natural language processing (NLP) community. While various NLP packages exist, the Lesk algorithm in the NLTK library demonstrates suboptimal accuracy. In this research article, we propose an innovative methodology and an open-source framework that effectively addresses the challenges of WSD by optimizing memory usage without compromising accuracy. Our system seamlessly integrates WSD into NLP tasks, offering functionality similar to that provided by the NLTK library. However, we go beyond the existing approaches by introducing a novel idea related to WSD. Specifically, we leverage deep neural networks and consider the language patterns learned by these models as the new gold standard. This approach suggests modifying existing semantic dictionaries, such as WordNet, to align with these patterns. Empirical validation through a series of experiments confirmed the effectiveness of our proposed method, achieving state-of-the-art performance across multiple WSD datasets. Notably, our system does not require the installation of additional software beyond the well-known Python libraries. The classification model is saved in a readily usable text format, and the entire framework (model and data) is publicly available on GitHub for the NLP research community.
- Published
- 2024
Discovery Service for Jio Institute Digital Library