Descriptor: "WSD" / Publication Type: Academic Journals - Searchworks@Jio Institute Digital Library Search Results

1. Homograph recognition algorithm based on Euclidean metric

Author: Elisa S. Izrailova, Arslanbek V. Astemirov, Ayshat S. Badaeva, Zelimhan A. Sultanov, Salaudin M. Umarkhadzhiev, Mokhmad-Salekh L. Khekhaev, and Madina L. Yasaeva
Subjects: graphic homonymy, homographs, wsd, speech synthesis, chechen language, low resource languages, text corpus, Optics. Light, QC350-467, Electronic computers. Computer science, QA75.5-76.95
Abstract: The problem of resolving homonymy-related ambiguity in the Chechen language has become especially relevant since the creation of speech synthesis systems. The main disadvantage of Chechen speech synthesizers is errors in reading homographs that differ in vowel length, since the length of such sounds is not marked in writing. The reproduction of diphthongs, which are written in the same way as the monophthongs close to them in sound, also causes problems. To improve the quality of synthesized Chechen speech, an automatic homograph recognition program is needed. To solve this problem, the article considers the task of word sense disambiguation (WSD). Algorithmic (supervised) methods based on a pre-labelled database have been selected for the Chechen language. These methods are the most common solutions for word sense disambiguation. Their implementation requires large annotated corpora, which are unavailable for most of the world's languages, including Chechen. Chechen is a low-resource language, for which the optimal approach in terms of labor and time is a semi-supervised hybrid method of homograph recognition that combines algorithmic and statistical methods. The authors present an algorithm that recognizes homographs from six adjacent words in a sentence. The method is implemented as a program. Preparation of the input data for the algorithm includes manual labelling of sentences with the senses of the homographs. The program was evaluated using generally accepted metrics and achieved an F1 score of 39% and an accuracy of 45%.
A comparative analysis with the results of other methods and models showed that the accuracy of the algorithm presented in this article is closest to that of algorithms based on the Lesk method. With the Lesk method for English, F1 scores of 41.1% (simple Lesk) and 51.1% (extended Lesk) were obtained. Methods using neural networks provide higher WSD accuracy for most languages; however, they require large corpora, which are not always available for low-resource languages, including Chechen.
Published: 2024
Full Text: View/download PDF
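
The abstract of entry 1 names a Euclidean metric and a context of six adjacent words but gives no further detail, so the following is only a minimal sketch of one way such a classifier could look, not the authors' algorithm. It assumes three words on each side of the homograph, bag-of-words count vectors, and a nearest-centroid decision rule; the function names and the English toy sentences are hypothetical.

```python
import math
from collections import Counter

def context_window(tokens, index, size=3):
    """Return up to `size` words on each side of tokens[index] (assumed 3 + 3 = 6)."""
    return tokens[max(0, index - size):index] + tokens[index + 1:index + 1 + size]

def to_vector(words, vocabulary):
    """Count how often each vocabulary word occurs in `words`."""
    counts = Counter(words)
    return [counts[w] for w in vocabulary]

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def train(labelled):
    """labelled: list of (tokens, homograph_index, sense). Returns (vocabulary, centroids)."""
    vocabulary = sorted({w for tokens, i, _ in labelled for w in context_window(tokens, i)})
    grouped = {}
    for tokens, i, sense in labelled:
        grouped.setdefault(sense, []).append(to_vector(context_window(tokens, i), vocabulary))
    # One centroid per sense: the column-wise mean of that sense's context vectors.
    centroids = {
        sense: [sum(column) / len(vectors) for column in zip(*vectors)]
        for sense, vectors in grouped.items()
    }
    return vocabulary, centroids

def predict(tokens, index, vocabulary, centroids):
    """Pick the sense whose centroid is closest to the context vector."""
    vector = to_vector(context_window(tokens, index), vocabulary)
    return min(centroids, key=lambda sense: euclidean(vector, centroids[sense]))

# Hypothetical hand-labelled sentences for the English homograph "bass".
labelled = [
    ("he caught a large bass in the lake".split(), 4, "fish"),
    ("the bass swam under the boat".split(), 1, "fish"),
    ("she plays bass in a jazz band".split(), 2, "music"),
    ("turn up the bass on the speaker".split(), 3, "music"),
]
vocabulary, centroids = train(labelled)
fish_guess = predict("the fisherman caught a bass in the lake".split(), 4, vocabulary, centroids)
music_guess = predict("he plays bass in a rock band".split(), 2, vocabulary, centroids)
```

With so few training sentences the result depends heavily on shared function words, which is one reason manual labelling effort matters for low-resource languages.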

2. Contextual word disambiguates of Ge'ez language with homophonic using machine learning

Author: Mequanent Degu Belete, Ayodeji Olalekan Salau, Girma Kassa Alitasb, and Tigist Bezabh
Subjects: Ge'ez language, WSD, Text vectorization, Machine learning, Philology. Linguistics, P1-1091
Abstract: According to natural language processing experts, there are numerous ambiguous words in languages. Without automated word meaning disambiguation for a language, the development of natural language processing technologies such as information extraction, information retrieval, and machine translation remains a challenging task. Therefore, this paper presents the development of a word sense disambiguation model for duplicate alphabet words for the Ge'ez language using corpus-based methods. Because there is no WordNet or public dataset for the Ge'ez language, 1010 samples of ambiguous words were gathered. Afterwards, the words were preprocessed and the text was vectorized using bag of words, Term Frequency-Inverse Document Frequency, and word embeddings such as word2vec and fastText. The vectorized texts were then analysed using supervised machine learning algorithms such as Naive Bayes, decision trees, random forests, K-nearest neighbor, linear support vector machine, and logistic regression. Bag of words paired with random forests outperformed all other combinations, with an accuracy of 99.52%. However, when deep learning algorithms such as a deep neural network and long short-term memory were used on the same dataset, 100% accuracy was achieved.
Published: 2024
Full Text: View/download PDF

3. DETERMINATION OF LRFD ENVIRONMENTAL LOAD FACTORS OF OFFSHORE PLATFORM IN THE NORTH OF JAVA SEA AND MAKASSAR STRAIT.

Author: Paramashanti, Rildova, Hermanto, Mochammad Fathurridho, and Nandalianadhira, Nafisa
Subjects: OFFSHORE structures, STRAITS, WAVE analysis, BP Deepwater Horizon Explosion & Oil Spill, 2010
Abstract: The environmental load factor in the commonly used offshore platform design code, API RP-2A (American Petroleum Institute - Recommended Practice 2A), was developed based on the environmental conditions of American waters, especially the Gulf of Mexico, which has relatively extreme environmental conditions compared to Indonesian waters. Case studies were conducted to determine environmental load factors in Indonesian waters, particularly the North Java Sea and Makassar Strait, categorized as shallow seas. This analysis was carried out on the performance criteria of pushover failure. In this study, the base shear was analyzed to describe the strength of the structure in the form of a collapse base shear (CBS) and the load in the form of a wave base shear (WBS). CBS was obtained through pushover analysis with yield strength randomness. WBS was obtained through in-place analysis with wave height randomness. This concept was applied to Monopod and Braced Monopod offshore platform structures located in the North Java Sea and Makassar Strait waters, which had been optimized for the WSD and LRFD design methods. The reliability of the structure was analyzed based on the CBS and WBS values using the First Order Reliability Method (FORM) II. The reliability analysis results were in the form of a reliability index (β). The North Java Sea gives a reliability index in the range from 3.58 to 4.38 for every design criterion, while the Makassar Strait gives a reliability index in the range from 3.17 to 3.54 for every design criterion. With a high target safety level for the North Java Sea location, a 1.10 environmental load factor is recommended for further offshore structure design. For the Makassar Strait location, however, more studies are needed to obtain better environmental load factor recommendations. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

4. A new rainfall prediction model based on ICEEMDAN-WSD-BiLSTM and ESN.

Author: Zhang, Xianqi, Chen, Haiyang, Wen, Yihao, Shi, Jinwen, and Xiao, Yimeng
Subjects: PREDICTION models, HILBERT-Huang transform, SIGNAL denoising, REGIONAL development, FLOOD control, FLOODS, DROUGHTS
Abstract: Precipitation, as an important indicator describing the evolution of the regional climate system, plays an important role in understanding the spatial and temporal distribution characteristics of regional precipitation. Scientific and accurate prediction of regional precipitation helps provide a theoretical basis for relevant departments to guide flood and drought control. To address the uncertainty and nonlinear characteristics of precipitation series, this paper uses the established improved complete ensemble empirical mode decomposition with adaptive noise (ICEEMDAN)-wavelet signal denoising (WSD)-bi-directional long short-term memory (BiLSTM) and echo state network (ESN) models to predict precipitation in four cities in southern Anhui Province. The BiLSTM is used to predict the high-frequency components and the ESN to predict the low-frequency components, thus avoiding interference between the two neural network predictions. The results show that the ICEEMDAN-WSD-BiLSTM and ESN models are more accurate. The average relative error reached 2.64% and the NSE (Nash–Sutcliffe efficiency coefficient) was 0.91, which was significantly better than the other four models. The model reveals the temporal change pattern and evolution characteristics of future precipitation, guides flood prevention and mitigation, and has certain theoretical significance and application value for promoting regional sustainable development. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

5. Assessment of Information Extraction Techniques, Models and Systems.

Author: Rahman, Atta-ur, Musleh, Dhiaa, Nabil, Majed, Alubaidan, Haya, Gollapalli, Mohammed, Krishnasamy, Gomathi, Almoqbil, Dakheel, Khan, Mohammad Aftab Alam, Farooqui, Mehwash, Ahmed, Mohammed Imran Basheer, Ahmed, Mohammed Salih, and Mahmud, Maqsood
Subjects: DATA mining, EXTRACTION techniques, SEARCH engines, ABSTRACTING & indexing services, DIGITAL libraries
Abstract: The present article aims to review and evaluate the practiced and classical techniques, tools, models, and systems concerning automatic information extraction (IE) from published scientific documents such as research articles, patents, theses, technical reports, and case studies. IE is performed for various reasons such as better indexing, archiving, searching, and retrieving. It is mainly used by search engines and indexing services as well as digital libraries and the semantic web. In this regard, several studies have been conducted targeting various kinds of documents. The study pays special consideration to the successful IE models, algorithms and approaches applied to structural IE from published documents. To this end, the paper is divided into several segments, each covering a significant aspect of IE. Furthermore, to validate their benefits and drawbacks, a comparative study of all the approaches has been conducted in terms of various performance factors like precision, accuracy, recall and F-score. Potential areas of improvement have been emphasized as research gaps for scholars in closely related areas. Ultimately, a comprehensive summary of the evaluation is presented in tabular form and the review is concluded. It was observed that the hybrid methods outperform the other methods due to their versatility in addressing various document formats. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

6. Word Sense Induction with Attentive Context Clustering

Author: Moshe Stekel, Amos Azaria, and Shai Gordin
Subjects: clustering, nlp, wsi, wsd, [info.info-ai]computer science [cs]/artificial intelligence [cs.ai], History of scholarship and learning. The humanities, AZ20-999, Bibliography. Library science. Information resources
Abstract: This paper presents ACCWSI (Attentive Context Clustering WSI), a method for Word Sense Induction, suitable for languages with limited resources. Pretrained on a small corpus and given an ambiguous word (a query word) and a set of excerpts that contain it, ACCWSI uses an attention mechanism for generating context-aware embeddings, distinguishing between the different senses assigned to the query word. These embeddings are then clustered to provide groups of main common uses of the query word. We show that ACCWSI performs well on the SemEval-2 2010 WSI task. ACCWSI also demonstrates practical applicability for shedding light on the meanings of ambiguous words in ancient languages, such as Classical Hebrew and Akkadian. In the near future, we intend to turn ACCWSI into a practical tool for linguists and historians.
Published: 2022
Full Text: View/download PDF

7. Assamese Word Sense Disambiguation using Cuckoo Search Algorithm.

Author: Gogoi, Arjun, Baruah, Nomi, and Nath, Lakhya Jyoti
Subjects: SEARCH algorithms, MACHINE translating, ALGORITHMS, NATURAL languages, TABU search algorithm, PROBLEM solving, NATURAL language processing
Abstract: Natural language processing is associated with human-computer interaction, where several challenges require natural language understanding. The word sense disambiguation problem comprises the computational assignment of meaning to a word according to the specific context in which it occurs. Numerous natural language processing applications, such as machine translation, information retrieval, and information extraction, require this task, which takes place at the semantic level. To solve this problem, unsupervised computation proposals can be effective since they have been successfully used for many real-world optimization problems. In this paper, we propose to solve the word sense disambiguation problem using the cuckoo search algorithm in the Assamese language. We illustrate the performance of our algorithm by carrying out experiments on an Assamese corpus and comparing the results against an unsupervised genetic algorithm implemented for the Assamese language. Results of the experiment show that the cuckoo algorithm can achieve higher precision, recall and F-measure, attaining 87.5%, 84%, and 85.71%, respectively. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF
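
The three figures reported for entry 7 are consistent with the standard balanced F-measure, the harmonic mean of precision and recall. The short check below assumes that definition (the abstract does not state which F-measure variant was used); the function name is illustrative.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported in the abstract, in percent.
reported_f = round(f_measure(87.5, 84.0), 2)
print(reported_f)  # 85.71
```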

8. Arabic Gloss WSD Using BERT.

Author: El-Razzaz, Mohammed, Fakhr, Mohamed Waleed, Maghraby, Fahima A., and Prati, Andrea
Subjects: VOCABULARY, CORPORA
Abstract: Word Sense Disambiguation (WSD) aims to predict the correct sense of a word given its context. This problem is of extreme importance in Arabic, as written words can be highly ambiguous; 43% of diacritized words have multiple interpretations and the percentage increases to 72% for non-diacritized words. Nevertheless, most Arabic written text does not have diacritical marks. Gloss-based WSD methods measure the semantic similarity or the overlap between the context of a target word that needs to be disambiguated and the dictionary definition of that word (gloss of the word). Arabic gloss WSD suffers from a lack of context-gloss datasets. In this paper, we present an Arabic gloss-based WSD technique. We utilize the celebrated Bidirectional Encoder Representation from Transformers (BERT) to build two models that can efficiently perform Arabic WSD. These models can be trained with few training samples since they utilize BERT models that were pretrained on a large Arabic corpus. Our experimental results show that our models outperform two of the most recent gloss-based WSDs when we test them against the same test data used to evaluate our model. Additionally, our model achieves an F1-score of 89% compared to the best-reported F1-score of 85% for knowledge-based Arabic WSD. Another contribution of this paper is introducing a context-gloss benchmark that may help to overcome the lack of a standardized benchmark for Arabic gloss-based WSD. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF
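
The overlap variant of gloss-based WSD mentioned in the abstract of entry 8 can be illustrated with a simplified Lesk-style scorer. This is a sketch of the classical word-overlap idea only, not the BERT models the paper builds; the stopword list, the glosses, and the sense labels are hypothetical and in English for readability.

```python
# A tiny, hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "to", "and", "or", "that", "is", "for", "by"}

def content_words(text):
    """Lower-case, whitespace-tokenize, and drop stopwords."""
    return {word for word in text.lower().split() if word not in STOPWORDS}

def gloss_overlap(context, glosses):
    """Score each sense by the number of content words its gloss shares with the context.

    Returns (best_sense, scores). Ties are resolved in favour of the first sense listed.
    """
    context_set = content_words(context)
    scores = {sense: len(context_set & content_words(gloss)) for sense, gloss in glosses.items()}
    return max(scores, key=scores.get), scores

# Hypothetical dictionary glosses for the target word "bank".
glosses = {
    "bank_finance": "an institution that accepts deposits of money and lends money",
    "bank_river": "sloping land beside a river or lake",
}
best_sense, scores = gloss_overlap("she sat on the bank of the river and watched the water", glosses)
```

Exact word overlap fails when the context and the gloss use different words for the same idea, which is the gap that similarity-based and pretrained-model approaches aim to close.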

9. Word sense disambiguation based on stretchable matching of the semantic template.

Author: Wang, Wei, Huang, Degen, and Yu, Haitao
Subjects: *NATURAL language processing, *VARIATION in language, *NATURAL languages, *VOCABULARY
Abstract: It is evident that the traditional hard matching of a fixed-length template cannot satisfy the nearly indefinite variations in natural language. This issue mainly results from three major problems of the traditional matching mode: 1) in matching with a short template, the context of natural language cannot be effectively captured; 2) in matching with a long template, serious data sparsity will lead to a low success rate of template matching (i.e., low recall); and 3) due to a lack of flexible matching ability, traditional hard matching is more prone to failure. Therefore, this paper proposes a novel method of stretchable matching of the semantic template (SMOST) to deal with the above problems. We have applied this method to word sense disambiguation in the natural language processing field. Using only the SemCor corpus, our system achieves a result very close to the best result of existing systems, which shows the effectiveness of the newly proposed method. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

10. Dataset of white spot disease affected shrimp farmers disaggregated by the variables of farm site, environment, disease history, operational practices, and saline zones

Author: Neaz A. Hasan and Mohammad Mahfujul Haque
Subjects: Disaggregated data, Shrimp farming, Risk factors, WSD, Bangladesh, Computer applications to medicine. Medical informatics, R858-859.7, Science (General), Q1-390
Abstract: The article presents the summary of a dataset related to the risk factors of white spot disease (WSD) of farmed shrimp (Penaeus monodon) in the Khulna, Bagerhat and Satkhira districts of Bangladesh. This dataset was developed in two consecutive steps. In the first step, participatory rural appraisal tools were applied to obtain the conceptual framework for data collection regarding lists of farmers and the variables of the risk factors of WSD. In the second step, sampling of farmers, google featured questionnaire development, and a mobile phone-assisted survey were carried out. A total of 233 farms were surveyed: 21 semi-intensive and 212 extensive. The data were collected in the form of continuous, nominal and binary variables disaggregated by saline zones. The dataset contains some basic socio-economic data of shrimp farmers, farm characteristics, environmental attributes and disease history of shrimp farms. The dataset also has GPS coordinates of each surveyed farm, which are very useful for spatial analysis. In total, the dataset in MS Excel has 46 variables and is attached as supplementary material to this article.
Published: 2020
Full Text: View/download PDF

11. Improving stemming for Assamese information retrieval

Author: Gogoi, Arjun, Baruah, Nomi, Sarma, Sikhar Kr., and Phukan, Rakhee D.
Published: 2021
Full Text: View/download PDF

12. Disambiguation of Biomedical Acronyms Based on a Bidirectional Recurrent Neural Network of Character-level Features.

Author: Ren Kai, Li Na, Xiong Wei, and Wang Shi-Wen
Subjects: *RECURRENT neural networks, *ACRONYMS, *MODEL railroads
Abstract: Polysemic acronyms are very common in the field of biomedicine. These acronyms have different senses in different contexts. The ambiguity of acronyms may have a significant negative impact on the understanding of the full text by machine learning. To address the disambiguation of acronyms in the biomedical domain, most associated studies are based on methods using word-level contextual features. These methods require abundant relevant external resources for model training, and the accuracy of their disambiguation of acronyms may decrease greatly when external resources are lacking. In this study, disambiguation of biomedical acronyms was investigated on the basis of a character-level feature model to realize the disambiguation of biomedical acronyms with very limited external corpora. First, sentences containing ambiguous acronyms were extracted through retrieval and the feature vectors of the context were initialized using the character-level features. Second, these initial vectors were input into the bidirectional long short-term memory neural network model for training. Finally, the disambiguation of acronyms was realized from the outputs of the neural network model through the Softmax classification approach. The results of acronym disambiguation based on the character-level feature model were also compared with those based on word-level feature models. Results demonstrate that the average accuracy of the character-level feature neural network algorithm reaches 85.82% on the dataset of 106 common biomedical acronyms. Thus, the character-level feature neural network algorithm is superior to the traditional methods, which use a large number of external resources. This study confirms that the disambiguation method based on character-level features is applicable to the disambiguation of biomedical acronyms under limited relevant data. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

13. Arabic Gloss WSD Using BERT

Author: Mohammed El-Razzaz, Mohamed Waleed Fakhr, and Fahima A. Maghraby
Subjects: WSD, BERT, Arabic, context gloss, Technology, Engineering (General). Civil engineering (General), TA1-2040, Biology (General), QH301-705.5, Physics, QC1-999, Chemistry, QD1-999
Abstract: Word Sense Disambiguation (WSD) aims to predict the correct sense of a word given its context. This problem is of extreme importance in Arabic, as written words can be highly ambiguous; 43% of diacritized words have multiple interpretations and the percentage increases to 72% for non-diacritized words. Nevertheless, most Arabic written text does not have diacritical marks. Gloss-based WSD methods measure the semantic similarity or the overlap between the context of a target word that needs to be disambiguated and the dictionary definition of that word (gloss of the word). Arabic gloss WSD suffers from a lack of context-gloss datasets. In this paper, we present an Arabic gloss-based WSD technique. We utilize the celebrated Bidirectional Encoder Representation from Transformers (BERT) to build two models that can efficiently perform Arabic WSD. These models can be trained with few training samples since they utilize BERT models that were pretrained on a large Arabic corpus. Our experimental results show that our models outperform two of the most recent gloss-based WSDs when we test them against the same test data used to evaluate our model. Additionally, our model achieves an F1-score of 89% compared to the best-reported F1-score of 85% for knowledge-based Arabic WSD. Another contribution of this paper is introducing a context-gloss benchmark that may help to overcome the lack of a standardized benchmark for Arabic gloss-based WSD.
Published: 2021
Full Text: View/download PDF

14. WSD algorithm based on a new method of vector-word contexts proximity calculation via epsilon-filtration

Author: Andrew Krizhanovsky, Alexander Kirillov, and Natalia Krizhanovskaya
Subjects: synonym, synset, corpus linguistics, word2vec, wikisource, wsd, rusvectores, wiktionary, Science
Abstract: The problem of word sense disambiguation (WSD) is considered in the article. Sets of synonyms (synsets) and sentences containing these synonyms are taken. It is necessary to automatically select the meaning of the word in the sentence. 1285 sentences were tagged by experts; that is, one of the dictionary meanings was selected by experts for each target word. To solve the WSD problem, an algorithm based on a new method of vector-word contexts proximity calculation is proposed. A preliminary epsilon-filtering of words is performed, both in the sentence and in the set of synonyms, in order to achieve higher accuracy. An extensive program of experiments was carried out. Four algorithms are implemented, including the new algorithm. Experiments have shown that in some cases the new algorithm produces better results. The developed software and the tagged corpus have an open license and are available online. Wiktionary and Wikisource are used. A brief description of this work can be viewed as slides (https://goo.gl/9ak6Gt). A video lecture in Russian about this research is available online (https://youtu.be/-DLmRkepf58).
Published: 2018
Full Text: View/download PDF

15. Selecting an Appropriate Web-Scale Discovery Service: A Study of the Big 4's.

Author: Kumar, Vinit
Subjects: *FEDERATED searching, *MARKETPLACES, *USER interfaces, *ALGORITHMS
Abstract: In response to dynamic user demands, libraries are changing their approach to user services. The web-scale discovery (WSD) service is the latest attempt in this direction. There are several players in the marketplace providing solutions for WSD, with products having basic features and subtle features as well. As more players enter the marketplace, it becomes challenging to select an appropriate WSD system. It is also paramount for library managers to be aware of the wide range of features and the underlying technology of WSD. This understanding supports informed purchase decisions. This paper attempts to explain in detail the components of a typical WSD system. Further, the paper evaluates the features of the Big 4's in WSD. The paper concludes by discussing some of the parameters to consider while evaluating a WSD system. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

16. Micro and nanometric wear evaluation of metal discs used on determination of biodiesel fuel lubricity

Author: Aline Cristina Mendes de Farias, João Telésforo Nóbrega de Medeiros, and Salete Martins Alves
Subjects: AFM, WSD, roughness, biodiesel, lubricity, HFRR, Materials of engineering and construction. Mechanics of materials, TA401-492
Abstract: The contact of diesel fuel with engine subsystems demands good wear resistance. Lubricity is an important feature for the integrity of the injection system, and sulphur compounds are primarily responsible for lubrication of the injector nozzle. Biodiesel partially restores the lubricity of diesel fuel that has low levels of sulphur compounds and, furthermore, causes less pollution than diesel fuel. Lubricity is measured through the wear scar diameter following the ASTM D 975 standards. However, friction and wear under light loads in micro/nanocomponents are highly dependent on surface interactions that can be evaluated by microscopy techniques. This study aimed to measure and analyze the lubricity of biodiesel and its blends (B5, B20) with diesel by observing the wear scars of discs using scanning electron microscopy (SEM), atomic force microscopy (AFM) and micro roughness techniques. Fuel performance was evaluated using an HFRR tribometer. The test conditions were based on standard ASTM D-6079-04. The coefficient of friction was measured during the test. After the test, the worn ball and disc were analyzed by SEM, AFM and profilometer. The results showed that the addition of biodiesel to diesel improves the tribological performance of the fuel. Also, the WSD value alone is not sufficient to evaluate the lubrication ability of a fuel. Analysis of the worn disc surfaces proved to be compatible with the WSD number and also more sensitive to these kinds of fuels, showing mainly the form and intensity of the wear.
Published: 2014

17. MODELING SEMANTIC DISTANCE IN THE PATTERN DICTIONARY OF ENGLISH VERBS.

Author: CINKOVÁ, SILVIE and HLÁVKA, ZDENĚK
Subjects: *SEMANTICS, *VERBS, *LAMMA language, *KWIC (Indexing system), *ENCYCLOPEDIAS & dictionaries
Abstract: We explore human judgments on how well individual patterns of 29 target verbs from the Pattern Dictionary of English Verbs describe their random KWICs. We focus on cases where more than one pattern is judged as highly appropriate for a given KWIC and seek to estimate the effect of event participants (arguments) being denotatively similar in two patterns, considering all pair combinations in a given lemma. We compare this effect to the effect of several contextual features of the KWICs, the effect of paired PDEV implicatures implying each other, and the effect of belonging to a given lemma. We show that the lemma effect is still stronger than any feature going across lemmas we have examined so far, so that each verb appears to be a little universe in its own right. [ABSTRACT FROM AUTHOR]
Published: 2017
Full Text: View/download PDF

18. Monitoring of low voltage grids with the use of SAIDI indexes.

Author: ŁUKASIK, Zbigniew, KOZYRA, Jacek, and KUŚMIŃSKA-FIJAŁKOWSKA, Aldona
Subjects: LOW voltage systems, ELECTRIC power distribution grids, ELECTRIC utility costs, ELECTRIC power production, ELECTRIC power consumption
Abstract: Copyright of Przegląd Elektrotechniczny is the property of Przeglad Elektrotechniczny and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Published: 2017
Full Text: View/download PDF

19. Generating the missing links for semantic relations within Wiktionary.

Author: Bawakid, Abdullah
Subjects: *SEMANTICS, *COMPARATIVE linguistics
Abstract: In many cases, a single presentation of a term may carry multiple meanings. Wiktionary provides a way of viewing the meanings of the different terms it stores in the form of senses. It also provides semantic relations. However, Wiktionary, in its current form, contains semantic relations linking Wiktionary entries at the term level. Links for semantic relations connecting entries at the word sense level do not currently exist in Wiktionary. In this paper, we propose a novel method for generating a new type of link for semantic relations within Wiktionary. This is effectively applied to aligning the source word senses for semantic relations in Wiktionary with their corresponding target word senses. We use surface-level features that rely only on the structure and content of Wiktionary for completing this task without the aid of any external lexical or knowledge bases. We present the details of the method and how it was implemented. Additionally, we describe the evaluations that we performed and illustrate the competitive results we obtained, especially when compared to other systems. Our findings indicate that our system outperforms the baselines and performs similarly to state-of-the-art systems without requiring access to external online resources or training data to run. [ABSTRACT FROM AUTHOR]
Published: 2017

20. Word Sense Disambiguation using Aggregated Similarity based on WordNet Graph Representation

Author: Mădălina ZURINI
Subjects: WSD, Similarity Measure, WordNet, Ontology, Synset, Computer engineering. Computer hardware, TK7885-7895, Bibliography. Library science. Information resources
Abstract: The term word sense disambiguation (WSD) is introduced in the context of text document processing. A knowledge-based approach is conducted using the WordNet lexical ontology, describing its structure and the components used for identifying the context-related senses of each polysemous word. The principal distance measures using the graph associated with WordNet are presented, analyzing their advantages and disadvantages. A general model for aggregation of distances and probabilities is proposed and implemented in an application in order to detect the contextual senses of each word. For words that do not exist in WordNet, a similarity measure based on probabilities of co-occurrence is used. The WSD module is proposed for integration into document processing steps such as supervised and unsupervised classification in order to maximize the correctness of the classification. Future work is related to the implementation of different domain-oriented ontologies.
Published: 2013
Full Text: View/download PDF

21. Word Sense Disambiguation Based on Large Scale Polish CLARIN Heterogeneous Lexical Resources

Author: Paweł Kędzia, Maciej Piasecki, and Marlena Orlińska
Subjects: word sense disambiguation, WSD, page rank, plWordNet, graphs, lexical resources, Computational linguistics. Natural language processing, P98-98.5, Semantics, P325-325.5, Lexicography, P327-327.5
Abstract: Lexical resources can be applied in many different Natural Language Engineering tasks, but the most fundamental task is the recognition of word senses used in text contexts. The problem is difficult and not yet fully solved, and different lexical resources provide varied support for it. Polish CLARIN lexical semantic resources are based on plWordNet, a very large wordnet for Polish, as a central structure which is a basis for linking together several resources of different types. In this paper, several Word Sense Disambiguation (henceforth WSD) methods developed for Polish that utilise plWordNet are discussed. Textual sense descriptions in the traditional lexicon can be compared with text contexts using Lesk’s algorithm in order to find the best matching senses. In the case of a wordnet, lexico-semantic relations provide the main description of word senses. Thus, first, we adapted and applied to Polish a WSD method based on Page Rank. In this method, text words are mapped onto their senses in the plWordNet graph and the Page Rank algorithm is run to find the senses with the highest scores. The method yields results lower than but comparable to those reported for English. The error analysis showed that the main problems are fine-grained sense distinctions in plWordNet and a limited number of connections between words of different parts of speech. In the second approach, plWordNet expanded with the mapping onto SUMO ontology concepts was used. Two scenarios for WSD were investigated: two-step disambiguation and disambiguation based on combined networks of plWordNet and SUMO. In the former scenario, words are first assigned SUMO concepts and then plWordNet senses are disambiguated. In the latter, plWordNet and SUMO are combined in one large network that is then used for the disambiguation of senses. The additional knowledge sources used in WSD improved the performance. The obtained results and potential further lines of development are discussed.
Published: 2015
Full Text: View/download PDF

22. An Integration Model of Semantic Annotation Based on Synergetic Neural Network.

Author: Huang, Zhehuang and Chen, Yidong
Subjects: SEMANTIC computing, ANNOTATIONS, SYNERGETICS, NEURAL circuitry, NATURAL language processing
Abstract: Correct and automatic semantic analysis has always been one of the major goals in natural language understanding. However, due to the difficulties in deep semantic analysis, at present, the mainstream studies of semantic analysis are focused on semantic role labeling (SRL) and word sense disambiguation (WSD). Nowadays, these two issues are mostly considered as separate tasks. However, this approach ignores possible dependencies between them. In order to address the issue, an integrative semantic analysis model based on a synergetic neural network (SNN) is proposed in this paper, which can easily express useful logic constraints between SRL and WSD. The semantic analysis process can be viewed as the competition process of semantic order parameters. The strongest order parameter will win by competition and the desired semantic patterns will be recognized. There are three main innovations in this paper. First, an integrative semantic analysis model is proposed that jointly models word sense disambiguation and semantic role labeling. Second, the integrative order parameter is reconstructed to reflect the relation among semantic patterns. Finally, integrative network parameters and the integrative evolution equation are reconstructed, which can reflect the relationship of guiding and driving each other between word sense and semantic roles. The experimental results on the OntoNotes 2.0 corpus show that the integrative method in this paper has a higher performance for semantic role labeling and word sense disambiguation, and provides good practicability and a promising future for other natural language processing tasks. [ABSTRACT FROM AUTHOR]
Published: 2016
Full Text: View/download PDF

23. Word Sense Disambiguation for Arabic Text Categorization.

Author: Hadni, Meryeme, El Alaoui, Said, and Lachkar, Abdelmonaime
Subjects: CATEGORIZATION (Psychology), ARABIC alphabet, SUPPORT vector machines, MACHINE learning, INFORMATION technology, BAYESIAN analysis
Abstract: In this paper, we present two contributions for Arabic Word Sense Disambiguation. In the first one, we propose to use two external resources: Arabic WordNet (AWN) and WN based on a term-to-term Machine Translation System (MTS). The second contribution consists of choosing the nearest concept for the ambiguous terms, based on more relationships with different concepts in the same local context. To evaluate the accuracy of our proposed method, several experiments have been conducted using two Feature Selection methods, Chi-Square and CHIR, and two machine learning techniques, Naive Bayesian (NB) and Support Vector Machine (SVM). The obtained results illustrate that using the proposed method greatly increases the performance of our Arabic Text Categorization System. [ABSTRACT FROM AUTHOR]
Published: 2016

24. Web-Scale Discovery Service: Is It Right for Your Library? Mayo Clinic Libraries Experience.

Author: Brigham, Tara J., Farrell, Ann M., Osterhaus Trzasko, Leah C., Attwood, Carol Ann, Wentz, Mark W., and Arp, Kelly A.
Subjects: *ABSTRACTING & indexing services, *CATALOGS, *COMMERCIAL product evaluation, *INFORMATION retrieval, *MEDICAL libraries, *MULTIHOSPITAL systems, *QUESTIONNAIRES, *SURVEYS
Abstract: Web-scale discovery (WSD) tools or services promise to deliver a quick, efficient, and comprehensive search experience through a single-entry point. This article will provide an overview of the analysis conducted by Mayo Clinic Libraries to identify, investigate, and test multiple Web-scale discovery services to determine if one would enhance their library users’ searching experience. As an increasing number of health sciences and medical libraries become interested in implementing discovery tools, the resources available through these tools should be carefully evaluated in the context of user needs. [ABSTRACT FROM PUBLISHER]
Published: 2016
Full Text: View/download PDF

25. An Investigation on the Lubricity Characteristics of Polyethylene Glycol Blends with Cellulose Palmitates.

Author: Singh, Raj, Kukrety, Aruna, Singh, Raghuvir, Saran, Sandeep, and Sharma, Om
Abstract: Three cellulose palmitate samples, viz. Cell-Palm-A, Cell-Palm-B and Cell-Palm-C, were synthesized by a known esterification reaction between microcrystalline cellulose and varying molar ratios of palmitoyl chloride (CH3(CH2)14COCl). A combination of N,N-dimethylacetamide (DMAc) and lithium chloride (LiCl) was used as the solvent, while DMAP (4-dimethylaminopyridine) was used as the catalyst. The three samples with different degrees of substitution (DS) were then characterized using infrared spectroscopy, nuclear magnetic resonance spectroscopy, elemental analysis (CHN), thermogravimetry, scanning electron microscopy and X-ray diffraction to confirm the conversion of the cellulose into the cellulose fatty ester. Cell-Palm-A, Cell-Palm-B and Cell-Palm-C were found to have DS values of 1.85, 2.37 and 2.63, respectively. The synthesized cellulose palmitate samples were then tested for lubricity properties in terms of wear scar diameter (WSD) and friction coefficient using a high-frequency reciprocating rig following modified ASTM D6079. It was found that 3 % Cell-Palm-C in PEG-200 reduces the WSD from 560 to 503 µm, while the average friction coefficient decreases from 0.079 to 0.048. [ABSTRACT FROM AUTHOR]
Published: 2015
Full Text: View/download PDF

26. Topic Modeling and Word Sense Disambiguation on the Ancora corpus.

Author: Izquierdo, Rubén, Postma, Marten, and Vossen, Piek
Subjects: COMPUTATIONAL linguistics, CORPORA
Published: 2015

27. Watershed development as community coping strategy for climate change impacts in north central highlands of Ethiopia.

Author: Legesse, Solomon Addisu and Rao, P.V.V. Prasada
Subjects: WATERSHED management, WATERSHEDS, CLIMATE change, WATER conservation, BIOPHYSICAL economics
Abstract: As a coping strategy for climate change related hazards, watershed development has a very high impact on various indicators of the biophysical environment. Among others, the watershed development projects conducted in Amhara National Regional State aimed at improving the biophysical and socioeconomic conditions of the area. In addition to the biophysical parameters, a total sample of 240 households from six model watersheds was selected randomly. About 58% of the total sample watershed areas were covered with soil and water conservation works, whereas the project impacts on the socioeconomic aspects were low to moderate in the majority of cases. The average livestock ownership in number of oxen and total livestock units in 2006 was found to be 1.76 and 4.70, respectively, and 1.59 and 4.73 in 2011. The t-test shows that the change in both is statistically insignificant at the 95% level of confidence. This indicates that biophysical impacts are more prominent than economic impacts. [ABSTRACT FROM AUTHOR]
Published: 2015
Full Text: View/download PDF

28. Investigation on the Potential of Dextrose, Sucrose and Cellulose Dodecenylsuccinate Esters as Lubricity Additive.

Author: Singh, Raj, Singh, Arun, Bahuguna, Gajendra, and Saran, Sandeep
Abstract: As a consequence of environmental issues and the increasing global demand for lubricants, sustainable raw material sources are needed not only for the lube base oil but also for additives. In view of this, we have synthesized the dodecenylsuccinate (DDSA) esters of dextrose (sample 1), sucrose (sample 2) and cellulose (sample 6) with almost the same degree of substitution (DS ~2.0). The lubricity of these synthesized samples was observed in n-decane, which was taken as the lubricant reference base fluid. The lubricity tests were carried out as per ASTM D6079 using a high-frequency reciprocating test rig. It was observed that the dextrose-DDSA ester shows good lubricity properties in comparison to sucrose and cellulose. 1 % dextrose-DDSA in n-decane decreased the wear scar diameter from 800 to 515 µm, while the values for sucrose-DDSA and cellulose-DDSA were found to be 565 and 590 µm, respectively. Further, to see the effect of varying DS on the lubricity of cellulose-DDSA esters, four cellulose ester samples (samples 3-6) with varying DS (0.68, 0.95, 1.43 and 2.03) were prepared and evaluated. It was also observed that the lubricity increases with increasing DS value in cellulose-DDSA esters. [ABSTRACT FROM AUTHOR]
Published: 2015
Full Text: View/download PDF

29. WORD SENSE DISAMBIGUATION BASED ON LARGE SCALE POLISH CLARIN HETEROGENEOUS LEXICAL RESOURCES.

Author: KĘDZIA, PAWEŁ, PIASECKI, MACIEJ, and ORLIŃSKA, MARLENA J.
Subjects: *LEXICOLOGY, *COMPUTATIONAL linguistics, *SEMANTICS
Abstract: Lexical resources can be applied in many different Natural Language Engineering tasks, but the most fundamental task is the recognition of word senses used in text contexts. The problem is difficult and not yet fully solved, and different lexical resources provide varied support for it. Polish CLARIN lexical semantic resources are based on plWordNet, a very large wordnet for Polish, as a central structure which is a basis for linking together several resources of different types. In this paper, several Word Sense Disambiguation (henceforth WSD) methods developed for Polish that utilise plWordNet are discussed. Textual sense descriptions in the traditional lexicon can be compared with text contexts using Lesk's algorithm in order to find the best matching senses. In the case of a wordnet, lexico-semantic relations provide the main description of word senses. Thus, first, we adapted and applied to Polish a WSD method based on Page Rank. According to it, text words are mapped onto their senses in the plWordNet graph and the Page Rank algorithm is run to find senses with the highest scores. The method presents results lower than but comparable to those reported for English. The error analysis showed that the main problems are fine-grained sense distinctions in plWordNet and a limited number of connections between words of different parts of speech. In the second approach, plWordNet expanded with the mapping onto the SUMO ontology concepts was used. Two scenarios for WSD were investigated: two-step disambiguation and disambiguation based on combined networks of plWordNet and SUMO. In the former scenario, words are first assigned SUMO concepts and next plWordNet senses are disambiguated. In the latter, plWordNet and SUMO are combined in one large network that is then used for the disambiguation of senses. The additional knowledge sources used in WSD improved the performance. The obtained results and potential further lines of development were discussed. [ABSTRACT FROM AUTHOR]
Published: 2015
Full Text: View/download PDF
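
Note on record 29: the abstract mentions comparing textual sense descriptions with text contexts using Lesk's algorithm. The following is a minimal sketch of the simplified (gloss-overlap) Lesk idea only, not the authors' implementation and not tied to plWordNet. The function names, the tiny stopword list, and the two English glosses are all hypothetical, chosen for illustration.

```python
import re

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "the", "of", "at", "as", "and", "that", "such", "from", "to", "in"}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic, non-stopword tokens."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def simplified_lesk(context, glosses):
    """Return the sense whose gloss shares the most words with the context.

    `glosses` maps a sense label to its textual description.
    Ties are resolved in favour of the sense listed first.
    """
    context_words = tokenize(context)
    best_sense, best_overlap = None, -1
    for sense, gloss in glosses.items():
        overlap = len(context_words & tokenize(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


# Hypothetical sense inventory for the English word "bank".
GLOSSES = {
    "bank_river": "sloping land beside a body of water such as a river",
    "bank_finance": "a financial institution that accepts deposits and lends money",
}

if __name__ == "__main__":
    print(simplified_lesk("She deposits money at the bank", GLOSSES))
    print(simplified_lesk("They fished from the bank of the river", GLOSSES))
```

In a real setting the glosses would come from a lexicon or wordnet, and the overlap score is usually refined (for example by weighting rare words); this sketch only counts shared tokens.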

30. Geo-location White Space Spectrum Databases: Models and Design of South Africa's First Dynamic Spectrum Access Coexistence Manager.

Author: Mfupe, Luzango, Mekuria, Fisseha, and Mzyece, Mjumo
Subjects: WHITE spaces (Telecommunication), DATABASES, DYNAMIC spectrum access, INFORMATION sharing, COMPUTER users
Abstract: Geo-location white space spectrum databases (GL-WSDBs) are currently the preferred technique for enabling spectrum sharing between primary users and secondary users or white space devices (WSDs) in the very-high frequency (VHF) and ultra-high frequency (UHF) bands. This is because technologies for making low-cost WSDs capable of autonomous sensing and detection of available white space (WS) spectrum are not yet feasible. This paper reviews the enabling technical conditions necessary to allow coexistence of primary and secondary systems in the VHF and UHF spectrum through a GL-WSDB approach. The practical implementation of South Africa's first GL-WSDB was performed. Results for WS channels available in five cities in South Africa, calculated from the implemented GL-WSDB, were compared with those of a commercially available GL-WSDB and were found to be 68% similar. Additionally, results from the implemented GL-WSDB were compared with measurements obtained from field spectrum scanning campaigns at two different locations in Cape Town, South Africa, and were found to be 64% similar. [ABSTRACT FROM AUTHOR]
Published: 2014
Full Text: View/download PDF

31. Telecommunications regulation: Creating order & opportunity in UK digital terrestrial television Whitespace.

Author: Lawson, Philip
Subjects: *TELECOMMUNICATIONS laws & regulations, *STAKEHOLDERS, *COMMUNICATION laws, *DIGITAL television, *MACHINE-to-machine communications, *BROADBAND communication systems
Abstract: This paper considers how the UK Regulator, in collaboration with stakeholders, is attempting to create a new market opportunity for advanced networked services using interleaved Digital Terrestrial Television (DTTV) Whitespace, that is, valuable spectrum released and reorganised post the digital switchover. It discusses inter alia a key issue, which is how the Regulator will balance the technical requirements needed to create a commercially viable network, against his competing obligation under the Communications Act 2003 to maintain order and avoid interference to DTTV and Programme Making & Special Event (PMSE) users. Particular consideration is given to the outcome of field trials and areas of forward risk for the Regulator. [ABSTRACT FROM AUTHOR]
Published: 2014
Full Text: View/download PDF

32. Determining the difficulty of Word Sense Disambiguation.

Author: McInnes, Bridget T. and Stevenson, Mark
Abstract: Highlights: [•] We explore estimating WSD performance on a range of ambiguous biomedical terms. [•] We evaluate the difficulty predictions against the output of two WSD systems. [•] Supervised methods are the best predictors but limited by labeled training data. [•] Unsupervised methods all perform well and can be applied more widely. [•] Best performance was obtained using the relatedness measure proposed by Lesk. [ABSTRACT FROM AUTHOR]
Published: 2014
Full Text: View/download PDF

33. Evaluating measures of semantic similarity and relatedness to disambiguate terms in biomedical text.

Author: McInnes, Bridget T. and Pedersen, Ted
Abstract: Introduction: In this article, we evaluate a knowledge-based word sense disambiguation method that determines the intended concept associated with an ambiguous word in biomedical text using semantic similarity and relatedness measures. These measures quantify the degree of similarity or relatedness between concepts in the Unified Medical Language System (UMLS). The objective of this work is to develop a method that can disambiguate terms in biomedical text by exploiting similarity and relatedness information extracted from biomedical resources and to evaluate the efficacy of these measures on WSD. Method: We evaluate our method on a biomedical dataset (MSH-WSD) that contains 203 ambiguous terms and acronyms. Results: We show that information content-based measures derived from either a corpus or a taxonomy obtain a higher disambiguation accuracy than path-based measures or relatedness measures on the MSH-WSD dataset. Availability: The WSD system is open source and freely available from http://search.cpan.org/dist/UMLS-SenseRelate/. The MSH-WSD dataset is available from the National Library of Medicine http://wsd.nlm.nih.gov. [ABSTRACT FROM AUTHOR]
Published: 2013
Full Text: View/download PDF

34. Word Sense Disambiguation using Aggregated Similarity based on WordNet Graph Representation.

Author: ZURINI, Mădălina
Subjects: POLYSEMY, SEMANTICS, PROBABILITY theory, ONTOLOGY, DOCUMENT markup languages, TEXT processing (Computer science)
Abstract: The term word sense disambiguation (WSD) is introduced in the context of text document processing. A knowledge-based approach is conducted using the WordNet lexical ontology, describing its structure and the components used for identifying the context-related senses of each polysemous word. The principal distance measures using the graph associated with WordNet are presented, analyzing their advantages and disadvantages. A general model for aggregation of distances and probabilities is proposed and implemented in an application in order to detect the context senses of each word. For words that do not exist in WordNet, a similarity measure based on probabilities of co-occurrence is used. The WSD module is proposed for integration in document processing steps such as supervised and unsupervised classification in order to maximize the correctness of the classification. Future work is related to the implementation of different domain-oriented ontologies. [ABSTRACT FROM AUTHOR]
Published: 2013
Full Text: View/download PDF

35. An Affix Based Word Classification Method of Assamese Text.

Author: Sarma, Bhairab and Purkayastha, Bipul Shyam
Subjects: NATURAL language processing, COMPUTATIONAL linguistics, SEMANTIC computing, AFFIXES (Grammar), ASSAMESE language
Abstract: Classification of words is an important activity in Natural Language Processing (NLP) analysis. Word classification as understood in linguistics is not the same as in natural language processing. In NLP, the main objective is Part-of-Speech tagging (POST), which is essential for machine translation and language interpretation. In linguistics, however, words are classified by their applications and their representation of meaning in the context of the real world. Retrieving contextual meaning in language processing is a very challenging job. Because of sense ambiguity, representation ambiguity and words with multiple meanings, the POST task becomes very difficult. Assamese is a highly inflected and morphologically rich Indian language. In this study, we attempt to classify words based on their morphological structure. We present a method of classifying Assamese words based on their inflectional features. The classes we have used here may not coincide with the POS classification. However, it could be a method of word clustering during POS tagging, with the application of other smoothing algorithms such as HMM, EM, etc. We believe that this method can be further implemented in the processing of any other inflectional Indian language. [ABSTRACT FROM AUTHOR]
Published: 2013

36. Statistics and Results of Ontology Based Document Processing Application.

Author: ZURINI, Mădălina
Subjects: ONTOLOGY, APPLICATION software, STATISTICS, AUTOMATIC classification, TIME-domain analysis, DOCUMENT classification (Electronic documents)
Abstract: The application OBDP (Ontology Based Document Processing) is presented integrating the major steps of document preprocessing, representation and automatic classification and clustering. A comparative analysis is done using the results of classification using the external knowledge base WordNet lexical ontology and the classification using Naïve Bayes and kNN classifiers. Conclusions are drawn and future work is concentrated upon WordNet extending using domain analysis. [ABSTRACT FROM AUTHOR]
Published: 2013

37. The effect of water-soluble fraction of diesel oil on some hematological indices in the great sturgeon Huso huso.

Author: Hedayati, Aliakbar and Jahanbakhshi, Abdolreza
Subjects: DIESEL fuels, WHITE whale, AQUATIC animals, HEMATOLOGY, FISH immunology, LEUCOCYTES, ERYTHROCYTES, EOSINOPHILS, NEUTROPHILS
Abstract: Hematological and immunological parameters of aquatic animals may change due to acute, subacute and chronic exposure to marine pollutants. The purpose of this study was to determine the experimental effects of doses of the water-soluble fraction of diesel oil (WSD) (0, 10, 100, 500 and 1,000 ppm) for 0, 48 h and 7 days on hematological and immunological features of juvenile great sturgeon Huso huso. Fish exposed for 48 h and 7 days showed a significant change in white blood cells (P < 0.01), in contrast to red blood cells (P > 0.05). Only MCV, neutrophils and lymphocytes showed a significant change within the 48-h exposure to WSD (P < 0.05); among these significant indices, MCV and lymphocytes decreased and neutrophils increased. Seven-day exposures showed a significant change in MCV, neutrophil, eosinophil and lymphocyte concentration in relation to the respective control (P < 0.05); among these significant indices, neutrophils were significantly greater and MCV, eosinophils and lymphocytes were significantly lower than those in control groups (P < 0.05). [ABSTRACT FROM AUTHOR]
Published: 2012
Full Text: View/download PDF

38. Spawning stress triggers WSSV replication in brooders via the activation of shrimp STAT

Author: Lin, Shin-Jen, Hsia, Hui-Lan, Liu, Wang-Jing, Huang, Jiun-Yan, Liu, Kuan-Fu, Chen, Wei-Yu, Yeh, Ying-Chun, Huang, Yun-Tzu, Lo, Chu-Fang, Kou, Guang-Hsiung, and Wang, Han-Ching
Subjects: *WHITE spot syndrome virus, *VIRAL replication, *INCUBATORS, *FISH spawning, *SHRIMP culture, *EPIDEMICS, *FISH diseases
Abstract: In the early days of shrimp aquaculture, wild-captured brooders usually spawned repeatedly once every 2–4 days. However, since the first outbreaks of white spot disease (WSD) nearly 20 years ago, captured female brooders often died soon after a single spawning. Although these deaths were clearly attributable to WSD, it has always been unclear how spawning stress could lead to an outbreak of the disease. Using real-time qPCR, we show here that while replication of the white spot syndrome virus (WSSV; the causative agent of WSD) is triggered by spawning, there was no such increase in the levels of another shrimp DNA virus, IHHNV (infectious hypodermal and hematopoietic necrosis virus). We also show that levels of activated STAT are increased in brooders during and after spawning, which is important because shrimp STAT is known to transactivate the expression of the WSSV immediate early gene ie1. Lastly, we used a dsRNA silencing experiment to show that both WSSV ie1 gene expression and WSSV genome copy number were reduced significantly after shrimp STAT was knocked down. This is the first report to demonstrate in vivo that shrimp STAT is important for WSSV replication and that spawning stress increases activated STAT, which in turn triggers WSSV replication in WSSV-infected brooders. [Copyright Elsevier]
Published: 2012
Full Text: View/download PDF

39. Disambiguation of ambiguous biomedical terms using examples generated from the UMLS Metathesaurus.

Author: Stevenson, Mark and Guo, Yikun
Abstract: Researchers have access to a vast amount of information stored in textual documents and there is a pressing need for the development of automated methods to enable and improve access to this resource. Lexical ambiguity, the phenomenon in which a word or phrase has more than one possible meaning, presents a significant obstacle to automated text processing. Word Sense Disambiguation (WSD) is a technology that resolves these ambiguities automatically and is an important stage in text understanding. The most accurate approaches to WSD rely on manually labeled examples, but these are usually not available and are prohibitively expensive to create. This paper offers a solution to that problem by using information in the UMLS Metathesaurus to automatically generate labeled examples. Two approaches are presented. The first is an extension of existing work (Liu et al., 2002) and the second a novel approach that exploits information in the UMLS that has not been used for this purpose. The automatically generated examples are evaluated by comparing them against the manually labeled ones in the NLM-WSD data set and are found to outperform the baseline. The examples generated using the novel approach produce an improvement in WSD performance when combined with manually labeled examples. [Copyright Elsevier]
Published: 2010
Full Text: View/download PDF

40. The Treatment of Word Sense Inventories in the 'LACELL WSD Project'.

Author: Almela, Moisés
Subjects: *LINGUISTICS, *DISTINCTIVE features (Linguistics), *POLYSEMY, *INVENTORIES, DICTIONARIES, TERMINOLOGY
Abstract: The WSD community has long debated whether the criteria for representing polysemy in general-purpose dictionaries meet the specific demands of sense disambiguation tasks. Concern is growing that pre-defined sense inventories might not adjust well to the needs of WSD, because word occurrences can rarely be paired with rigid sense classes in a one-to-one fashion. A second cause for concern is the level of sense granularity adopted in conventional dictionary entries. Fine-grained distinctions can be useful for a dictionary user but complicate the design and evaluation of WSD systems in a way that is often unnecessary. As a result of these objections, many experts have voiced the opinion that dictionaries are not adequate sources of sense inventories for WSD. However, the problem of word sense overlaps can also be resolved by modifying the way in which dictionary entries are processed by WSD programs. This is the solution applied in the LACELL WSD system. The algorithm simultaneously selects two or more dictionary senses if the context does not allow sufficient discrimination among them. This article explains the underpinnings of this proposal and discusses some of its advantages and disadvantages. [ABSTRACT FROM AUTHOR]
Published: 2009

41. Multilingual versus monolingual word sense disambiguation.

Author: Ion, Radu and Tufiş, Dan
Subjects: MULTILINGUALISM, MONOLINGUALISM, LANGUAGE & languages, VOCABULARY, LEARNING
Abstract: This article describes two different word sense disambiguation (WSD) systems, one applicable to parallel corpora and requiring aligned wordnets and the other one, knowledge poorer, albeit more relevant for real applications, relying on unsupervised learning methods and only monolingual data (text and wordnet). Comparing performances of word sense disambiguation systems is a very difficult evaluation task when different sense inventories are used and even more difficult when the sense distinctions are not of the same granularity. However, as we used the same sense inventory, the performance of the two WSD systems can be objectively compared and we bring evidence that multilingual WSD is more precise than monolingual WSD. [ABSTRACT FROM AUTHOR]
Published: 2009
Full Text: View/download PDF

42. AN ATTEMPT TO FORMALIZE WORD SENSE DISAMBIGUATION: MAXIMIZING EFFICIENCY BY MINIMIZING COMPUTATIONAL COSTS.

Author: CANTOS, PASCUAL, SÁNCHEZ, AQUILINO, and ALMELA, MOISÉS
Subjects: *ALGORITHMS, *COMPUTERS in lexicology, *PARSING (Grammar), *PARSING (Computer grammar), *APPLIED linguistics, *SEMANTICS, *MATHEMATICAL models
Abstract: This paper presents an algorithm based on collocational data for word sense disambiguation (WSD). The aim of this algorithm is to maximize efficiency by minimizing (1) computational costs and (2) linguistic tagging/annotation. The formalization of our WSD algorithm is based on discriminant function analysis (DFA). This statistical technique allows us to parametrize each collocational item with its meaning, using just bare text. The parametrized data allow us to classify cases (sentences with an ambiguous word) into the values of a categorical dependent variable (each of the meanings of the ambiguous word). To evaluate the validity and efficiency of our WSD algorithm, we first hand sense-tagged all the sentences containing ambiguous words and then cross-validated the hand sense-tagged data against the automatic WSD output. Finally, we present the global results of our algorithm after applying it to a limited set of words in two languages, Spanish and English, highlighting the points we consider relevant for further analysis. [ABSTRACT FROM AUTHOR]
Published: 2009

43. Integrating Linguistic Resources in TC through WSD.

Author: Ureña-López, L. Alfonso, Buenaga, Manuel, and Gómez, José M.
Subjects: *LINGUISTICS, *LANGUAGE & languages, *DATABASES, *INFORMATION storage & retrieval systems, *ELECTRONIC data processing, *COMPUTERS
Abstract: Information access methods must be improved to overcome the information overload that most professionals face nowadays. Text classification tasks, like Text Categorization (TC), help users access the great amount of text they find on the Internet and in their organizations. TC is the classification of documents into a predefined set of categories. Most approaches to automatic TC are based on the use of a training collection, which is a set of manually classified documents. Other linguistic resources that are emerging, like lexical databases, can also be used for classification tasks. This article describes an approach to TC based on the integration of a training collection (Reuters-21578) and a lexical database (WordNet 1.6) as knowledge sources. Lexical databases accumulate information on the lexical items of one or several languages. This information must be filtered in order to make effective use of it in our model of TC. This filtering process is a Word Sense Disambiguation (WSD) task. WSD is the identification of the sense of words in context. This task is an intermediate process in many natural language processing tasks like machine translation or multilingual information retrieval. We present the use of WSD as an aid for TC. Our approach to WSD is also based on the integration of two linguistic resources: a training collection (SemCor and Reuters-21578) and a lexical database (WordNet 1.6). We have developed a series of experiments that show that TC and WSD based on the integration of linguistic resources are very effective, and that WSD is necessary to effectively integrate linguistic resources in TC. [ABSTRACT FROM AUTHOR]
Published: 2001
Full Text: View/download PDF

44. Tagger Evaluation Given Hierarchical Tag Sets.

Author: Melamed, I. Dan and Resnik, Philip
Subjects: *METHODOLOGY, *COMPUTATIONAL linguistics, *SEMANTICS, *LANGUAGE & languages, *VERBS
Abstract: We present methods for evaluating human and automatic taggers that extend current practice in three ways. First, we show how to evaluate taggers that assign multiple tags to each test instance, even if they do not assign probabilities. Second, we show how to accommodate a common property of manually constructed "gold standards" that are typically used for objective evaluation, namely that there is often more than one correct answer. Third, we show how to measure performance when the set of possible tags is tree-structured in an IS-A hierarchy. To illustrate how our methods can be used to measure inter-annotator agreement, we show how to compute the kappa coefficient over hierarchical tag sets. [ABSTRACT FROM AUTHOR]
Published: 2000
Full Text: View/download PDF
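
Note on record 44: the abstract refers to the kappa coefficient as a measure of inter-annotator agreement. As background, the sketch below computes the standard flat (non-hierarchical) Cohen's kappa for two annotators, kappa = (p_o - p_e) / (1 - p_e). It is not the hierarchical extension the article describes; the function name and the toy tag sequences are assumptions made for illustration.

```python
from collections import Counter


def cohen_kappa(tags_a, tags_b):
    """Flat Cohen's kappa for two equally long sequences of categorical tags.

    p_o is the observed proportion of items on which the annotators agree;
    p_e is the agreement expected by chance from each annotator's tag frequencies.
    """
    if len(tags_a) != len(tags_b) or not tags_a:
        raise ValueError("tag sequences must be non-empty and of equal length")
    n = len(tags_a)
    p_o = sum(a == b for a, b in zip(tags_a, tags_b)) / n
    freq_a, freq_b = Counter(tags_a), Counter(tags_b)
    p_e = sum((freq_a[t] / n) * (freq_b[t] / n) for t in freq_a)
    if p_e == 1.0:
        # Both annotators used a single identical tag throughout; treat as full agreement.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Hypothetical annotations of four tokens by two annotators.
    annotator_1 = ["N", "N", "V", "V"]
    annotator_2 = ["N", "N", "V", "N"]
    print(cohen_kappa(annotator_1, annotator_2))
```

With the toy data above, observed agreement is 0.75 and chance agreement is 0.5, so kappa is 0.5. A hierarchical variant would replace the exact-match test with a partial-credit score between tags.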

45. Introducing "α-Sustainable Development" for transforming our world: A proposal for the 2030 agenda.

Author: Biswas, Siddhartha Sankar, Ahad, Mohd Abdul, Nafis, Md Tabrez, Alam, M. Afshar, and Biswas, Ranjit
Subjects: *GOVERNMENT agencies, *FUZZY measure theory, *SET theory, *FUZZY sets, *SUSTAINABLE development
Abstract: In this paper the authors introduce the "Theory of α-Sustainable Development", which is developed using fuzzy set theory. The existing concept and definition of "Sustainable Development" is correct and well understood by the world, but is not very precisely defined. In our "Theory of α-Sustainable Development" we say that every development (D) is a sustainable development to a certain extent, depending upon the fuzzy measure 'α' of the amount of sustainability of D. We propose a double categorization of every hybrid-pillar (HP) development for better evaluation: categorization in Type-1 and categorization in Type-2. For a single-pillar development D the value of the fuzzy measure α is a number in [0,1], and the same is also true for a hybrid-pillar (2P or 3P) development D when categorizing under Type-2. However, when categorizing HP developments under Type-1, the value of α is an ordered pair (α1, α2) where α1 and α2 are in [0,1]. Every development D is graded as one of five categories of sustainable development: SSD, GSD, PSD, WSD and NSD. In the 'Theory of α-Sustainable Development' the existing core notion of 'Sustainable Development' is neither compromised nor diluted. The theory first discusses 'single-pillar Sustainable Development' and then 'multi-pillar Sustainable Development' (also called Hybrid Pillar Sustainable Development). The notion of 'α-Sustainable Development' will be very useful to the corresponding regulatory bodies for improving on the existing practice of judging a development D by a Yes-No notion of sustainable development. Mathematically, the core aim of this paper is to grade a development D in the continuous range [0,1] instead of the existing notion of grading in the discrete range {0, 1}. This work deals only with developments that are completely constructed, not with reviewing their interim progress. The "Theory of α-Sustainable Development" is an open proposal to the 2030 Agenda meeting, and will surely enrich the policy of ESD for educating present and future generations and thus help retain the sustainability of our planet as a whole. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

46. A sequential assessment of WSD risk factors of shrimp farming in Bangladesh: Looking for a sustainable farming system.

Author: Hasan, Neaz A., Haque, Mohammad Mahfujul, Hinchliffe, Steve J., and Guilder, James
Subjects: *SHRIMP culture, *SUSTAINABLE agriculture, *RANGE management, *RISK assessment, *FARM management, *POULTRY farms
Abstract: White Spot Disease (WSD) caused by White Spot Syndrome Virus (WSSV), is responsible for widespread mortality and economic losses across almost the entire Asian shrimp farming industry. The distribution of disease prevalence is however uneven, and is likely dependent on a range of management, environmental and socio-ecological factors. In this study, 233 farms were surveyed in southwest Bangladesh, the main shrimp farming zone, to produce a dataset from a range of pond types, culture techniques and farming practices. Four categories of data (site/farm characteristics, environmental variables, disease history, and management variables) with associated risk factors were selected following the development of a conceptual framework and a participatory rural appraisal tool. Factors potentially contributing to WSD prevalence in the current shrimp crop were first screened using univariate analysis and subsequently analyzed using a multivariate logistic regression to highlight significant risk factors. Association of the selected factors with WSD prevalence was examined using multivariate stepwise removal. The multivariate analysis revealed that farms operated by a tenant worker (p: 0.03), mixed use of fertilizer (p: 0.009), poor quality water source (p: 0.001), lack of reservoir for water purification (p: < 0.001), and frequent exchange of water during a single crop culture (p: < 0.001) were significantly associated with WSD prevalence. The results suggest that, where possible, better farm management practices including improving water quality, controlling water exchange and/or maintaining constant salinity, will reduce WSD prevalence.
• WSD is responsible for widespread mortality and economic losses in shrimp farming.
• Multivariate analysis of risk factors linked to shrimp farming (N = 233) were done.
• Tenant worker operated farms using mixed fertilizers were more susceptible to WSD.
• Moreover, poor quality, frequent exchange and lack of reservoir of water caused WSD.
• Quality source, controlled exchange and constant salinity of water can reduce WSD. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

47. Dataset of white spot disease affected shrimp farmers disaggregated by the variables of farm site, environment, disease history, operational practices, and saline zones.

Author: Hasan NA and Haque MM
Abstract: The article presents a summary of a dataset on the risk factors of white spot disease (WSD) of farmed shrimp (Penaeus monodon) in the Khulna, Bagerhat and Satkhira districts of Bangladesh. The dataset was developed in two consecutive steps. In the first step, participatory rural appraisal tools were applied to obtain the conceptual framework for data collection regarding lists of farmers and the variables of the risk factors of WSD. In the second step, sampling of farmers, Google featured questionnaire development, and a mobile phone-assisted survey were carried out. A total of 233 farms were surveyed, consisting of 21 semi-intensive and 212 extensive farms. The data were collected in the form of continuous, nominal and binary variables disaggregated by saline zones. The dataset contains some basic socio-economic data of shrimp farmers, farm characteristics, environmental attributes and disease history of shrimp farms. The dataset also has GPS coordinates of each surveyed farm, which are very useful for spatial analysis. In total, the dataset in MS Excel has 46 variables and is attached as supplementary material to this article. Competing Interests: The authors declare that they have no known competing financial interests or personal relationships which have, or could be perceived to have, influenced the work reported in this article. (© 2020 The Authors.)
Published: 2020
Full Text: View/download PDF

48. Using the TEI Writing System Declaration (WSD)

Author: Birnbaum, David J., Cournane, Mavis, and Flynn, Peter
Published: 1999
Full Text: View/download PDF

49. Identification and characterization of FaSOC1, a homolog of SUPPRESSOR OF OVEREXPRESSION OF CONSTANS1 from strawberry.

Author: Lei HJ, Yuan HZ, Liu Y, Guo XW, Liao X, Liu LL, Wang Q, and Li TH
Subjects: Amino Acid Sequence, Arabidopsis genetics, Arabidopsis Proteins classification, Arabidopsis Proteins isolation & purification, Cloning, Molecular, Flowers genetics, Flowers metabolism, Fragaria metabolism, Gene Expression Regulation, Developmental, Gene Expression Regulation, Plant, MADS Domain Proteins classification, MADS Domain Proteins isolation & purification, Molecular Sequence Data, Phylogeny, Plant Proteins genetics, Plant Proteins isolation & purification, Plants, Genetically Modified, Sequence Analysis, DNA, Tissue Distribution, Arabidopsis Proteins genetics, Fragaria genetics, MADS Domain Proteins genetics, Sequence Homology
Abstract: A MADS-box gene SUPPRESSOR OF OVEREXPRESSION OF CONSTANS1 (SOC1) integrates multiple flowering signals to regulate floral transition in Arabidopsis. Strawberry (Fragaria spp.) is an economically important fruit crop, but its molecular control of flowering is largely unknown. In this study, a SOC1-like gene, FaSOC1, was isolated and characterized from strawberry. The open reading frame of FaSOC1 was 648 bp, encoding a protein of 215 amino acids. Sequence alignment and phylogenetic analysis showed that the FaSOC1 protein contained a highly conserved MADS domain and a SOC1 motif, and that it was a member of the SOC1-like genes of dicots. The FaSOC1 protein mainly localized in the cytoplasm of onion epidermal cells and Arabidopsis protoplasts, and showed no transcriptional activation activity in yeast cells. Under the floral induction conditions, the expression of FaSOC1 increased during the first 2 weeks of short-day treatment, but declined dramatically during three to 4 weeks. FaSOC1 was highly expressed in reproductive organs, including shoot apices, floral buds, flowers, stamens and sepals. Overexpression of FaSOC1 in wild-type Arabidopsis caused early flowering and upregulated the expression of flowering time genes LFY and AP1. In addition, the yeast two-hybrid and BiFC assays confirmed that FaSOC1 could interact with AGL24. In conclusion, these results suggest that FaSOC1 is a flowering promoter in strawberry. (© 2013.)
Published: 2013
Full Text: View/download PDF

50. Comparative analysis of conventional PCR and real-time PCR to diagnose shrimp WSD.

Author: Leal CA, Carvalho-Castro GA, Cottorello AC, Leite RC, and Figueiredo HC
Subjects: Animals, Sensitivity and Specificity, Molecular Diagnostic Techniques methods, Penaeidae virology, Real-Time Polymerase Chain Reaction methods, Veterinary Medicine methods
Abstract: The aims of this study were to standardize and optimize a qPCR protocol with a FAM-BHQ1 probe, and to compare its sensitivity against TaqMan qPCR and conventional PCR methods for diagnosing shrimp WSD. The FAM-BHQ1 qPCR showed higher clinical sensitivity and proved to be a robust alternative for detecting WSSV in clinical samples.
Published: 2013
Full Text: View/download PDF


76 results on '"WSD"'
