What Causes Wrong Sentiment Classifications of Game Reviews?
- Author
- Dayi Lin, Markos Viggiato, Cor-Paul Bezemer, and Abram Hindle
- Subjects
- Root (linguistics), Game genre, Point (typography), Computer science, business.industry, Sentiment analysis, computer.software_genre, Artificial Intelligence, Control and Systems Engineering, Scale (social sciences), Classifier (linguistics), Artificial intelligence, Overall performance, Electrical and Electronic Engineering, business, Game Developer, computer, Software, Natural language processing
- Abstract
Sentiment analysis is a popular technique to identify the sentiment of a piece of text. Although several techniques have been proposed, the performance of current sentiment analysis techniques is still far from acceptable, and the causes of wrong classifications are not clear. In this paper, we study how sentiment analysis performs on game reviews. We report the results of a large-scale study on the performance of widely-used sentiment analysis classifiers on game reviews. Then, we investigate the root causes of misclassifications and quantify the impact of each cause on the overall performance. We study three existing classifiers: Stanford CoreNLP, NLTK, and SentiStrength. Our results show that most classifiers do not perform well on game reviews, with the best one being NLTK (with an AUC of 0.70). We also identified four main causes of wrong classifications, such as reviews that point out both advantages and disadvantages of the game, which might confuse the classifier. The identified causes are not trivial to resolve, and our suggestion to game developers is to prioritize the causes with higher impact on sentiment classification performance. Finally, we show that training sentiment classifiers on reviews stratified by game genre is effective.
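To give a sense of how an off-the-shelf classifier like NLTK is applied to a single game review, here is a minimal sketch. It is not the paper's actual setup: the choice of NLTK's VADER analyzer, the example review, and the 0.0 decision threshold are all assumptions made for illustration.

```python
# Minimal sketch: scoring one game review with NLTK's VADER analyzer.
# The exact NLTK configuration and decision threshold used in the paper
# are not specified here, so both are assumptions.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download

sia = SentimentIntensityAnalyzer()

# A review mixing advantages and disadvantages, the kind of text the
# paper identifies as a likely cause of misclassification.
review = ("Great soundtrack and art style, but the constant crashes "
          "and the grindy late game ruined it for me.")

scores = sia.polarity_scores(review)
print(scores)  # dict with 'neg', 'neu', 'pos', and 'compound' scores

# Assumed mapping from the compound score to a binary label;
# the 0.0 cutoff is a common convention, not taken from the paper.
label = "positive" if scores["compound"] >= 0.0 else "negative"
print(label)
```

In practice, such predicted labels would be compared against ground-truth review sentiment (e.g., with an AUC-style evaluation, as the paper reports for NLTK, CoreNLP, and SentiStrength); the evaluation code itself is not shown here.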
- Published
- 2022