Descriptor: "Word Embeddings" - Searchworks@Jio Institute Digital Library Search Results

Your search for "Word Embeddings" returned 2,386 results

1. Enhancing Generative AI Chatbot Accuracy Using Knowledge Graph

Author: Bandi, Ajay, Babu, Jameer, Zeng, Ruida, Muthyala, Sai Ram, Ghosh, Ashish, Editorial Board Member, Feng, Wenying, editor, Rahimi, Nick, editor, and Margapuri, Venkatasivakumar, editor
Published: 2025
Full Text: View/download PDF

2. EventBoost: Enhancement of Twitter Event Detection Using Social Features and Word Embeddings

Author: Pradhan, Abhaya Kumar, Mohanty, Hrushikesha, Lal, Rajendra Prasad, Angrisani, Leopoldo, Series Editor, Arteaga, Marco, Series Editor, Chakraborty, Samarjit, Series Editor, Chen, Shanben, Series Editor, Chen, Tan Kay, Series Editor, Dillmann, Rüdiger, Series Editor, Duan, Haibin, Series Editor, Ferrari, Gianluigi, Series Editor, Ferre, Manuel, Series Editor, Jabbari, Faryar, Series Editor, Jia, Limin, Series Editor, Kacprzyk, Janusz, Series Editor, Khamis, Alaa, Series Editor, Kroeger, Torsten, Series Editor, Li, Yong, Series Editor, Liang, Qilian, Series Editor, Martín, Ferran, Series Editor, Ming, Tan Cher, Series Editor, Minker, Wolfgang, Series Editor, Misra, Pradeep, Series Editor, Mukhopadhyay, Subhas, Series Editor, Ning, Cun-Zheng, Series Editor, Nishida, Toyoaki, Series Editor, Oneto, Luca, Series Editor, Panigrahi, Bijaya Ketan, Series Editor, Pascucci, Federica, Series Editor, Qin, Yong, Series Editor, Seng, Gan Woon, Series Editor, Speidel, Joachim, Series Editor, Veiga, Germano, Series Editor, Wu, Haitao, Series Editor, Zamboni, Walter, Series Editor, Tan, Kay Chen, Series Editor, Kumar, Amit, editor, Gunjan, Vinit Kumar, editor, Senatore, Sabrina, editor, and Hu, Yu-Chen, editor
Published: 2025
Full Text: View/download PDF

3. Machine Learning Approaches to Emotion Detection

Author: Cavicchio, Federica and Hirst, Graeme, Series Editor
Published: 2025
Full Text: View/download PDF

4. Efficient Classification of Hallmark of Cancer Using Embedding-Based Support Vector Machine for Multilabel Text.

Author: Verma, Shikha, Sharan, Aditi, and Malik, Nidhi
Subjects: *SUPPORT vector machines, *TUMOR classification, *CARCINOGENESIS, *CANCER invasiveness, *RESEARCH personnel, *MACHINE learning
Abstract: The Hallmark of Cancers consists of various biological capabilities of the tumor cell which help the medical experts to understand the development and identification of these cells during various stages of the cancer disease. The hallmark of cancer classification is a widely accepted framework that characterizes the fundamental biological capabilities of cancer cells. This classification is based on the work of Hanahan and Weinberg, who identified 10 hallmark capabilities that collectively enable the development and progression of cancer. The hallmark of cancer classification provides a comprehensive framework for understanding the biological basis of cancer development and progression. It helps researchers to identify the key molecular and cellular pathways that are involved in the disease, which can inform the development of new diagnostic tools and therapies. Multi-label classification aims to assign a set of labels to the samples under study. This paper focuses on creating an improved model by hybridizing the biomedical domain-specific embeddings for all the extracted biomedical features on the machine learning model. The use of domain-specific embeddings adds semantics to the vector-represented text. More specifically, the study has tried to improve the efficacy of the multi-label classification as compared with other state-of-the-art methods using BioWordVec and the MeSH embeddings. The experimental work showed a significant improvement in the performance of our model, which is trained with the Support Vector Machine (SVM) machine learning algorithm. The paper also focuses on understanding the label correlation, which is studied by conducting a case study with medical domain experts and is also analyzed with the proposed model. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

5. Theoretical Foundations and Limits of Word Embeddings: What Types of Meaning can They Capture?

Author: Arseniev-Koehler, Alina
Subjects: *STRUCTURAL linguistics, *CULTURE, *SEMANTICS, *SEMIOTICS, *STRUCTURALISM
Abstract: Measuring meaning is a central problem in cultural sociology and word embeddings may offer powerful new tools to do so. But like any tool, they build on and exert theoretical assumptions. In this paper, I theorize the ways in which word embeddings model three core premises of a structural linguistic theory of meaning: that meaning is coherent, relational, and may be analyzed as a static system. In certain ways, word embeddings are vulnerable to the enduring critiques of these premises. In other ways, word embeddings offer novel solutions to these critiques. More broadly, formalizing the study of meaning with word embeddings offers theoretical opportunities to clarify core concepts and debates in cultural sociology, such as the coherence of meaning. Just as network analysis specified the once vague notion of social relations, formalizing meaning with embeddings can push us to specify and reimagine meaning itself. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

6. Deep learning and embeddings-based approaches for keyphrase extraction: a literature review.

Author: Giarelis, Nikolaos and Karacapilidis, Nikos
Subjects: NATURAL language processing, COMPUTATIONAL linguistics, LITERATURE reviews, COMPARATIVE method, POPULAR literature, DEEP learning
Abstract: Keyphrase extraction is a subtask of natural language processing referring to the automatic extraction of salient terms that semantically capture the key themes and topics of a document. Earlier literature reviews focus on classical approaches that employ various statistical or graph-based techniques; these approaches miss important keywords/keyphrases, due to their inability to fully utilize context (that is present or not) in a document, thus achieving low F1 scores. Recent advances in deep learning and word/sentence embedding vectors lead to the development of new approaches, which address the lack of context and outperform the majority of classical ones. Taking the above into account, the contribution of this review is fourfold: (i) we analyze the state-of-the-art keyphrase extraction approaches and categorize them upon their employed techniques; (ii) we provide a comparative evaluation of these approaches, using well-known datasets of the literature and popular evaluation metrics, such as the F1 score; (iii) we provide a series of insights on various keyphrase extraction issues, including alternative approaches and future research directions; (iv) we make the datasets and code used in our experiments public, aiming to further increase the reproducibility of this work and facilitate future research in the field. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

7. Emotion topology: extracting fundamental components of emotions from text using word embeddings.

Author: Plisiecki, Hubert and Sobieszek, Adam
Subjects: NATURAL language processing, SOCIAL media, DECOMPOSITION method, EMOTIONS, SELF-expression
Abstract: This exploratory study examined the potential of word embeddings, an automated numerical representation of written text, as a novel method for emotion decomposition analysis. Drawing from a substantial dataset scraped from a Social Media site, we constructed emotion vectors to extract the dimensions of emotions, as annotated by the readers of the texts, directly from human language. Our findings demonstrated that word embeddings yield emotional components akin to those found in previous literature, offering an alternative perspective not bounded by theoretical presuppositions, as well as showing that the dimensional structure of emotions is reflected in the semantic structure of their text-based expressions. Our study highlights word embeddings as a promising tool for uncovering the nuances of human emotions and comments on the potential of this approach for other psychological domains, providing a basis for future studies. The exploratory nature of this research paves the way for further development and refinement of this method, promising to enrich our understanding of emotional constructs and psychological phenomena in a more ecologically valid and data-driven manner. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

8. Analyzing World Cup Impact Through an Evolutionary Optimization Approach Based on Sentiment Polarity with Pre-trained Word Embeddings.

Author: Obiedat, Ruba, Suleiman, Dima, M. Al-Zoubi, Ala', Al-Zain, Yazan, and Harfoushi, Osama
Abstract: The 2022 Qatar World Cup created massive global attention and generated widespread discussions on different social media platforms, including the X platform. The event was the subject of intense debate after Qatar was announced as the host. Opinions were divided, with supporters and critics weighing in based on political, ethical, cultural, and social considerations. Public sentiments evolved throughout three key phases (before, during, and after the event), shaped by numerous factors. This study aims to analyze these sentiments during these three stages based on a novel hybrid evolutionary approach. Three versions for each stage were produced by applying pre-trained word embeddings with 100 and 400 features and sentiment features combined with word embeddings. In total, nine different versions of datasets were employed to examine the proposed approach. Furthermore, five different metaheuristic algorithms were applied: the multi-verse optimizer (MVO), the genetic algorithm (GA), the particle swarm optimization (PSO), the salp swarm algorithm (SSA), and the whale optimization algorithm (WOA). The five metaheuristic algorithms were combined with the feature selection-support vector machine (FS-SVM) and weighting-support vector machine (WSVM) to examine the newly created dataset versions. The results reveal that people's perspectives shifted from negative before the event to positive during and after the event. Moreover, a comparison of the proposed MVO-WSVM and MVO-SVM-Fs approaches with other metaheuristic algorithms showed the superior accuracy of the proposed approaches in sentiment prediction. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

9. F-DenseCNN: feature-based dense convolutional neural networks and swift text word embeddings for enhanced hate speech prediction.

Author: Shilpashree, S. and Ashoka, D. V.
Abstract: Hate speech on social media platforms poses a significant threat to individuals and society, necessitating robust automated detection systems. While existing approaches employ supervised machine learning with text mining elements, they often fall short in capturing the nuanced and evolving nature of hate speech, including subtle linguistic cues, implicit biases, and coded language. This study addresses these limitations by introducing two novel techniques: the feature-based dense convolutional neural network and the swift text word embedding technique. Our key contributions include the development of F-DenseCNN, a deep learning architecture designed to extract complex features from textual data, and the introduction of the swift text word embedding technique, offering efficient and context-aware word representations. Extensive experimentation and evaluation demonstrate that our proposed method significantly outperforms conventional approaches, achieving a 96.2% accuracy in hate speech detection. This substantial improvement in detection accuracy has important implications for content moderation systems, potentially enhancing their reliability and effectiveness in combating online hate speech. Our findings underscore the potential of advanced deep learning techniques in addressing the evolving challenges of hate speech detection on social media platforms. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

10. Identification of paraphrased text in research articles through improved embeddings and fine-tuned BERT model.

Author: Razaq, Abdur, Halim, Zahid, Ur Rahman, Atta, and Sikandar, Kholla
Subjects: LANGUAGE models, ARTIFICIAL intelligence, TRUST, BLOGS, PARAPHRASE
Abstract: With the emerging new technologies based on Artificial Intelligence (AI) for the generation of new and paraphrasing of existing text, the identification of genuinely written text has become an important research undertaking. Past approaches to this issue need a significant volume of human-labeled data. Most of the approaches used in literature are either for noisy text or for clean text. Conversations in chats, text in blogs, text messages on cell phones, text exchange on Messengers, etc., are examples of noisy text that may contain misspelled words or incomplete words. The second approach focuses on clean text, which is free from the mentioned characteristics in the noisy text. As research articles do not contain noisy data, we propose a model that focuses on clean text for the identification of paraphrases in research articles. To address the problem of paraphrase detection, this work presents a novel Bidirectional Encoder Representation from Transformers (BERT) based model with fine-tuning. For word representation, Global Vectors (GloVe) embeddings and contextualized Embeddings From Language Models (ELMo) are employed in this work. Initially, the model is evaluated without performing preprocessing. Later, the preprocessing step is performed before evaluating the model. Extensive experiments are performed to evaluate the proposed model utilizing two benchmark datasets, namely, Microsoft Research Paraphrase (MSRP) and Quora Question Pairs (Quora). A comparison of the proposed model is done with four closely related state-of-the-art works. The obtained results show that Fine-tuned BERT using ELMo embeddings with preprocessing produces promising outcomes. Paraphrase identification rates achieved on MSRP and Quora datasets are 86.51% and 94.32%, respectively, which are better than the other competing methods. The proposed solution enables the identification of paraphrased text with a higher accuracy having its application in multiple domains requiring genuinely written documents. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

11. A workflow for analyzing cultural schemas in texts.

Author: Taylor, Marshall A. and Stoltz, Dustin S.
Subjects: *CULTURE, *GROUP identity, *WORKFLOW, *BLOGS, *COLLECTIONS
Abstract: Concept class analysis (CoCA) is a method for recovering cultural schemas in texts using a combination of word embedding and community detection models. Like survey-based forms of schematic class analysis (SCA), however, interpreting results can be difficult. Some of these interpretive difficulties are applicable across types of SCA, while others are unique to CoCA. In this paper, we propose a complete workflow for interpreting and analyzing CoCA output. We use the case of social identity schemas in a collection of over 13,000 U.S. political blog posts to outline a number of interpretive and analytical strategies and a robustness check to make sense of the cultural schemas recovered from texts. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

12. Training and evaluation of vector models for Galician.

Author: Garcia, Marcos
Abstract: This paper presents a large and systematic assessment of distributional models for Galician. To this end, we have first trained and evaluated static word embeddings (e.g., word2vec, GloVe), and then compared their performance with that of current contextualised representations generated by neural language models. First, we have compiled and processed a large corpus for Galician, and created four datasets for word analogies and concept categorisation based on standard resources for other languages. Using the aforementioned corpus, we have trained 760 static vector space models which vary in their input representations (e.g., adjacency-based versus dependency-based approaches), learning algorithms, size of the surrounding contexts, and in the number of vector dimensions. These models have been evaluated both intrinsically, using the newly created datasets, and on extrinsic tasks, namely on POS-tagging, dependency parsing, and named entity recognition. The results provide new insights into the performance of different vector models in Galician, and about the impact of several training parameters on each task. In general, fastText embeddings are the static representations with the best performance in the intrinsic evaluations and in named entity recognition, while syntax-based embeddings achieve the highest results in POS-tagging and dependency parsing, indicating that there is no significant correlation between the performance in the intrinsic and extrinsic tasks. Finally, we have compared the performance of static vector representations with that of BERT-based word embeddings, whose fine-tuning obtains the best performance on named entity recognition. This comparison provides a comprehensive state-of-the-art of current models in Galician, and releases new transformer-based models for NER. All the resources used in this research are freely available to the community, and the best models have been incorporated into SemantiGal, an online tool to explore vector representations for Galician. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

13. Affective, semantic, frequency, and descriptive norms for 107 face emojis.

Author: Scheffler, Tatjana and Nenchev, Ivan
Subjects: *SOCIAL norms, *EMOTICONS & emojis, *DATABASES, *ACQUISITION of data, *AFFECT (Psychology)
Abstract: We introduce a novel dataset of affective, semantic, and descriptive norms for all facial emojis at the point of data collection. We gathered and examined subjective ratings of emojis from 138 German speakers along five essential dimensions: valence, arousal, familiarity, clarity, and visual complexity. Additionally, we provide absolute frequency counts of emoji use, drawn from an extensive Twitter corpus, as well as a much smaller WhatsApp database. Our results replicate the well-established quadratic relationship between arousal and valence of lexical items, also known for words. We also report associations among the variables: for example, the subjective familiarity of an emoji is strongly correlated with its usage frequency, and positively associated with its emotional valence and clarity of meaning. We establish the meanings associated with face emojis, by asking participants for up to three descriptions for each emoji. Using this linguistic data, we computed vector embeddings for each emoji, enabling an exploration of their distribution within the semantic space. Our description-based emoji vector embeddings not only capture typical meaning components of emojis, such as their valence, but also surpass simple definitions and direct emoji2vec models in reflecting the semantic relationship between emojis and words. Our dataset stands out due to its robust reliability and validity. This new semantic norm for face emojis impacts the future design of highly controlled experiments focused on the cognitive processing of emojis, their lexical representation, and their linguistic properties. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

14. Improving fake job description detection using deep learning-based NLP techniques

Author: Dinh-Hong Vu, Kien Nguyen, Khai Thien Tran, Bay Vo, and Tuong Le
Subjects: Fake job description detection, natural language processing, word embeddings, deep learning, Telecommunication, TK5101-6720, Information technology, T58.5-58.64
Abstract: In the modern online society, where social networks are increasingly developed, the authenticity of information is one of the essential needs. There are many problems with detecting fake information, such as fake news detection, fake review detection, etc. Fake job description (FJD) detection is an interesting problem many groups have studied recently. However, current studies still need improvement in predictability. Therefore, this study develops a new method named NLP2FJD that utilizes deep learning-based NLP techniques for improving FJD detection. Firstly, we utilize the pre-trained Word2Vec to extract features from textual information from the dataset. Next, we combine textual information and meta-information in the experimental dataset to improve the performance of FJD detection. The above two improvements will help the recommender system significantly improve the predictive ability of the proposed model. Finally, empirical experiments are conducted to confirm the effectiveness of the proposed method on the experimental dataset compared with cutting-edge methods. The experimental results demonstrate that the NLP2FJD framework outperforms other experimental methods for FJD detection on the experimental dataset. Besides, this study also conducts ROC curve analysis to show how to determine the optimal threshold for distinguishing fake from real job descriptions on the experimental dataset.
Published: 2024
Full Text: View/download PDF

15. A semantic-based model with a hybrid feature engineering process for accurate spam detection

Author: Chira N. Mohammed and Ayah M. Ahmed
Subjects: Spam detection, Feature engineering, TF-IDF, Word embeddings, Feature selection, SVM, Electrical engineering. Electronics. Nuclear engineering, TK1-9971, Information technology, T58.5-58.64
Abstract: Detecting spam emails is essential to maintaining the security and integrity of email communication. Existing research has made significant progress in developing effective spam detection models, but challenges remain in improving classification performance and adaptability to evolving spamming techniques. In this study, we propose a novel spam detection model with a comprehensive feature engineering approach that combines term frequency-inverse document frequency (TF-IDF) vectorizer and word embedding features to optimize the feature space. Our contribution lies in integrating semantic-based word embeddings, leveraging pre-existing knowledge to capture the semantic meaning of words and enhance the representation of email texts. To identify the most suitable word embedding technique for our model, we evaluated GloVe, Word2Vec, and FastText. GloVe was selected for its better performance, which is the result of its pre-training on a large and diverse text corpus. Furthermore, the model was evaluated without word embeddings, which did not exhibit the same effectiveness level as our word embedding-based model. Additionally, we utilized the support vector machine as a classifier and hyperparameter tuning technique to identify our model’s most effective parameter values. The proposed model was tested on two datasets. The experimental results showed that our model outperformed the other models discussed in the literature, achieving an accuracy of 99.5% on the SpamAssassin dataset, and 99.28% on the Enron-Spam dataset.
Published: 2024
Full Text: View/download PDF

16. An efficient learning based approach for automatic record deduplication with benchmark datasets

Author: M Ravikanth, Sampath Korra, Gowtham Mamidisetti, Maganti Goutham, and T. Bhaskar
Subjects: Record deduplication, Deep learning, Word embeddings, Long short term memory, Data integration, Medicine, Science
Abstract: With technological innovations, enterprises in the real world are managing every iota of data as it can be mined to derive business intelligence (BI). However, when data comes from multiple sources, it may result in duplicate records. As data is given paramount importance, it is also significant to eliminate duplicate entities towards data integration, performance and resource optimization. To realize reliable systems for record deduplication, of late, deep learning could offer exciting provisions with a learning-based approach. Deep ER is one of the deep learning-based methods used recently for dealing with the elimination of duplicates in structured data. Using it as a reference model, in this paper, we propose a framework known as Enhanced Deep Learning-based Record Deduplication (EDL-RD) for improving performance further. Towards this end, we exploited a variant of Long Short Term Memory (LSTM) along with various attribute compositions, similarity metrics, and numerical and null value resolution. We proposed an algorithm known as Efficient Learning based Record Deduplication (ELbRD). The algorithm extends the reference model with the aforementioned enhancements. An empirical study has revealed that the proposed framework with extensions outperforms existing methods.
Published: 2024
Full Text: View/download PDF

17. Exif2Vec: A Framework to Ascertain Untrustworthy Crowdsourced Images Using Metadata.

Author: Umair, Muhammad, Bouguettaya, Athman, Lakhdari, Abdallah, Ouzzani, Mourad, and Liu, Yuyun
Subjects: TRUST, SOCIAL media, DISINFORMATION, SOCIAL context, MISINFORMATION
Abstract: In the context of social media, the integrity of images is often dubious. To tackle this challenge, we introduce Exif2Vec, a novel framework specifically designed to discover modifications in social media images. The proposed framework leverages an image's metadata to discover changes in an image. We use a service-oriented approach that considers discovery of changes in images as a service. A novel word-embedding-based approach is proposed to discover semantic inconsistencies in an image's metadata that are reflective of the changes in an image. These inconsistencies are used to measure the severity of changes. The novelty of the approach resides in that it does not require the use of images to determine the underlying changes. We use a pretrained Word2Vec model to conduct experiments. The model is validated on two different fact-checked image datasets, i.e., images related to general context and a context-specific image dataset. Notably, our findings showcase the remarkable efficacy of our approach, yielding results of up to 80% accuracy. This underscores the potential of our framework. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

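18. Quantitative analysis of the relations between semantic fields in Slovenian narrative prose of the long 19th century [Kvantitativna analiza razmerij med semantičnimi polji v slovenski pripovedni prozi dolgega 19. stoletja].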

Author: Mandić, Lucija
Abstract: Copyright of Comparative Literature / Primerjalna Književnost is the property of Slovenian Comparative Literature Association and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Published: 2024
Full Text: View/download PDF

19. Prediction of vulnerability severity using vulnerability description with natural language processing and deep learning.

Author: Abdirahman, Abdullahi Ahmed, Hashi, Abdirahman Osman, Romo Rodriguez, Octavio Ernesto, and Elmi, Mohamed Abdirahman
Subjects: CONVOLUTIONAL neural networks, NATURAL language processing, NEURAL circuitry, COMPUTER software, CLASSIFICATION
Abstract: One of the most critical aspects of a software piece is its vulnerabilities. Regardless of the years of experience, type of project, or the size of the team, it is impossible to avoid introducing vulnerabilities while developing or maintaining software. This aspect becomes crucial when the software is deployed in production or released to the final users. At that point finding vulnerabilities becomes a race between the developers and malicious intruders: whoever finds it first can either exploit it or fix it. Acknowledging this situation and using the tools and standards that we have available in the field, such as common vulnerability exposures and common vulnerability scoring systems, and based on recent research, in this study, we propose to have an approach different from the common practices of manual classification, using a 2-layer convolutional neural network (CNN) to automate the classification of vulnerabilities, speeding up this process and enabling developers to have a faster response towards vulnerabilities, producing safer software. The experimental results obtained in this study suggest that pre-trained word embeddings contributed to an increase in accuracy of approximately 2% and the overall accuracy became 0.816%. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

20. An efficient learning based approach for automatic record deduplication with benchmark datasets.

Author: Ravikanth, M, Korra, Sampath, Mamidisetti, Gowtham, Goutham, Maganti, and Bhaskar, T.
Abstract: With technological innovations, enterprises in the real world are managing every iota of data as it can be mined to derive business intelligence (BI). However, when data comes from multiple sources, it may result in duplicate records. As data is given paramount importance, it is also significant to eliminate duplicate entities towards data integration, performance and resource optimization. To realize reliable systems for record deduplication, of late, deep learning could offer exciting provisions with a learning-based approach. Deep ER is one of the deep learning-based methods used recently for dealing with the elimination of duplicates in structured data. Using it as a reference model, in this paper, we propose a framework known as Enhanced Deep Learning-based Record Deduplication (EDL-RD) for improving performance further. Towards this end, we exploited a variant of Long Short Term Memory (LSTM) along with various attribute compositions, similarity metrics, and numerical and null value resolution. We proposed an algorithm known as Efficient Learning based Record Deduplication (ELbRD). The algorithm extends the reference model with the aforementioned enhancements. An empirical study has revealed that the proposed framework with extensions outperforms existing methods. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

21. A semantic-based model with a hybrid feature engineering process for accurate spam detection.

Author: Mohammed, Chira N. and Ahmed, Ayah M.
Subjects: SPAM email, SUPPORT vector machines, EMAIL security
Abstract: Detecting spam emails is essential to maintaining the security and integrity of email communication. Existing research has made significant progress in developing effective spam detection models, but challenges remain in improving classification performance and adaptability to evolving spamming techniques. In this study, we propose a novel spam detection model with a comprehensive feature engineering approach that combines term frequency-inverse document frequency (TF-IDF) vectorizer and word embedding features to optimize the feature space. Our contribution lies in integrating semantic-based word embeddings, leveraging pre-existing knowledge to capture the semantic meaning of words and enhance the representation of email texts. To identify the most suitable word embedding technique for our model, we evaluated GloVe, Word2Vec, and FastText. GloVe was selected for its better performance, which is the result of its pre-training on a large and diverse text corpus. Furthermore, the model was evaluated without word embeddings, which did not exhibit the same effectiveness level as our word embedding-based model. Additionally, we utilized the support vector machine as a classifier and hyperparameter tuning technique to identify our model's most effective parameter values. The proposed model was tested on two datasets. The experimental results showed that our model outperformed the other models discussed in the literature, achieving an accuracy of 99.5% on the SpamAssassin dataset, and 99.28% on the Enron-Spam dataset. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

22. Quantifying social capital creation in post‐disaster recovery aid in Indonesia: methodological innovation by an AI‐based language model.

Author: Marutschke, Daniel Moritz, Nurdin, Muhammad Riza, and Hirono, Miwa
Subjects: *LANGUAGE models, *ARTIFICIAL intelligence, *SOCIAL capital, *NATURAL language processing, *DISASTER relief, *ETHNOLOGY research, *DISASTER resilience
Abstract: Smooth interaction with a disaster‐affected community can create and strengthen its social capital, leading to greater effectiveness in the provision of successful post‐disaster recovery aid. To understand the relationship between the types of interaction, the strength of social capital generated, and the provision of successful post‐disaster recovery aid, intricate ethnographic qualitative research is required, but it is likely to remain illustrative because it is based, at least to some degree, on the researcher's intuition. This paper thus offers an innovative research method employing a quantitative artificial intelligence (AI)‐based language model, which allows researchers to re‐examine data, thereby validating the findings of the qualitative research, and to glean additional insights that might otherwise have been missed. This paper argues that well‐connected personnel and religiously‐based communal activities help to enhance social capital by bonding within a community and linking to outside agencies and that mixed methods, based on the AI‐based language model, effectively strengthen text‐based qualitative research. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

23. Context-aware composition of agent policies by Markov decision process entity embeddings and agent ensembles.

Author: Merkle, Nicole and Mikut, Ralf
Subjects: WEBSITES, KNOWLEDGE graphs, CYBER physical systems, MARKOV processes, CONTRAST media, REINFORCEMENT learning
Abstract: Computational agents support humans in many areas of life and are therefore found in heterogeneous contexts. This means that agents operate in rapidly changing environments and can be confronted with huge state and action spaces. In order to perform services and carry out activities satisfactorily, i.e. in a goal-oriented manner, agents require prior knowledge and therefore have to develop and pursue context-dependent policies. The problem here is that prescribing policies in advance is limited and inflexible, especially in dynamically changing environments. Moreover, the context (i.e. the external and internal state) of an agent determines its choice of actions. Since the environments in which agents operate can be stochastic and complex in terms of the number of states and feasible actions, activities are usually modelled in a simplified way by Markov decision processes so that, for example, agents with reinforcement learning are able to learn policies, i.e. state-action pairs, that help to capture the context and act accordingly to optimally perform activities. However, training policies for all possible contexts using reinforcement learning is time-consuming. A requirement and challenge for agents is to learn strategies quickly and respond immediately in cross-context environments and applications, e.g., the Internet, service robotics, cyber-physical systems. In this work, we propose a novel simulation-based approach that enables a) the representation of heterogeneous contexts through knowledge graphs and entity embeddings and b) the context-aware composition of policies on demand by ensembles of agents running in parallel. The evaluation we conducted with the "Virtual Home" dataset indicates that agents with a need to switch seamlessly between different contexts, e.g. in a home environment, can request on-demand composed policies that lead to the successful completion of context-appropriate activities without having to learn these policies in lengthy training steps and episodes, in contrast to agents that use reinforcement learning. The presented approach enables both context-aware and cross-context applicability of untrained computational agents. Furthermore, the source code of the approach as well as the generated data, i.e. the trained embeddings and the semantic representation of domestic activities, is open source and openly accessible on Github and Figshare. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

24. Natural Language Processing for Ancient Greek: Design, advantages and challenges of language models.

Author: Stopponi, Silvia, Pedrazzini, Nilo, Peels-Matthey, Saskia, McGillivray, Barbara, and Nissim, Malvina
Subjects: NATURAL language processing, LANGUAGE models, COMPUTATIONAL linguistics, GREEK language, SEMANTICS
Abstract: Not provided in this record. Published in Diachronica (John Benjamins Publishing Co.); refer to the original published version of the material for the full abstract.
Published: 2024
Full Text: View/download PDF

25. Malware Classification Using Dynamically Extracted API Call Embeddings.

Author: Aggarwal, Sahil and Di Troia, Fabio
Subjects: HIDDEN Markov models, COMPUTER security, MALWARE, CONVOLUTIONAL neural networks, SUPPORT vector machines, NATURAL language processing
Abstract: Malware classification stands as a crucial element in establishing robust computer security protocols, encompassing the segmentation of malware into discrete groupings. Recently, the emergence of machine learning has presented itself as an apt approach for addressing this challenge. Models can undergo training employing diverse malware attributes, such as opcodes and API calls, to distill valuable insights for effective classification. Within the realm of natural language processing, word embeddings assume a pivotal role by representing text in a manner that aligns closely with the proximity of similar words. These embeddings facilitate the quantification of word resemblances. This research embarks on a series of experiments that harness hybrid machine learning methodologies. We derive word vectors from dynamic API call logs associated with malware and integrate them as features in collaboration with diverse classifiers. Our methodology involves the utilization of Hidden Markov Models and Word2Vec to generate embeddings from API call logs. Additionally, we amalgamate renowned models like BERT and ELMo, noted for their capacity to yield contextualized embeddings. The resultant vectors are channeled into our classifiers, namely Support Vector Machines (SVMs), Random Forest (RF), k-Nearest Neighbors (kNNs), and Convolutional Neural Networks (CNNs). Through two distinct sets of experiments, our objective revolves around the classification of both malware families and categories. The outcomes achieved illuminate the efficacy of API call embeddings as a potent instrument in the domain of malware classification, particularly in the realm of identifying malware families. The best combination was RF and word embeddings generated by Word2Vec, ELMo, and BERT, achieving an accuracy between 0.91 and 0.93. This result underscores the potential of our approach in effectively classifying malware. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

26. Semantic Non-Negative Matrix Factorization for Term Extraction.

Author: Nugumanova, Aliya, Alzhanov, Almas, Mansurova, Aiganym, Rakhymbek, Kamilla, and Baiburin, Yerzhan
Subjects: MATRIX decomposition, NONNEGATIVE matrices, DOCUMENT clustering, HISTORY of geography, TRANSFORMER models
Abstract: This study introduces an unsupervised term extraction approach that combines non-negative matrix factorization (NMF) with word embeddings. Inspired by a pioneering semantic NMF method that employs regularization to jointly optimize document–word and word–word matrix factorizations for document clustering, we adapt this strategy for term extraction. Typically, a word–word matrix representing semantic relationships between words is constructed using cosine similarities between word embeddings. However, it has been established that transformer encoder embeddings tend to reside within a narrow cone, leading to consistently high cosine similarities between words. To address this issue, we replace the conventional word–word matrix with a word–seed submatrix, restricting columns to 'domain seeds'—specific words that encapsulate the essential semantic features of the domain. Therefore, we propose a modified NMF framework that jointly factorizes the document–word and word–seed matrices, producing more precise encoding vectors for words, which we utilize to extract high-relevancy topic-related terms. Our modification significantly improves term extraction effectiveness, marking the first implementation of semantically enhanced NMF, designed specifically for the task of term extraction. Comparative experiments demonstrate that our method outperforms both traditional NMF and advanced transformer-based methods such as KeyBERT and BERTopic. To support further research and application, we compile and manually annotate two new datasets, each containing 1000 sentences, from the 'Geography and History' and 'National Heroes' domains. These datasets are useful for both term extraction and document classification tasks. All related code and datasets are freely available. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

27. Comparative Analysis of Word Embeddings for Multiclass Cyberbullying Detection.

Author: Faraj, Azhi and Utku, Semih
Subjects: *CYBERBULLYING, *SOCIAL media, *DIGITAL communications, *MACHINE learning, *COMPARATIVE studies, *CONVOLUTIONAL neural networks
Abstract: Cyberbullying has emerged as a pervasive concern in modern society, particularly within social media platforms. This phenomenon encompasses employing digital communication to instill fear, threaten, harass, or harm individuals. Given the prevalence of social media in our lives, there is an escalating need for effective methods to detect and combat cyberbullying. This paper aims to explore the utilization of word embeddings and to discern the comparative effectiveness of trainable word embeddings, pre-trained word embeddings, and fine-tuned language models in multiclass cyberbullying detection. Distinguishing from previous binary classification methods, our research delves into nuanced multiclass detection. The exploration of word embeddings holds significant promise due to its ability to transform words into dense numerical vectors within a high-dimensional space. This transformation captures intricate semantic and syntactic relationships inherent in language, enabling machine learning (ML) algorithms to discern patterns that might signify cyberbullying. In contrast to previous research, this work delves beyond primary binary classification and centers on the nuanced realm of multiclass cyberbullying detection. The research employs diverse techniques, including convolutional neural networks and bidirectional long short-term memory, alongside well-known pre-trained models such as word2vec and bidirectional encoder representations from transformers (BERT). Moreover, traditional ML algorithms such as K-nearest neighbors, Random Forest, and Naïve Bayes are integrated to evaluate their performance vis-à-vis deep learning models. The findings underscore the promise of a fine-tuned BERT model on our dataset, yielding the most promising results in multiclass cyberbullying detection, and achieving the best-recorded accuracy of 85% on the dataset. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

28. Modified lesk algorithm for word sense disambiguation in Bengali.

Author: Das, Ratul, Pal, Alok Ranjan, and Saha, Diganta
Abstract: This article presents a novel approach to solving the problem of Word Sense Disambiguation (WSD) for Bengali text. The algorithm used in this work is a modification of the Lesk algorithm. In the original algorithm, the overlap between the “context bag” and the “sense bag” items from the lexical resource (WordNet) is calculated using word pair matching. In the current approach, the overlap is calculated by adopting a semantic similarity measure using the fastText subword embeddings. The approach can efficiently handle unknown wordforms and discover the latent semantics of words. Significant progress has been made in WSD for English and other European languages, but Indian languages like Bengali still pose a formidable challenge. The dataset used for the work consists of individual sentences from the Bengali Wikipedia, which is a huge collection of Bengali text (96 K webpages with 1700 K sentences), the Indo WordNet for the Bengali language, and the Bengali Online Dictionary. The results of the experiments performed are promising. The target words which have semantically distinct synsets in the WordNet give a high F1 score. The F1 score achieved is 80%, which is well over the baseline and shows significant improvement over the other knowledge-based approaches tried on low-resource Indian languages. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

29. E-Commerce Fake Reviews Detection Using LSTM with Word2Vec Embedding.

Author: Raheem, Mafas and Yi Chien Chong
Subjects: ONLINE shopping, OPTIMAL stopping (Mathematical statistics), VECTOR data, CONSUMERS' reviews, CONSUMERS, DEEP learning
Abstract: Customer reviews inform potential buyers' decisions, but fake reviews in e-commerce can skew perceptions as customers may feel pressured to leave positive feedback. Detecting fake reviews in e-commerce platforms is a critical challenge, impacting online shopping and deceiving customers. Effective detection strategies, employing deep learning architectures and word embeddings, are essential to combat this issue. Specifically, the study presented in this paper employed a 1-layer Simple LSTM model, a 1D Convolutional model, and a combined CNN+LSTM model. These models were trained using different pre-trained word embeddings including Word2Vec, GloVe, FastText, and with Keras embeddings, to convert the text data into vector form. The models were evaluated based on accuracy and F1-score to provide a comprehensive measure of their performance. The results indicated that the Simple LSTM model with Word2Vec embeddings achieved an accuracy of nearly 91% and an F1-score of 0.9024, outperforming all other model-embedding combinations. The 1D convolutional model performed best without any embeddings, suggesting its ability to extract meaningful features from the raw text. The transformer-based models, BERT and DistilBERT, showed progressive learning but struggled with generalization, indicating the need for strategies such as early stopping, dropout, or regularization to prevent overfitting. Notably, the DistilBERT model consistently outperformed the LSTM model, achieving optimal performance with accuracy of 96% and an F1-score of 0.9639 using a batch size of 32 and a learning rate of 4.00E-05. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

30. Measuring Meta-Interpretation.

Author: Bystranowski, Piotr and Tobia, Kevin
Subjects: NATURAL language processing, FEDERAL court decisions, LEGAL language, MACHINE learning, EMPIRICAL research
Abstract: American legal interpretation has taken an empirical turn. Courts and scholars use corpus linguistics, survey experiments, and machine learning to clarify meanings of legal texts. We introduce these developments in "issue-level interpretation," concerning interpretive theories' application to legal language. Empirical methods also inform "meta-interpretive" debate: Which interpretive theory do interpreters use; which have they used; and which should they use? We demonstrate the relevance of machine learning to these meta-interpretive debates with insights provided by a word embedding that we trained on a corpus of over 1.3 million U.S. federal court decisions. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

31. Governors in the Digital Era: Analyzing and Predicting Social Media Engagement Using Machine Learning during the COVID-19 Pandemic in Japan.

Author: Shady, Salama, Shoda, Vera Paola, and Kamihigashi, Takashi
Subjects: COVID-19 pandemic, DIGITAL technology, MACHINE learning, SOCIAL media, MICROBLOGS, GOVERNORS, CRISIS communication, SOCIAL media in business
Abstract: This paper presents a comprehensive analysis of the social media posts of prefectural governors in Japan during the COVID-19 pandemic. It investigates the correlation between social media activity levels, governors' characteristics, and engagement metrics. To predict citizen engagement of a specific tweet, machine learning models (MLMs) are trained using three feature sets. The first set includes variables representing profile- and tweet-related features. The second set incorporates word embeddings from three popular models, while the third set combines the first set with one of the embeddings. Additionally, seven classifiers are employed. The best-performing model utilizes the first feature set with FastText embedding and the XGBoost classifier. This study aims to collect governors' COVID-19-related tweets, analyze engagement metrics, investigate correlations with governors' characteristics, examine tweet-related features, and train MLMs for prediction. This paper's main contributions are twofold. Firstly, it offers an analysis of social media engagement by prefectural governors during the COVID-19 pandemic, shedding light on their communication strategies and citizen engagement outcomes. Secondly, it explores the effectiveness of MLMs and word embeddings in predicting tweet engagement, providing practical implications for policymakers in crisis communication. The findings emphasize the importance of social media engagement for effective governance and provide insights into factors influencing citizen engagement. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

32. Emotion topology: extracting fundamental components of emotions from text using word embeddings

Author: Hubert Plisiecki and Adam Sobieszek
Subjects: word embeddings, emotion decomposition, natural language processing, valence, arousal, Psychology, BF1-990
Abstract: This exploratory study examined the potential of word embeddings, an automated numerical representation of written text, as a novel method for emotion decomposition analysis. Drawing from a substantial dataset scraped from a Social Media site, we constructed emotion vectors to extract the dimensions of emotions, as annotated by the readers of the texts, directly from human language. Our findings demonstrated that word embeddings yield emotional components akin to those found in previous literature, offering an alternative perspective not bounded by theoretical presuppositions, as well as showing that the dimensional structure of emotions is reflected in the semantic structure of their text-based expressions. Our study highlights word embeddings as a promising tool for uncovering the nuances of human emotions and comments on the potential of this approach for other psychological domains, providing a basis for future studies. The exploratory nature of this research paves the way for further development and refinement of this method, promising to enrich our understanding of emotional constructs and psychological phenomena in a more ecologically valid and data-driven manner.
Published: 2024
Full Text: View/download PDF

33. Automation Assemblages in the Internet of Things: Discovering Qualitative Practices at the Boundaries of Quantitative Change.

Author: Novak, Thomas P and Hoffman, Donna L
Subjects: CUSTOMER experience, INTERNET of things, AUTOMATION, NATURAL language processing, HUMAN-computer interaction, CONSUMER behavior
Abstract: We examine consumers' interactions with smart objects using a novel mixed-method approach, guided by assemblage theory, to discover the emergence of automation practices. We use a unique text data set from the web service IFTTT ("If This Then That"), comprising hundreds of thousands of applets that represent "if–then" connections between pairs of Internet services. Consumers use these applets to automate events in their daily lives. We quantitatively identify and qualitatively interpret automation assemblages that emerge bottom-up as different consumers create similar applets within unique social contexts. Our data discovery approach combines word embeddings, density-based clustering, and nonlinear dimensionality reduction with an inductive approach to the thematic analysis. We uncover 127 nested automation assemblages that correspond to automation practices. Practices are interpreted in terms of four higher-order categories: social expression, social connectedness, extended mind, and relational AI. To investigate the future trajectories of automation practices, we use the concept of the possibility space, a fundamental theoretical idea from assemblage theory. Using our empirical approach, we translate this theoretical possibility space of automation assemblages into a data visualization to predict how existing practices can grow and new practices can emerge. Our new approach makes conceptual, methodological, and empirical contributions with implications for consumer research and marketing strategy. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

34. A Novel Technique for the Early Diagnosis of Mental Health Using Natural Language Processing

Author: Upadhyay, Amitkumar, Varshney, Deepika, Rishabh, Vashishtha, Srishti, Jain, Dhruv, Meena, Jaishree, Khanna, Ashish, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Hassanien, Aboul Ella, editor, Anand, Sameer, editor, Jaiswal, Ajay, editor, and Kumar, Prabhat, editor
Published: 2024
Full Text: View/download PDF

35. News Headlines Sentiment Analysis Using Vectorization Techniques

Author: Roy, Manish Chandra, Bisoy, Sukant Kishoro, Das, Pradipta Kumar, Bansal, Jagdish Chand, Series Editor, Deep, Kusum, Series Editor, Nagar, Atulya K., Series Editor, Tripathi, Ashish Kumar, editor, and Anand, Darpan, editor
Published: 2024
Full Text: View/download PDF

36. Evaluation of Gender Bias in Amharic Word Embedding Model

Author: Zenebe, Beimnet, Gizaw, Solomon, Abgaz, Yalemisew, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Rapp, Amon, editor, Di Caro, Luigi, editor, Meziane, Farid, editor, and Sugumaran, Vijayan, editor
Published: 2024
Full Text: View/download PDF

37. Fundamentals of Vector-Based Text Representation and Word Embeddings

Author: Malik, Nidhi, Singh, Sanjeet, Biswas, Payal, Sharan, Aditi, Chakrabarti, Amlan, Series Editor, Becker, Jürgen, Editorial Board Member, Hu, Yu-Chen, Editorial Board Member, Chattopadhyay, Anupam, Editorial Board Member, Tribedi, Gaurav, Editorial Board Member, Saha, Sriparna, Editorial Board Member, Goswami, Saptarsi, Editorial Board Member, Sharan, Aditi, editor, Malik, Nidhi, editor, Imran, Hazra, editor, and Ghosh, Indira, editor
Published: 2024
Full Text: View/download PDF

38. Capturing Task-Related Information for Text-Based Grasp Classification Using Fine-Tuned Embeddings

Author: Kleer, Niko, Weyand, Leon, Feld, Michael, Berberich, Klaus, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Nöth, Elmar, editor, Horák, Aleš, editor, and Sojka, Petr, editor
Published: 2024
Full Text: View/download PDF

39. Word Embedding-based Topic Modeling

Author: Bellaouar, Slimane, Itbirene, Ahmed, Chihani, Brahim, Luo, Xun, Editor-in-Chief, Almohammedi, Akram A., Series Editor, Chen, Chi-Hua, Series Editor, Guan, Steven, Series Editor, Pamucar, Dragan, Series Editor, Kerrache, Chaker Abdelaziz, editor, Tahari, Abdou El Karim, editor, Kassimi, Dounya, editor, and Chakraborty, Chinmay, editor
Published: 2024
Full Text: View/download PDF

40. Collection and Automatic Analysis with Natural Language Processing on a Corpus of Andean Oral Literature Implemented on the Web

Author: Solis, Ivan Soria, Buleje, Carlos Yinmel Castro, Reynaga, Humberto Silvera, Macedo, Mauro Felix Mamani, Soncco, Dionicia León, Guillen, Alejandro Giancarlo Mautino, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, and Arai, Kohei, editor
Published: 2024
Full Text: View/download PDF

41. Using Similarity Based on Embeddings

Author: Alfaqeeh, Mosab, Skillicorn, David B., Alhajj, Reda, Series Editor, Glässer, Uwe, Series Editor, Aggarwal, Charu C., Advisory Editor, Brantingham, Patricia L., Advisory Editor, Gross, Thilo, Advisory Editor, Han, Jiawei, Advisory Editor, Manásevich, Raúl, Advisory Editor, Masys, Anthony J., Advisory Editor, Alfaqeeh, Mosab, and Skillicorn, David B.
Published: 2024
Full Text: View/download PDF

42. A Feature Selection Technique–Based Approach for Author Profiling Using Word Embedding Techniques

Author: Kavuri, Karunakar, Kavitha, M., Lin, Frank M., editor, Patel, Ashokkumar, editor, Kesswani, Nishtha, editor, and Sambana, Bosubabu, editor
Published: 2024
Full Text: View/download PDF

43. Sindhi POS Tagger Using LSTM and Pre-Trained Word Embeddings

Author: Nathani, Bharti, Arora, Palak, Joshi, Nisheeth, Katyayan, Pragya, Rathore, Shivani Singh, Dadlani, Chander Prakash, Lin, Frank M., editor, Patel, Ashokkumar, editor, Kesswani, Nishtha, editor, and Sambana, Bosubabu, editor
Published: 2024
Full Text: View/download PDF

44. Arabic News Articles Classification Using Different Word Embeddings

Author: Khaled, M. Moneb, Al-Barham, Muhammad, Alomari, Osama Ahmad, Elnagar, Ashraf, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, García Márquez, Fausto Pedro, editor, Jamil, Akhtar, editor, Hameed, Alaa Ali, editor, and Segovia Ramírez, Isaac, editor
Published: 2024
Full Text: View/download PDF

45. Topics

Author: Moreno-Ortiz, Antonio and Moreno-Ortiz, Antonio
Published: 2024
Full Text: View/download PDF

46. Combining Word Embeddings-Based Similarity Measures for Transfer Learning Across Relational Domains

Author: Luca, Thais, Paes, Aline, Zaverucha, Gerson, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Muggleton, Stephen H., editor, and Tamaddoni-Nezhad, Alireza, editor
Published: 2024
Full Text: View/download PDF

47. Static, Dynamic, or Contextualized: What is the Best Approach for Discovering Semantic Shifts in Russian Media?

Author: Nikonova, Veronika, Tikhonova, Maria, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Ignatov, Dmitry I., editor, Khachay, Michael, editor, Kutuzov, Andrey, editor, Madoyan, Habet, editor, Makarov, Ilya, editor, Nikishina, Irina, editor, Panchenko, Alexander, editor, Panov, Maxim, editor, Pardalos, Panos M., editor, Savchenko, Andrey V., editor, Tsymbalov, Evgenii, editor, Tutubalina, Elena, editor, and Zagoruyko, Sergey, editor
Published: 2024
Full Text: View/download PDF

48. Creation of a Unique Clustering Method Employing Novel Similarity Metrics for Legal Texts to Improve Information Management and Retrieval in the Legal Field

Author: Jain, Rajanish Kumar, Jain, Anubha, Goel, Vikas, Dey, Nilanjan, Series Editor, Panigrahi, Bijaya Ketan, Series Editor, Piuri, Vincenzo, Series Editor, Mishra, Durgesh, editor, Yang, Xin She, editor, Unal, Aynur, editor, and Jat, Dharm Singh, editor
Published: 2024
Full Text: View/download PDF

49. MStoCast: Multimodal Deep Network for Stock Market Forecast

Author: Fataliyev, Kamaladdin, Liu, Wei, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Benavides-Prado, Diana, editor, Erfani, Sarah, editor, Fournier-Viger, Philippe, editor, Boo, Yee Ling, editor, and Koh, Yun Sing, editor
Published: 2024
Full Text: View/download PDF

50. A Study of Word Embedding Models for Machine Translation of North Eastern Languages

Author: Nath, Basab, Sarkar, Sunita, C. Debnath, Narayan, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Dasgupta, Kousik, editor, Mukhopadhyay, Somnath, editor, Mandal, Jyotsna K., editor, and Dutta, Paramartha, editor
Published: 2024
Full Text: View/download PDF
