Descriptor: "Word2Vec" - Searchworks@Jio Institute Digital Library Search Results

Your search for "Word2Vec" returned 4,400 results

1. Incident Alert Priority Levels Classification in Command and Control Centre Using Word Embedding Techniques

Author: Orellana, Marcos, Cubero Lupercio, Jonnathan Emmanuel, Lima, Juan Fernando, García-Montero, Patricio Santiago, Zambrano-Martinez, Jorge Luis, Ghosh, Ashish, Editorial Board Member, Berrezueta-Guzman, Santiago, editor, Torres, Rommel, editor, Zambrano-Martinez, Jorge Luis, editor, and Herrera-Tapia, Jorge, editor
Published: 2025
Full Text: View/download PDF

2. Sentiment Analysis of Amazon Alexa Product Reviews: A Comprehensive Comparative Study of Learning Algorithms

Author: Rao, Gouravelli Akshith, Prakash, L. N. C. K., Suryanarayana, G., Joshua, Pathi Varun, Reddy, Katta Nithin Kumar, Karnati, Ramesh, Angrisani, Leopoldo, Series Editor, Arteaga, Marco, Series Editor, Chakraborty, Samarjit, Series Editor, Chen, Shanben, Series Editor, Chen, Tan Kay, Series Editor, Dillmann, Rüdiger, Series Editor, Duan, Haibin, Series Editor, Ferrari, Gianluigi, Series Editor, Ferre, Manuel, Series Editor, Jabbari, Faryar, Series Editor, Jia, Limin, Series Editor, Kacprzyk, Janusz, Series Editor, Khamis, Alaa, Series Editor, Kroeger, Torsten, Series Editor, Li, Yong, Series Editor, Liang, Qilian, Series Editor, Martín, Ferran, Series Editor, Ming, Tan Cher, Series Editor, Minker, Wolfgang, Series Editor, Misra, Pradeep, Series Editor, Mukhopadhyay, Subhas, Series Editor, Ning, Cun-Zheng, Series Editor, Nishida, Toyoaki, Series Editor, Oneto, Luca, Series Editor, Panigrahi, Bijaya Ketan, Series Editor, Pascucci, Federica, Series Editor, Qin, Yong, Series Editor, Seng, Gan Woon, Series Editor, Speidel, Joachim, Series Editor, Veiga, Germano, Series Editor, Wu, Haitao, Series Editor, Zamboni, Walter, Series Editor, Tan, Kay Chen, Series Editor, Kumar, Amit, editor, Gunjan, Vinit Kumar, editor, Senatore, Sabrina, editor, and Hu, Yu-Chen, editor
Published: 2025
Full Text: View/download PDF

3. Curating Reagents in Chemical Reaction Data with an Interactive Reagent Space Map

Author: Andronov, Mikhail, Andronova, Natalia, Wand, Michael, Schmidhuber, Jürgen, Clevert, Djork-Arné, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Clevert, Djork-Arné, editor, Wand, Michael, editor, Malinovská, Kristína, editor, Schmidhuber, Jürgen, editor, and Tetko, Igor V., editor
Published: 2025
Full Text: View/download PDF

4. Word Embeddings as Statistical Estimators.

Author: Dey, Neil, Singer, Matthew, Williams, Jonathan P., and Sengupta, Srijan
Abstract: Word embeddings are a fundamental tool in natural language processing. Currently, word embedding methods are evaluated on the basis of empirical performance on benchmark data sets, and there is a lack of rigorous understanding of their theoretical properties. This paper studies word embeddings from a statistical theoretical perspective, which is essential for formal inference and uncertainty quantification. We propose a copula-based statistical model for text data and show that under this model, the now-classical Word2Vec method can be interpreted as a statistical estimation method for estimating the theoretical pointwise mutual information (PMI). We further illustrate the utility of this statistical model by using it to develop a missing value-based estimator as a statistically tractable and interpretable alternative to the Word2Vec approach. The estimation error of this estimator is comparable to Word2Vec and improves upon the truncation-based method proposed by Levy and Goldberg (Adv. Neural Inf. Process. Syst., 27, 2177–2185 2014). The resulting estimator also is comparable to Word2Vec in a benchmark sentiment analysis task on the IMDb Movie Reviews data set and a part-of-speech tagging task on the OntoNotes data set. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

5. Preservation of emotional context in tweet embeddings on social networking sites.

Author: Maruyama, Osamu, Yoshinaga, Asato, and Sawai, Ken-ichi
Abstract: In communication, emotional information is crucial, yet its preservation in tweet embeddings remains a challenge. This study aims to address this gap by exploring three distinct methods for generating embedding vectors of tweets: word2vec models, pre-trained BERT models, and fine-tuned BERT models. We conducted an analysis to assess the degree to which emotional information is conserved in the resulting embedding vectors. Our findings indicate that the fine-tuned BERT model exhibits a higher level of preservation of emotional information compared to other methods. These results underscore the importance of utilizing advanced natural language processing techniques for preserving emotional context in text data, with potential implications for enhancing sentiment analysis and understanding human communication in social media contexts. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

6. RecommendRift: a leap forward in user experience with transfer learning on netflix recommendations.

Author: Anuradha, Surabhi, Jyothi, Pothabathula Naga, Sivakumar, Surabhi, and Sheshikala, Martha
Subjects: WORD frequency, USER experience, RECREATION, RECOMMENDER systems
Abstract: In today's fast-paced lifestyle, streaming movies and series on platforms like Netflix is a valued recreational activity. However, users often spend considerable time searching for the right content and receive irrelevant recommendations, particularly when facing the "cold start problem" for new users. This challenge arises from existing recommender systems relying on factors like casting, title, and genre, using term frequency-inverse document frequency (TF-IDF) for vectorization, which prioritizes word frequency over semantic meaning. To address this, an innovative recommender system considering not only casting, title, and genre but also the short description of movies or shows is proposed in this study. Leveraging Word2Vec embedding for semantic relationships, this system offers recommendations aligning better with user preferences. Evaluation metrics including precision, mean average precision (MAP), discounted cumulative gain (DCG), and ideal cumulative gain (IDCG) demonstrate the system's effectiveness, achieving a normalized DCG (NDCG)@10 of 0.956. A/B testing shows an improved click-through rate (CTR) of recommendations, showcasing enhanced streaming experience. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

7. TECRR: a benchmark dataset of radiological reports for BI-RADS classification with machine learning, deep learning, and large language model baselines.

Author: Hussain, Sadam, Naseem, Usman, Ali, Mansoor, Avendaño Avalos, Daly Betzabeth, Cardona-Huerta, Servando, Bosques Palomo, Beatriz Alejandra, and Tamez-Peña, Jose Gerardo
Subjects: *LANGUAGE models, *GENERATIVE pre-trained transformers, *TRANSLATING & interpreting, *MACHINE learning, *SUPPORT vector machines, *DEEP learning, *NATURAL language processing
Abstract: Background: Recently, machine learning (ML), deep learning (DL), and natural language processing (NLP) have provided promising results in the free-form radiological reports' classification in the respective medical domain. In order to classify radiological reports properly, a high-quality annotated and curated dataset is required. Currently, no publicly available breast imaging-based radiological dataset exists for the classification of Breast Imaging Reporting and Data System (BI-RADS) categories and breast density scores, as characterized by the American College of Radiology (ACR). To tackle this problem, we construct and annotate a breast imaging-based radiological reports dataset and its benchmark results. The dataset was originally in Spanish. Board-certified radiologists collected and annotated it according to the BI-RADS lexicon and categories at the Breast Radiology department, TecSalud Hospitals Monterrey, Mexico. Initially, it was translated into English language using Google Translate. Afterwards, it was preprocessed by removing duplicates and missing values. After preprocessing, the final dataset consists of 5046 unique reports from 5046 patients with an average age of 53 years and 100% women. Furthermore, we used word-level NLP-based embedding techniques, term frequency-inverse document frequency (TF-IDF) and word2vec to extract semantic and syntactic information. We also compared the performance of ML, DL and large language models (LLMs) classifiers for BI-RADS category classification. Results: The final breast imaging-based radiological reports dataset contains 5046 unique reports. We compared K-Nearest Neighbour (KNN), Support Vector Machine (SVM), Naive Bayes (NB), Random Forest (RF), Adaptive Boosting (AdaBoost), Gradient-Boosting (GB), Extreme Gradient Boosting (XGB), Long Short-Term Memory (LSTM), Bidirectional Encoder Representations from Transformers (BERT) and Biomedical Generative Pre-trained Transformer (BioGPT) classifiers. It is observed that the BioGPT classifier with preprocessed data performed 6% better with a mean sensitivity of 0.60 (95% confidence interval (CI), 0.391-0.812) compared to the second best performing classifier BERT, which achieved mean sensitivity of 0.54 (95% CI, 0.477-0.607). Conclusion: In this work, we propose a curated and annotated benchmark dataset that can be used for BI-RADS and breast density category classification. We also provide baseline results of most ML, DL and LLMs models for BI-RADS classification that can be used as a starting point for future investigation. The main objective of this investigation is to provide a repository for the investigators who wish to enter the field to push the boundaries further. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

8. Unveiling Similarities in the Code of Life: A Detailed Exploration of DNA Sequence Matching Algorithm.

Author: Shams, Mahmoud Y., Farag, Romany M., Aldawody, Dalia A., Khalid, Huda E., Essa, Ahmed K., El-Bakry, Hazem M., and Salama, A. A.
Subjects: *NUCLEOTIDE sequence, *ALGORITHMS, *BIOINFORMATICS, *DNA, *COSINE function
Abstract: Identifying similar DNA sequences is crucial in various biological research endeavors. This paper delves into the intricate workings of a specific algorithm designed for this purpose. We provide a systematic explanation, exploring how the algorithm handles user input, reads stored DNA sequences, utilizes the Word2Vec model for vector representation, and calculates sequence similarity using diverse metrics like Cosine Similarity and Neutrosophic Distance. Additionally, the paper explores the incorporation of neutrosophic values to account for uncertainty in the comparisons. Finally, we discuss the extraction of results, including matched sequences, similarity scores, and accuracy measures. This in-depth exploration provides a clear understanding of the algorithm's capabilities and fosters its effective application in DNA sequence analysis. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

9. ONTO-TDM domain ontology population for a specific discipline.

Author: Abdoune, Rosana, Lazib, Lydia, Dahmani-Bouarab, Farida, and Fernández-Breis, Jesualdo Tomás
Abstract: Ontologies play a vital role in organizing and constructing knowledge across various domains, enabling effective knowledge management and sharing. The development of domain-specific ontologies, such as the ONTO-TDM ontology for teaching domain modeling, is essential for providing a comprehensive and standardized representation of knowledge within a given discipline. However, to maximize the usefulness and relevance of such ontologies, it is crucial to automate their population with domain-specific information, reducing manual work and ensuring scalability. This paper presents a novel method for ontology population by extracting and integrating relevant information from diverse sources. The method combines the TextRank algorithm with Word2Vec to enhance keyword extraction, capturing both semantic meaning and textual importance. Keywords are then annotated and used to train a machine learning classifier, which aids in integrating new instances into the ontology. Experiments show that the proposed method achieves a precision of 63.33%, a recall of 61.29% and an F1-score of 62.28%, significantly improving keyword extraction and ontology population accuracy compared to existing methods. This validates the method's effectiveness in semi-automatically extracting relevant instances from diverse data sources, enhancing the efficiency and accuracy of ontology population, and advancing automated knowledge management in domain-specific contexts. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

10. Enhancing emotion detection with synergistic combination of word embeddings and convolutional neural networks.

Author: Jadon, Anil Kumar and Kumar, Suresh
Subjects: CONVOLUTIONAL neural networks, EMOTION recognition, DEEP learning, PSYCHIATRIC research, CONSUMER research
Abstract: Recognizing emotions in textual data is crucial in a wide range of natural language processing (NLP) applications, from consumer sentiment research to mental health evaluation. The word embedding techniques play a pivotal role in text processing. In this paper, the performance of several well-known word embedding methods is evaluated in the context of emotion recognition. The classification of emotions is further enhanced using a convolutional neural network (CNN) model because of its propensity to capture local patterns and its recent triumphs in text-related tasks. The integration of CNN with word embedding techniques introduced an additional layer to the landscape of emotion detection from text. The synergy between word embedding techniques and CNN harnesses the strengths of both approaches. CNNs extract local patterns and features from sequential data, making them well-suited for capturing relevant information within the embeddings. The results obtained with various embeddings highlight the significance of choosing synergistic combinations for optimum performance. The combination of CNNs and word embeddings proved a versatile and effective approach. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

11. TECRR: a benchmark dataset of radiological reports for BI-RADS classification with machine learning, deep learning, and large language model baselines

Author: Sadam Hussain, Usman Naseem, Mansoor Ali, Daly Betzabeth Avendaño Avalos, Servando Cardona-Huerta, Beatriz Alejandra Bosques Palomo, and Jose Gerardo Tamez-Peña
Subjects: BI-RADS classification, Breast radiological reports, TF-IDF, Word2vec, NLP, ML, Computer applications to medicine. Medical informatics, R858-859.7
Abstract: Background: Recently, machine learning (ML), deep learning (DL), and natural language processing (NLP) have provided promising results in the free-form radiological reports’ classification in the respective medical domain. In order to classify radiological reports properly, a high-quality annotated and curated dataset is required. Currently, no publicly available breast imaging-based radiological dataset exists for the classification of Breast Imaging Reporting and Data System (BI-RADS) categories and breast density scores, as characterized by the American College of Radiology (ACR). To tackle this problem, we construct and annotate a breast imaging-based radiological reports dataset and its benchmark results. The dataset was originally in Spanish. Board-certified radiologists collected and annotated it according to the BI-RADS lexicon and categories at the Breast Radiology department, TecSalud Hospitals Monterrey, Mexico. Initially, it was translated into English language using Google Translate. Afterwards, it was preprocessed by removing duplicates and missing values. After preprocessing, the final dataset consists of 5046 unique reports from 5046 patients with an average age of 53 years and 100% women. Furthermore, we used word-level NLP-based embedding techniques, term frequency-inverse document frequency (TF-IDF) and word2vec to extract semantic and syntactic information. We also compared the performance of ML, DL and large language models (LLMs) classifiers for BI-RADS category classification. Results: The final breast imaging-based radiological reports dataset contains 5046 unique reports. We compared K-Nearest Neighbour (KNN), Support Vector Machine (SVM), Naive Bayes (NB), Random Forest (RF), Adaptive Boosting (AdaBoost), Gradient-Boosting (GB), Extreme Gradient Boosting (XGB), Long Short-Term Memory (LSTM), Bidirectional Encoder Representations from Transformers (BERT) and Biomedical Generative Pre-trained Transformer (BioGPT) classifiers. It is observed that the BioGPT classifier with preprocessed data performed 6% better with a mean sensitivity of 0.60 (95% confidence interval (CI), 0.391-0.812) compared to the second best performing classifier BERT, which achieved mean sensitivity of 0.54 (95% CI, 0.477-0.607). Conclusion: In this work, we propose a curated and annotated benchmark dataset that can be used for BI-RADS and breast density category classification. We also provide baseline results of most ML, DL and LLMs models for BI-RADS classification that can be used as a starting point for future investigation. The main objective of this investigation is to provide a repository for the investigators who wish to enter the field to push the boundaries further.
Published: 2024
Full Text: View/download PDF

12. Developing and testing the efficacy of a novel forecasting methodology: Theory and evidence from China.

Author: Yang, Yuhong, Dogru, Tarik, Liang, Chao, Wang, Jianqiong, and Xu, Pengfei
Subjects: FORECASTING methodology, DEMAND forecasting, PRINCIPAL components analysis, PREDICTION theory, TOURIST attractions
Abstract: Numerous methodologies have been offered to forecast tourism demand; however, accurate forecasting has been a major challenge for policymakers despite its critical importance for tourism planning. Therefore, we propose and test a novel forecasting methodology that combines principal component analysis (PCA) and long short-term memory (LSTM) network, along with the Baidu index, to forecast daily tourist arrivals for a popular tourist attraction in China. Word2Vec, a software tool launched by Google, is used to improve the coverage and accuracy of search keywords in the construction of the Baidu indexes. Before training the LSTM network, PCA is used to reduce noise and optimize the data. Considering the study's timeframe, the impact of COVID-19 pandemic has also been assessed. The efficacy of the proposed forecasting methodology is verified, and the results show that the PCA-LSTM model outperforms other models in terms of prediction accuracy and stability. Theoretical and practical implications are discussed. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

13. Analysis of the impact of the contextual embeddings usage on the text classification accuracy

Author: Olesia Barkovska, Anton Havrashenko, Vitalii Serdechnyi, Vladyslav Kholiev, and Patrik Rusnak
Subjects: classification, nlp, context, model, neural network, word2vec, glove, embedding, bert, gpt, Computer engineering. Computer hardware, TK7885-7895, Electronic computers. Computer science, QA75.5-76.95
Published: 2024
Full Text: View/download PDF

14. Sentiment-semantic word vectors: A new method to estimate management sentiment

Author: Tri Minh Phan
Subjects: Knowledge distillation, MD&A, Stock return predictability, Word2Vec, Statistics, HA1-4737, Economics as a science, HB71-74
Abstract: This paper introduces a novel method to extract the sentiment embedded in the Management’s Discussion and Analysis (MD&A) section of 10-K filings. The proposed method outperforms traditional approaches in terms of sentiment classification accuracy. Utilizing this method, the MD&A sentiment is found to be a strong negative predictor of future stock returns, demonstrating consistency in both in-sample and out-of-sample settings. By contrast, if traditional sentiment extraction methods are used, the MD&A sentiment exhibits no predictive ability for stock markets. Additionally, the MD&A sentiment is associated with dividend-related macroeconomic channels regarding future stock return prediction.
Published: 2024
Full Text: View/download PDF

15. Discrimination of semantically similar verbal memory traces is affected in healthy aging

Author: Alex Ilyés, Borbála Paulik, and Attila Keresztes
Subjects: Mnemonic discrimination, Aging, Semantic similarity, Pattern separation, word2vec, Medicine, Science
Abstract: Mnemonic discrimination of highly similar memory traces is affected in healthy aging via changes in hippocampal pattern separation—i.e., the ability of the hippocampus to orthogonalize highly similar neural inputs. The decline of this process leads to a loss of episodic specificity. Because previous studies have almost exclusively tested mnemonic discrimination of visuospatial stimuli (e.g., objects or scenes), less is known about age-related effects on the episodic specificity of semantically similar traces. To address this gap, we designed a task to assess mnemonic discrimination of verbal stimuli as a function of semantic similarity based on word embeddings. Forty young (mean age = 21.7 years) and 40 old adults (mean age = 69.8 years) first incidentally encoded adjective-noun phrases, then performed a surprise recognition test involving exactly repeated and highly similar lure phrases. We found that increasing semantic similarity negatively affected mnemonic discrimination in both age groups, and that compared to young adults, older adults showed worse discrimination at medium levels of semantic similarity. These results indicate that episodic specificity of semantically similar memory traces is affected in aging via less efficient mnemonic operations and strengthen the notion that mnemonic discrimination is a modality-independent process supporting memory specificity across representational domains.
Published: 2024
Full Text: View/download PDF

16. Optimizing word embeddings for small datasets: a case study on patient portal messages from breast cancer patients

Author: Qingyuan Song, Congning Ni, Jeremy L. Warner, Qingxia Chen, Lijun Song, S. Trent Rosenbloom, Bradley A. Malin, and Zhijun Yin
Subjects: Breast cancer, Hormonal therapy, Natural language processing, Patient portal messages, Word embedding models, Word2vec, Medicine, Science
Abstract: Patient portal messages often relate to specific clinical phenomena (e.g., patients undergoing treatment for breast cancer) and, as a result, have received increasing attention in biomedical research. These messages require natural language processing and, while word embedding models, such as word2vec, have the potential to extract meaningful signals from text, they are not readily applicable to patient portal messages. This is because embedding models typically require millions of training samples to sufficiently represent semantics, while the volume of patient portal messages associated with a particular clinical phenomenon is often relatively small. We introduce a novel adaptation of the word2vec model, PK-word2vec (where PK stands for prior knowledge), for small-scale messages. PK-word2vec incorporates the most similar terms for medical words (including problems, treatments, and tests) and non-medical words from two pre-trained embedding models as prior knowledge to improve the training process. We applied PK-word2vec in a case study of patient portal messages in the Vanderbilt University Medical Center electronic health record system sent by patients diagnosed with breast cancer from December 2004 to November 2017. We evaluated the model through a set of 1000 tasks, each of which compared the relevance of a given word to a group of the five most similar words generated by PK-word2vec and a group of the five most similar words generated by the standard word2vec model. We recruited 200 Amazon Mechanical Turk (AMT) workers and 7 medical students to perform the tasks. The dataset was composed of 1389 patient records and included 137,554 messages with 10,683 unique words. Prior knowledge was available for 7981 non-medical and 1116 medical words. In over 90% of the tasks, both reviewers indicated PK-word2vec generated more similar words than standard word2vec (p = 0.01). The difference in the evaluation by AMT workers versus medical students was negligible for all comparisons of tasks’ choices between the two groups of reviewers (p = 0.774 under a paired t-test). PK-word2vec can effectively learn word representations from a small message corpus, marking a significant advancement in processing patient portal messages.
Published: 2024
Full Text: View/download PDF

17. Sentiment analysis of the Hamas-Israel war on YouTube comments using deep learning

Author: Ashagrew Liyih, Shegaw Anagaw, Minichel Yibeyin, and Yitayal Tehone
Subjects: Deep learning approach, Recurrent neural network, Sentiment analysis, Word2vec, FastText, GloVe, Medicine, Science
Abstract: Sentiment analysis aims to classify text based on the opinion or mentality expressed in a situation, which can be positive, negative, or neutral. Therefore, in the world, a lot of opinions are available on various social media sites, which must be gathered and analyzed to assess the general public’s opinion. Finding and monitoring comments, as well as manually extracting the information contained in them, is a difficult task due to the vast diversity of ideas on YouTube. Identifying public opinion on war topics is crucial for offering insights to opposing sides based on popular opinion and emotions about the ongoing war. To address the gap, we build a model on YouTube comment sentiment analysis of the Hamas-Israel war to determine public opinion. In this study, we address the gaps by developing a deep learning-based approach for sentiment analysis. We have collected 24,360 comments from popular YouTube News Channels including BBC, WION, Aljazeera, and others about the Hamas-Israel War using YouTube API and Google spreadsheet and labeled them by linguistic experts into three classes: positive, negative, and neutral. Then, textual comments were preprocessed using natural language processing (NLP) techniques, and features were extracted using Word2vec, FastText, and GloVe. Moreover, we have used the SMOTE data balancing technique and used different data splits, but the 80/20 train-test split ratio has the highest accuracy. For classification model building, commonly used classification algorithms LSTM, Bi-LSTM, GRU, and Hybrid of CNN and Bi-LSTM were applied, and their performance is compared. As a result, the Hybrid of CNN and Bi-LSTM with Word2vec achieved the highest performance with 95.73% accuracy for comments classifications.
Published: 2024
Full Text: View/download PDF

18. Protection of Guizhou Miao batik culture based on knowledge graph and deep learning

Author: Huafeng Quan, Yiting Li, Dashuai Liu, and Yue Zhou
Subjects: Guizhou Miao batik, Cultural heritage, Knowledge graph, ResNet34, Word2vec, Fine Arts, Analytical chemistry, QD71-142
Abstract: In the globalization trend, China’s cultural heritage is in danger of gradually disappearing. The protection and inheritance of these precious cultural resources has become a critical task. This paper focuses on the Miao batik culture in Guizhou Province, China, and explores the application of knowledge graphs, natural language processing, and deep learning techniques in the promotion and protection of batik culture. We propose a dual-channel mechanism that integrates semantic and visual information, aiming to connect batik pattern features with cultural connotations. First, we use natural language processing techniques to automatically extract batik-related entities and relationships from the literature, and construct and visualize a structured batik pattern knowledge graph. Based on this knowledge graph, users can textually search and understand the images, meanings, taboos, and other cultural information of specific patterns. Second, for the batik pattern classification, we propose an improved ResNet34 model. By embedding average pooling and convolutional operations into the residual blocks and introducing long-range residual connections, the classification performance is enhanced. By inputting pattern images into this model, their categories can be accurately identified, and then the underlying cultural connotations can be understood. Experimental results show that our model outperforms other mainstream models in evaluation metrics such as accuracy, precision, recall, and F1-score, achieving 94.46%, 94.47%, 93.62%, and 93.8%, respectively. This research provides new ideas for the digital protection of batik culture and demonstrates the great potential of artificial intelligence technology in cultural heritage protection.
Published: 2024
Full Text: View/download PDF

19. An end-to-end method for predicting compound-protein interactions based on simplified homogeneous graph convolutional network and pre-trained language model

Author: Yufang Zhang, Jiayi Li, Shenggeng Lin, Jianwei Zhao, Yi Xiong, and Dong-Qing Wei
Subjects: Compound-protein interactions, Graph convolutional network, End-to-end learning, word2vec, Information technology, T58.5-58.64, Chemistry, QD1-999
Abstract: Identification of interactions between chemical compounds and proteins is crucial for various applications, including drug discovery, target identification, network pharmacology, and elucidation of protein functions. Deep neural network-based approaches are becoming increasingly popular in efficiently identifying compound-protein interactions with high-throughput capabilities, narrowing down the scope of candidates for traditional labor-intensive, time-consuming and expensive experimental techniques. In this study, we proposed an end-to-end approach termed SPVec-SGCN-CPI, which utilized simplified graph convolutional network (SGCN) model with low-dimensional and continuous features generated from our previously developed model SPVec and graph topology information to predict compound-protein interactions. The SGCN technique, dividing the local neighborhood aggregation and nonlinearity layer-wise propagation steps, effectively aggregates K-order neighbor information while avoiding neighbor explosion and expediting training. The performance of the SPVec-SGCN-CPI method was assessed across three datasets and compared against four machine learning- and deep learning-based methods, as well as six state-of-the-art methods. Experimental results revealed that SPVec-SGCN-CPI outperformed all these competing methods, particularly excelling in unbalanced data scenarios. By propagating node features and topological information to the feature space, SPVec-SGCN-CPI effectively incorporates interactions between compounds and proteins, enabling the fusion of heterogeneity. Furthermore, our method scored all unlabeled data in ChEMBL, confirming the top five ranked compound-protein interactions through molecular docking and existing evidence. These findings suggest that our model can reliably uncover compound-protein interactions within unlabeled compound-protein pairs, carrying substantial implications for drug re-profiling and discovery. In summary, SPVec-SGCN demonstrates its efficacy in accurately predicting compound-protein interactions, showcasing potential to enhance target identification and streamline drug discovery processes. Scientific contributions: The methodology presented in this work not only enables the comparatively accurate prediction of compound-protein interactions but also, for the first time, takes sample imbalance, which is very common in the real world, and computational efficiency into consideration simultaneously, accelerating the target identification and drug discovery process.
Published: 2024
Full Text: View/download PDF

20. Sentiment-semantic word vectors: A new method to estimate management sentiment.

Author: Phan, Tri Minh
Subjects: RATE of return on stocks, NEW words, STOCKS (Finance), CLASSIFICATION, FORECASTING
Abstract: This paper introduces a novel method to extract the sentiment embedded in the Management's Discussion and Analysis (MD&A) section of 10-K filings. The proposed method outperforms traditional approaches in terms of sentiment classification accuracy. Utilizing this method, the MD&A sentiment is found to be a strong negative predictor of future stock returns, demonstrating consistency in both in-sample and out-of-sample settings. By contrast, if traditional sentiment extraction methods are used, the MD&A sentiment exhibits no predictive ability for stock markets. Additionally, the MD&A sentiment is associated with dividend-related macroeconomic channels regarding future stock return prediction. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

21. Discrimination of semantically similar verbal memory traces is affected in healthy aging.

Author: Ilyés, Alex, Paulik, Borbála, and Keresztes, Attila
Subjects: *VERBAL memory, *OLDER people, *VERBAL learning, *AGE discrimination, *AGE groups, *YOUNG adults
Abstract: Mnemonic discrimination of highly similar memory traces is affected in healthy aging via changes in hippocampal pattern separation—i.e., the ability of the hippocampus to orthogonalize highly similar neural inputs. The decline of this process leads to a loss of episodic specificity. Because previous studies have almost exclusively tested mnemonic discrimination of visuospatial stimuli (e.g., objects or scenes), less is known about age-related effects on the episodic specificity of semantically similar traces. To address this gap, we designed a task to assess mnemonic discrimination of verbal stimuli as a function of semantic similarity based on word embeddings. Forty young (Mage = 21.7 years) and 40 old adults (Mage = 69.8 years) first incidentally encoded adjective-noun phrases, then performed a surprise recognition test involving exactly repeated and highly similar lure phrases. We found that increasing semantic similarity negatively affected mnemonic discrimination in both age groups, and that compared to young adults, older adults showed worse discrimination at medium levels of semantic similarity. These results indicate that episodic specificity of semantically similar memory traces is affected in aging via less efficient mnemonic operations and strengthen the notion that mnemonic discrimination is a modality-independent process supporting memory specificity across representational domains. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

22. Construction and Validation of a Computerized Creativity Assessment Tool With Automated Scoring Based on Deep-Learning Techniques.

Author: Sung, Yao-Ting, Cheng, Hao-Hsin, Tseng, Hou-Chiang, Chang, Kuo-En, and Lin, Shu-Yen
Subjects: *DIVERGENT thinking, *CREATIVE ability, *STATISTICAL reliability, *CHINESE language, *TEST validity
Abstract: Based on the divergent thinking (DT) framework of creativity assessment, this study constructed the Computerized Creativity Assessment with Figure Test (C-CRAFT) that is equipped with an automated scoring system and built around a deep-learning-based semantic space model called Word2Vec. A subject pool of 493 undergraduates completed the C-CRAFT as well as a conventional paper-and-pencil DT test that required manual scoring. We found moderately high to high coefficients for the correlations between the two tests, which suggested that the C-CRAFT has strong criterion-related validity. The results of the pre- and posttests also demonstrated the high test–retest reliability of the C-CRAFT. Good discriminant validity was evidenced by highly significant differences in the C-CRAFT scores between college students from art and design-related fields and students from other majors. These research findings indicate that the C-CRAFT is a valid and reliable assessment tool for DT, while the automated nature of the C-CRAFT makes it easier to implement the DT test compared with traditional approaches. Moreover, by applying the C-CRAFT to the Chinese language, this study contributes to the cross-linguistic research of semantic models in creativity assessment. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

23. An innovative framework for supporting content-based authorship identification and analysis in social media networks.

Author: Puerta, José Gaviria de la, Pastor-López, Iker, Tellaeche, Alberto, Sanz, Borja, Sanjurjo-González, Hugo, Cuzzocrea, Alfredo, and Bringas, Pablo G
Subjects: SOCIAL media, SOCIAL network analysis, ONLINE social networks, AUTHORSHIP, SCIENTIFIC community, SOCIAL networks, USER-generated content
Abstract: Content-based authorship identification is an emerging research problem in online social media networks, due to a wide collection of issues ranging from security to privacy preservation, from radicalization to defamation detection, and so forth. Indeed, this research has attracted a relevant amount of attention from the research community during the past years. The general problem becomes harder when we consider the additional constraint of identifying the same false profile over different social media networks, under obvious considerations. Inspired by this emerging research challenge, in this paper we propose and experimentally assess an innovative framework for supporting content-based authorship identification and analysis in social media networks. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

24. A proposed semantic keywords search engine for Indonesian Qur'an translation based on word embedding.

Author: Trisnawati, Liza, Binti Samsudin, Noor Azah, Bin Ahmad Khalid, Shamsul Kamal, Bin Ahmad Shaubari, Ezak Fadzrin, Sukri, and Indra, Zul
Subjects: KEYWORD searching, MACHINE translating, SEARCH engines, TRANSLATING & interpreting
Abstract: Obtaining relevant information from the Holy Qur'an can be really challenging for people who cannot speak Arabic, such as the Indonesian people. One technology implementation which is commonly used to tackle this problem is to develop a search engine application for Al-Qur'an verses. This paper proposes a search engine based on semantic representation keywords for the Indonesian translation of the Al-Qur'an which consists of 3 phases, i.e., data preparation, document representation, and search engine development. In the first phase, the Al-Qur'an dataset was built using the official translation of the Al-Qur'an from the Ministry of Religion and then enriched with the Wikipedia corpus. The second phase is document representation, which produces feature vectors by utilizing the Word2Vec algorithm. The final phase is the development of a search engine that can find the most relevant verses by calculating the cosine similarity between the document and the keywords. It was found that the proposed search engine succeeded in exceeding the performance of ordinary search engines by finding wider information due to the use of semantic keywords. Apart from that, the proposed search engine succeeded in maintaining the relevance of search results by achieving precision and recall levels of 98.7% and 97.3% respectively. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

25. The evolution of China's English education policy and challenges in higher education: analysis based on LDA and Word2Vec.

Author: Haiyang Hu, Fan Li, and Zhengying Luo
Subjects: EDUCATION policy, ENGLISH language, ONLINE education, BELT & Road Initiative, CREATIVE ability, HIGHER education, LANGUAGE ability
Abstract: China proposed the Belt and Road Initiative to strengthen regional connectivity so as to embrace a brighter future together. Since the Initiative was put forward, it has brought many challenges to China's English education policy. By employing Latent Dirichlet Allocation (LDA) and Word2Vec, this study analyzes the evolution of topics and challenges in China's English education policy under the Belt and Road Initiative. The results indicate that after the initiative, the policy focus has changed. English education has shifted from testing abilities to cultivating students' intercultural communication skills in order to meet the needs with countries alongside the "Belt and Road". Moreover, teaching strategies that were examination-oriented have also changed to emphasizing teaching methods and feedback. The focus and teaching strategies have also undergone great changes. China's English education policy has shifted from focusing on improving students' writing skills, English proficiency, and creativity to conducting in-depth research and addressing specific issues, including challenges in linguistics, media influence, educational institutions and programs, online courses, attitudes and self-efficacy, use of multiple languages and globalization, teaching issues, and curriculum design. These findings shed light on how the Belt and Road Initiative changed China's English education policy and provide further directions for future research. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

26. Code‐mixed Hindi‐English text correction using fuzzy graph and word embedding.

Author: Jain, Minni, Jindal, Rajni, and Jain, Amita
Subjects: *NATURAL language processing, *SOCIAL media, *FUZZY graphs, *SENTIMENT analysis, *SPELLING errors
Abstract: Interaction via social media involves frequent code‐mixed text, spelling errors and noisy elements, which creates a bottleneck in the performance of natural language processing applications. This proposed work is the first approach for code‐mixed Hindi‐English social media text that comprises language identification, detection and correction of non‐word (Out of Vocabulary) errors as well as real‐word errors occurring simultaneously. Each identified language (Devanagari Hindi, Roman Hindi, and English) has its own complexities and challenges. Errors are detected individually for each language and a suggestive list of the erroneous words is created. After this, a fuzzy graph between different words of the suggestive lists is generated using various semantic relations in Hindi WordNet. Word embeddings and Fuzzy graph‐based centrality measures are used to find the correct word. Several experiments are performed on different social media datasets taken from Instagram, Twitter, YouTube comments, Blogs, and WhatsApp. The experimental results demonstrate that the proposed system corrects out‐of‐vocabulary words as well as real‐word errors with a maximum recall of 0.90 and 0.67, respectively for Dev_Hindi and 0.87 and 0.66, respectively for Rom_Hindi. The proposed method is also applied for state‐of‐art sentiment analysis approaches where the F1‐score has been visibly improved. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

27. Effectiveness of ELMo embeddings, and semantic models in predicting review helpfulness.

Author: Malik, Muhammad Shahid Iqbal, Nawaz, Aftab, Jamjoom, Mona Mamdouh, and Ignatov, Dmitry I.
Subjects: *LANGUAGE models, *PRODUCT reviews, *CONSUMERS' reviews, *ONLINE shopping, *FEATURE selection, *VIDEO games
Abstract: Online product reviews (OPR) are a commonly used medium for consumers to communicate their experiences with products during online shopping. Previous studies have investigated the helpfulness of OPRs using frequency-based, linguistic, meta-data, readability, and reviewer attributes. In this study, we explored the impact of robust contextual word embeddings, topic, and language models in predicting the helpfulness of OPRs. In addition, the wrapper-based feature selection technique is employed to select effective subsets from each type of features. Five feature generation techniques including word2vec, FastText, Global Vectors for Word Representation (GloVe), Latent Dirichlet Allocation (LDA), and Embeddings from Language Models (ELMo), were employed. The proposed framework is evaluated on two Amazon datasets (Video games and Health & personal care). The results showed that the ELMo model outperformed the six standard baselines, including the fine-tuned Bidirectional Encoder Representations from Transformers (BERT) model. In addition, ELMo achieved Mean Square Error (MSE) of 0.0887 and 0.0786 respectively on two datasets and MSE of 0.0791 and 0.0708 with the wrapper method. This results in the reduction of 1.43% and 1.63% in MSE as compared to the fine-tuned BERT model on respective datasets. However, the LDA model has a comparable performance with the fine-tuned BERT model but outperforms the other five baselines. The proposed framework demonstrated good generalization abilities by uncovering important factors of product reviews and can be evaluated on other voting platforms. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

28. Malware Classification Using Dynamically Extracted API Call Embeddings.

Author: Aggarwal, Sahil and Di Troia, Fabio
Subjects: HIDDEN Markov models, COMPUTER security, MALWARE, CONVOLUTIONAL neural networks, SUPPORT vector machines, NATURAL language processing
Abstract: Malware classification stands as a crucial element in establishing robust computer security protocols, encompassing the segmentation of malware into discrete groupings. Recently, the emergence of machine learning has presented itself as an apt approach for addressing this challenge. Models can undergo training employing diverse malware attributes, such as opcodes and API calls, to distill valuable insights for effective classification. Within the realm of natural language processing, word embeddings assume a pivotal role by representing text in a manner that aligns closely with the proximity of similar words. These embeddings facilitate the quantification of word resemblances. This research embarks on a series of experiments that harness hybrid machine learning methodologies. We derive word vectors from dynamic API call logs associated with malware and integrate them as features in collaboration with diverse classifiers. Our methodology involves the utilization of Hidden Markov Models and Word2Vec to generate embeddings from API call logs. Additionally, we amalgamate renowned models like BERT and ELMo, noted for their capacity to yield contextualized embeddings. The resultant vectors are channeled into our classifiers, namely Support Vector Machines (SVMs), Random Forest (RF), k-Nearest Neighbors (kNNs), and Convolutional Neural Networks (CNNs). Through two distinct sets of experiments, our objective revolves around the classification of both malware families and categories. The outcomes achieved illuminate the efficacy of API call embeddings as a potent instrument in the domain of malware classification, particularly in the realm of identifying malware families. The best combination was RF and word embeddings generated by Word2Vec, ELMo, and BERT, achieving an accuracy between 0.91 and 0.93. This result underscores the potential of our approach in effectively classifying malware. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

29. Using word embedding to detect keywords in texts modeled as complex networks.

Author: Tohalino, Jorge A. V., Silva, Thiago C., and Amancio, Diego R.
Abstract: Detecting keywords in texts is a task of paramount importance for many text mining applications. Graph-based techniques have been commonly used to automatically find the key concepts in texts. However, the integration of valuable information provided by embeddings to enrich the graph structure has not been widely used. In this context, this paper aims to address the following question: can the quality of extracted keywords from a co-occurrence network be enhanced by integrating embeddings to enrich the network structure? In the adopted model, texts are represented as co-occurrence networks, where nodes are words and edges are established either by contextual or semantical similarity. Two embedding approaches were used: Word2vec and Bidirectional Encoder Representations from Transformers (BERT). The results indicate that using virtual edges can effectively enhance the discriminative capacity of co-occurrence networks. The best performance was achieved by incorporating a limited proportion of virtual (embedding) edges. A comparison of the structural and dynamical network metrics demonstrated that the degree, PageRank, and accessibility metrics exhibited superior performance in the proposed model. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

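30. A textual study of "cultural policy" in China's provincial five-year plans based on Word2Vec and the LDA topic model [基于 Word2Vec 和 LDA 主题模型的中国省级五年规划 “文化政策” 文本研究].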

Author: 高娜 and 东梅
Abstract: Copyright of Cyber Security & Data Governance is the property of Editorial Office of Information Technology & Network Security and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Published: 2024
Full Text: View/download PDF

31. Cloud-based machine learning algorithms for anomalies detection.

Author: Amarnath, Raveendra N. and Gurulakshmanan, Gurumoorthi
Subjects: MACHINE learning, NATURAL language processing, RECURRENT neural networks, BAYESIAN analysis, COMPUTER passwords, MACHINE translating, RECOMMENDER systems
Abstract: Gradient boosting machines harness the inherent capabilities of decision trees and meticulously correct their errors in a sequential fashion, culminating in remarkably precise predictions. Word2Vec, a prominent word embedding technique, occupies a pivotal role in natural language processing (NLP) tasks. Its proficiency lies in capturing intricate semantic relationships among words, thereby facilitating applications such as sentiment analysis, document classification, and machine translation to discern subtle nuances present in textual data. Bayesian networks introduce probabilistic modeling capabilities, predominantly in contexts marked by uncertainty. Their versatile applications encompass risk assessment, fault diagnosis, and recommendation systems. Gated recurrent units (GRUs), a variant of recurrent neural networks, emerge as a formidable asset in modeling sequential data. Both training and testing are crucial to the success of an intrusion detection system (IDS). During the training phase, several models are created, each of which can distinguish typical from anomalous patterns within a given dataset. To acquire passwords and credit card details, "phishing" usually entails impersonating a trusted company. Predictions of student performance on academic tasks are improved by hyperparameter optimization of the gradient boosting regression tree using the grid search approach. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

32. Protection of Guizhou Miao batik culture based on knowledge graph and deep learning.

Author: Quan, Huafeng, Li, Yiting, Liu, Dashuai, and Zhou, Yue
Subjects: *KNOWLEDGE graphs, *DEEP learning, *NATURAL language processing, *TABOO, *BATIK, *ARTIFICIAL intelligence, *AUTOMATIC classification, *CULTURAL intelligence
Abstract: In the globalization trend, China's cultural heritage is in danger of gradually disappearing. The protection and inheritance of these precious cultural resources has become a critical task. This paper focuses on the Miao batik culture in Guizhou Province, China, and explores the application of knowledge graphs, natural language processing, and deep learning techniques in the promotion and protection of batik culture. We propose a dual-channel mechanism that integrates semantic and visual information, aiming to connect batik pattern features with cultural connotations. First, we use natural language processing techniques to automatically extract batik-related entities and relationships from the literature, and construct and visualize a structured batik pattern knowledge graph. Based on this knowledge graph, users can textually search and understand the images, meanings, taboos, and other cultural information of specific patterns. Second, for the batik pattern classification, we propose an improved ResNet34 model. By embedding average pooling and convolutional operations into the residual blocks and introducing long-range residual connections, the classification performance is enhanced. By inputting pattern images into this model, their categories can be accurately identified, and then the underlying cultural connotations can be understood. Experimental results show that our model outperforms other mainstream models in evaluation metrics such as accuracy, precision, recall, and F1-score, achieving 94.46%, 94.47%, 93.62%, and 93.8%, respectively. This research provides new ideas for the digital protection of batik culture and demonstrates the great potential of artificial intelligence technology in cultural heritage protection. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

33. Sentiment analysis of the Hamas-Israel war on YouTube comments using deep learning.

Author: Liyih, Ashagrew, Anagaw, Shegaw, Yibeyin, Minichel, and Tehone, Yitayal
Abstract: Sentiment analysis aims to classify text based on the opinion or mentality expressed in a situation, which can be positive, negative, or neutral. A large volume of opinions is available on various social media sites, and these must be gathered and analyzed to assess the general public's opinion. Finding and monitoring comments, as well as manually extracting the information contained in them, is a difficult task due to the vast diversity of ideas on YouTube. Identifying public opinion on war topics is crucial for offering insights to opposing sides based on popular opinion and emotions about the ongoing war. To address this gap, we develop a deep learning-based approach for sentiment analysis of YouTube comments on the Hamas-Israel war to determine public opinion. We collected 24,360 comments about the Hamas-Israel War from popular YouTube news channels, including BBC, WION, Aljazeera, and others, using the YouTube API and Google spreadsheet, and linguistic experts labeled them into three classes: positive, negative, and neutral. Then, textual comments were preprocessed using natural language processing (NLP) techniques, and features were extracted using Word2vec, FastText, and GloVe. Moreover, we used the SMOTE data balancing technique and tried different data splits, of which the 80/20 train-test split ratio gave the highest accuracy. For classification model building, the commonly used classification algorithms LSTM, Bi-LSTM, GRU, and a hybrid of CNN and Bi-LSTM were applied, and their performance was compared. As a result, the hybrid of CNN and Bi-LSTM with Word2vec achieved the highest performance, with 95.73% accuracy for comment classification. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

34. An end-to-end method for predicting compound-protein interactions based on simplified homogeneous graph convolutional network and pre-trained language model.

Author: Zhang, Yufang, Li, Jiayi, Lin, Shenggeng, Zhao, Jianwei, Xiong, Yi, and Wei, Dong-Qing
Subjects: *LANGUAGE models, *DRUG discovery, *DEEP learning, *SCIENTIFIC method, *END-to-end delay, *MOLECULAR docking, *FORECASTING
Abstract: Identification of interactions between chemical compounds and proteins is crucial for various applications, including drug discovery, target identification, network pharmacology, and elucidation of protein functions. Deep neural network-based approaches are becoming increasingly popular in efficiently identifying compound-protein interactions with high-throughput capabilities, narrowing down the scope of candidates for traditional labor-intensive, time-consuming and expensive experimental techniques. In this study, we proposed an end-to-end approach termed SPVec-SGCN-CPI, which utilized simplified graph convolutional network (SGCN) model with low-dimensional and continuous features generated from our previously developed model SPVec and graph topology information to predict compound-protein interactions. The SGCN technique, dividing the local neighborhood aggregation and nonlinearity layer-wise propagation steps, effectively aggregates K-order neighbor information while avoiding neighbor explosion and expediting training. The performance of the SPVec-SGCN-CPI method was assessed across three datasets and compared against four machine learning- and deep learning-based methods, as well as six state-of-the-art methods. Experimental results revealed that SPVec-SGCN-CPI outperformed all these competing methods, particularly excelling in unbalanced data scenarios. By propagating node features and topological information to the feature space, SPVec-SGCN-CPI effectively incorporates interactions between compounds and proteins, enabling the fusion of heterogeneity. Furthermore, our method scored all unlabeled data in ChEMBL, confirming the top five ranked compound-protein interactions through molecular docking and existing evidence. These findings suggest that our model can reliably uncover compound-protein interactions within unlabeled compound-protein pairs, carrying substantial implications for drug re-profiling and discovery. In summary, SPVec-SGCN demonstrates its efficacy in accurately predicting compound-protein interactions, showcasing potential to enhance target identification and streamline drug discovery processes. Scientific contributions: The methodology presented in this work not only enables the comparatively accurate prediction of compound-protein interactions but also, for the first time, takes sample imbalance, which is very common in the real world, and computational efficiency into consideration simultaneously, accelerating the target identification and drug discovery process. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

35. The Impact of Input Types on Smart Contract Vulnerability Detection Performance Based on Deep Learning: A Preliminary Study.

Author: Aldyaflah, Izdehar M., Zhao, Wenbing, Yang, Shunkun, and Luo, Xiong
Subjects: *DEEP learning, *CONVOLUTIONAL neural networks, *CONTRACTS
Abstract: Stemming vulnerabilities out of a smart contract prior to its deployment is essential to ensure the security of decentralized applications. As such, numerous tools and machine-learning-based methods have been proposed to help detect vulnerabilities in smart contracts. Furthermore, various ways of encoding the smart contracts for analysis have also been proposed. However, the impact of these input methods has not been systematically studied, which is the primary goal of this paper. In this preliminary study, we experimented with four common types of input, including Word2Vec, FastText, Bag-of-Words (BoW), and Term Frequency–Inverse Document Frequency (TF-IDF). To focus on the comparison of these input types, we used the same deep-learning model, i.e., convolutional neural networks, in all experiments. Using a public dataset, we compared the vulnerability detection performance of the four input types both in the binary classification scenarios and the multiclass classification scenario. Our findings show that TF-IDF is the best overall input type among the four. TF-IDF has excellent detection performance in all scenarios: (1) it has the best F1 score and accuracy in binary classifications for all vulnerability types except for the delegate vulnerability where TF-IDF comes in a close second, and (2) it comes in a very close second behind BoW (within 0.8%) in the multiclass classification. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

36. Measuring Corporate Human Capital Disclosures: Lexicon, Data, Code, and Research Opportunities.

Author: Demers, Elizabeth, Wang, Victor Xiaoqi, and Wu, Kean
Subjects: MACHINE learning, LANGUAGE models, HUMAN capital, LEXICON, VALUE creation, INDUSTRIAL relations
Abstract: Human capital (HC) is increasingly important to corporate value creation. Unlike other assets, however, HC is not currently subject to well-defined measurement or disclosure rules. We use a machine learning algorithm (word2vec) trained on a confirmed set of HC disclosures to develop a comprehensive list of HC-related keywords classified into five subcategories (DEI; health and safety; labor relations and culture; compensation and benefits; and demographics and other) that capture the multidimensional nature of HC management. We share our lexicon, corporate HC disclosures, and the Python code used to develop the lexicon, and we provide detailed examples of using our data and code, including for fine-tuning a BERT model. Researchers can use our HC lexicon (or modify the code to capture another construct of interest) with their samples of corporate communications to address pertinent HC questions. We close with a discussion of future research opportunities related to HC management and disclosure. Data Availability: Data are available from the public sources cited in the text. JEL Classifications: B40; C80; M14; M41; M54. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

37. Ethics Incognito: Detecting Ethically Relevant Courses Across Curricula in Higher Education.

Author: Ongis, Martino, Kidd, David, and Miner, Jess
Subjects: COLLEGE curriculum, NATURAL language processing, ETHICS education, MORAL education, RESEARCH ethics, UNIVERSITY rankings
Abstract: As colleges and universities seek to invigorate ethics education, they need methods to identify where and describe how ethics is already present across their curricula. Meeting this need is complicated by the fact that much ethics education occurs in courses not explicitly focused on ethics or morality. In this paper, we review recent methodological advances before presenting a new Ethics Course Identification Tool (ECIT) that combines application of an expert-derived weighted dictionary and natural language processing methods to identify ethics-related courses based on their titles and course catalog descriptions, even when the terms "ethic" or "moral" are not present. Two studies, the second a pre-registered replication, revealed considerable interrater reliability among experts in ethics education regarding the ethical relevance of courses. Critically, both studies revealed strong correlations between expert judgments and ECIT scores. This empirical evidence points to a shared understanding of ethics education among experts, and it supports the valid use of the ECIT to rapidly and reliably identify ethics-related courses. Based on these findings, we propose that the ECIT can be used both to advance research on trends in ethics education and to help target interventions to improve ethics education at colleges and universities. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

38. Genie: Enhancing information management in the restaurant industry through AI-powered chatbot

Author: Megha Gupta, Venkatasai Dheekonda, and Mohammad Masum
Subjects: Chatbot, Restaurant industry, Genie, NLP, ANN, Word2Vec, Information technology, T58.5-58.64
Abstract: In the dynamic restaurant industry, we introduce "Genie," an AI-powered chatbot that represents an advancement in customer service efficiency through technological innovation. Designed to enhance restaurant operations including order processing, reservations, and FAQs management, Genie leverages advanced Natural Language Processing (NLP) techniques. By converting input queries into word embeddings and applying a sophisticated tag classification system, Genie precisely interprets customer intents and generates accurate responses, thereby markedly improving the dining experience. Our thorough examination of various word embeddings and classifiers (Word2Vec, GloVe, BERT, Gaussian Naive Bayes, XGB, Artificial Neural Networks (ANN), and Recurrent Neural Networks) revealed that the combination of Word2Vec and ANN is the most effective, achieving an impressive accuracy rate of 88.9%. This discovery highlights Genie's capability to not only streamline restaurant operations but also enhance customer satisfaction by minimizing wait times and facilitating contactless service options. Additionally, this study enriches the understanding of AI's application in service industries and explores the potential future impact of generative AI technologies on chatbot interactions. As AI technology advances, its integration is essential for Genie to deliver increasingly personalized and dynamic customer experiences, aligning with the evolving demands of the digital era. This research emphasizes the transformative impact of AI in the restaurant industry, providing valuable insights into its practical applications and future prospects for automated customer service solutions.
Published: 2024
Full Text: View/download PDF

39. Pelabelan Sentimen Berbasis Semi-Supervised Learning menggunakan Algoritma LSTM dan GRU [Sentiment Labeling Based on Semi-Supervised Learning Using the LSTM and GRU Algorithms]

Author: Puji Ayuningtyas, Siti Khomsah, and Sudianto Sudianto
Subjects: Annotation, Deep Learning, GRU, LSTM, Semi-Supervised Learning, Word2Vec, Information technology, T58.5-58.64
Abstract: In sentiment analysis research, manual labeling by humans (expert annotation) is subjective, slow, and expensive. An alternative is computer-assisted labeling (machine annotation). However, machine annotators have their own limitation: they cannot detect sarcastic sentences. The researchers therefore propose a sentiment labeling method based on Semi-Supervised Learning, which combines human labeling (expert annotation) with machine labeling (machine annotation). This research uses Deep Learning algorithms as machine annotators, namely Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU). The word weighting method used is Word2Vec Continuous Bag of Words (CBOW). The results show that the GRU algorithm tends to be more accurate than the LSTM algorithm. The average training accuracies of the LSTM and GRU models are 0.904 and 0.913, respectively. In contrast, the average labeling accuracies of LSTM and GRU are 0.569 and 0.592, respectively.
Published: 2024
Full Text: View/download PDF

40. Automation Assemblages in the Internet of Things: Discovering Qualitative Practices at the Boundaries of Quantitative Change.

Author: Novak, Thomas P and Hoffman, Donna L
Subjects: CUSTOMER experience, INTERNET of things, AUTOMATION, NATURAL language processing, HUMAN-computer interaction, CONSUMER behavior
Abstract: We examine consumers' interactions with smart objects using a novel mixed-method approach, guided by assemblage theory, to discover the emergence of automation practices. We use a unique text data set from the web service IFTTT ("If This Then That"), comprising hundreds of thousands of applets that represent "if–then" connections between pairs of Internet services. Consumers use these applets to automate events in their daily lives. We quantitatively identify and qualitatively interpret automation assemblages that emerge bottom-up as different consumers create similar applets within unique social contexts. Our data discovery approach combines word embeddings, density-based clustering, and nonlinear dimensionality reduction with an inductive approach to thematic analysis. We uncover 127 nested automation assemblages that correspond to automation practices. Practices are interpreted in terms of four higher-order categories: social expression, social connectedness, extended mind, and relational AI. To investigate the future trajectories of automation practices, we use the concept of the possibility space, a fundamental theoretical idea from assemblage theory. Using our empirical approach, we translate this theoretical possibility space of automation assemblages into a data visualization to predict how existing practices can grow and new practices can emerge. Our new approach makes conceptual, methodological, and empirical contributions with implications for consumer research and marketing strategy. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

41. DOES IT MATTER TO ACQUISITIONS? THE IMPACTS OF IT DISTANCE ON POST-ACQUISITION PERFORMANCE.

Author: Kyunghee Lee, Kunsoo Han, Animesh, Animesh, and Pinsonneault, Alain
Abstract: Although researchers have examined the role of dyadic dynamics (i.e., interactions between the acquirer and the target firm) in the success of acquisitions, little attention has been devoted to the role of information technology (IT). In this study, we extend this literature by examining how pre-acquisition IT distance (i.e., the difference between the enterprise IT systems of the two firms that reflects the system incompatibility and resulting costs of system integration) affects the acquirer’s post-acquisition performance. To measure IT distance, we used a word-embedding technique to map each firm’s IT systems portfolio to a low-dimensional embedding space and calculate the distance between the firms in that space. Using data on U.S. firms’ acquisition activities over seven years, we found that IT distance is negatively associated with the acquirer’s post-acquisition performance. Also, the adverse effect of IT distance is stronger for acquisitions motivated by operational synergies, compared to those seeking nonoperational synergies. This finding supports our fundamental premise that IT distance disrupts post-acquisition synergy creation, and more so when the combined firm has a greater need for tight integration to create acquisition synergies. This research contributes to the merger and acquisition (M&A) literature in management and IS by introducing a novel concept of IT distance and by theorizing and empirically examining its performance implications in acquisitions. The findings of this study can inform practitioners on how to devise IT strategies in corporate acquisitions to mitigate IT risks and achieve greater post-acquisition performance. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

42. Enhancing Automatic Speech Recognition in Air Traffic Communication Through Semantic Analysis and Error Correction

Author: Srinivasan, Narayanan, Balasundaram, S. R., Angrisani, Leopoldo, Series Editor, Arteaga, Marco, Series Editor, Chakraborty, Samarjit, Series Editor, Chen, Shanben, Series Editor, Chen, Tan Kay, Series Editor, Dillmann, Rüdiger, Series Editor, Duan, Haibin, Series Editor, Ferrari, Gianluigi, Series Editor, Ferre, Manuel, Series Editor, Jabbari, Faryar, Series Editor, Jia, Limin, Series Editor, Kacprzyk, Janusz, Series Editor, Khamis, Alaa, Series Editor, Kroeger, Torsten, Series Editor, Li, Yong, Series Editor, Liang, Qilian, Series Editor, Martín, Ferran, Series Editor, Ming, Tan Cher, Series Editor, Minker, Wolfgang, Series Editor, Misra, Pradeep, Series Editor, Mukhopadhyay, Subhas, Series Editor, Ning, Cun-Zheng, Series Editor, Nishida, Toyoaki, Series Editor, Oneto, Luca, Series Editor, Panigrahi, Bijaya Ketan, Series Editor, Pascucci, Federica, Series Editor, Qin, Yong, Series Editor, Seng, Gan Woon, Series Editor, Speidel, Joachim, Series Editor, Veiga, Germano, Series Editor, Wu, Haitao, Series Editor, Zamboni, Walter, Series Editor, Tan, Kay Chen, Series Editor, Illés, Zoltán, editor, Verma, Chaman, editor, Gonçalves, Paulo J. Sequeira, editor, and Singh, Pradeep Kumar, editor
Published: 2024
Full Text: View/download PDF

43. Exploring Synonym Generation for Lexical Simplification: A Comparative Analysis of Static and Contextualized Word Embeddings

Author: RajyaLakshmi, Tamma, Kuppusamy, K. S., Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Swaroop, Abhishek, editor, Kansal, Vineet, editor, Fortino, Giancarlo, editor, and Hassanien, Aboul Ella, editor
Published: 2024
Full Text: View/download PDF

44. Fundamentals of Vector-Based Text Representation and Word Embeddings

Author: Malik, Nidhi, Singh, Sanjeet, Biswas, Payal, Sharan, Aditi, Chakrabarti, Amlan, Series Editor, Becker, Jürgen, Editorial Board Member, Hu, Yu-Chen, Editorial Board Member, Chattopadhyay, Anupam, Editorial Board Member, Tribedi, Gaurav, Editorial Board Member, Saha, Sriparna, Editorial Board Member, Goswami, Saptarsi, Editorial Board Member, Sharan, Aditi, editor, Malik, Nidhi, editor, Imran, Hazra, editor, and Ghosh, Indira, editor
Published: 2024
Full Text: View/download PDF

45. Word Embedding-based Topic Modeling

Author: Bellaouar, Slimane, Itbirene, Ahmed, Chihani, Brahim, Luo, Xun, Editor-in-Chief, Almohammedi, Akram A., Series Editor, Chen, Chi-Hua, Series Editor, Guan, Steven, Series Editor, Pamucar, Dragan, Series Editor, Kerrache, Chaker Abdelaziz, editor, Tahari, Abdou El Karim, editor, Kassimi, Dounya, editor, and Chakraborty, Chinmay, editor
Published: 2024
Full Text: View/download PDF

46. Temporal Sentiment Analysis (TSMFPMSM) Model for Multimodal Social Media Fake Profile Detection

Author: Aditya, Bhrugumalla L. V. S., Mohanty, Sachi Nandan, Salini, Yalamanchili, Akan, Ozgur, Editorial Board Member, Bellavista, Paolo, Editorial Board Member, Cao, Jiannong, Editorial Board Member, Coulson, Geoffrey, Editorial Board Member, Dressler, Falko, Editorial Board Member, Ferrari, Domenico, Editorial Board Member, Gerla, Mario, Editorial Board Member, Kobayashi, Hisashi, Editorial Board Member, Palazzo, Sergio, Editorial Board Member, Sahni, Sartaj, Editorial Board Member, Shen, Xuemin, Editorial Board Member, Stan, Mircea, Editorial Board Member, Jia, Xiaohua, Editorial Board Member, Zomaya, Albert Y., Editorial Board Member, Castillo, Oscar, editor, Sudhakar Babu, Thanikanti, editor, and Aluvalu, Rajanikanth, editor
Published: 2024
Full Text: View/download PDF

47. Categorization of Arabic Medical Questions Using a Deep Learning Approach

Author: Bahbib, Mohammed, Tamym, Lahcen, Yakhlef, Majid Ben, Benyoucef, Lyes, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Daimi, Kevin, editor, and Al Sadoon, Abeer, editor
Published: 2024
Full Text: View/download PDF

48. Emotional Analysis Based on Text Using X MBTI Data

Author: Lee, Irene Songyeon, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, and Arai, Kohei, editor
Published: 2024
Full Text: View/download PDF

49. Word2Vec-GloVe-BERT Embeddings for Query Expansion

Author: Gabsi, Imen, Kammoun, Hager, Mtar, Rawed, Amous, Ikram, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Abraham, Ajith, editor, Bajaj, Anu, editor, Hanne, Thomas, editor, and Hong, Tzung-Pei, editor
Published: 2024
Full Text: View/download PDF

50. Twitter Trolling Detection Using Machine Learning

Author: Ghosh, Shubhra Bhunia, Kumar, Horesh, Joshi, Aditya, Kumar, Anshul, Jain, Tarun, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Kumar, Rajesh, editor, Verma, Ajit Kumar, editor, Verma, Om Prakash, editor, and Wadehra, Tanu, editor
Published: 2024
Full Text: View/download PDF
