4,400 results on '"Word2Vec"'
Search Results
2. Sentiment Analysis of Amazon Alexa Product Reviews: A Comprehensive Comparative Study of Learning Algorithms
- Author
-
Rao, Gouravelli Akshith, Prakash, L. N. C. K., Suryanarayana, G., Joshua, Pathi Varun, Reddy, Katta Nithin Kumar, and Karnati, Ramesh; Kumar, Amit, editor, Gunjan, Vinit Kumar, editor, Senatore, Sabrina, editor, and Hu, Yu-Chen, editor
- Published
- 2025
3. Curating Reagents in Chemical Reaction Data with an Interactive Reagent Space Map
- Author
-
Andronov, Mikhail, Andronova, Natalia, Wand, Michael, Schmidhuber, Jürgen, and Clevert, Djork-Arné; Clevert, Djork-Arné, editor, Wand, Michael, editor, Malinovská, Kristína, editor, Schmidhuber, Jürgen, editor, and Tetko, Igor V., editor
- Published
- 2025
4. Word Embeddings as Statistical Estimators.
- Author
-
Dey, Neil, Singer, Matthew, Williams, Jonathan P., and Sengupta, Srijan
- Abstract
Word embeddings are a fundamental tool in natural language processing. Currently, word embedding methods are evaluated on the basis of empirical performance on benchmark data sets, and there is a lack of rigorous understanding of their theoretical properties. This paper studies word embeddings from a statistical theoretical perspective, which is essential for formal inference and uncertainty quantification. We propose a copula-based statistical model for text data and show that under this model, the now-classical Word2Vec method can be interpreted as a statistical estimation method for estimating the theoretical pointwise mutual information (PMI). We further illustrate the utility of this statistical model by using it to develop a missing value-based estimator as a statistically tractable and interpretable alternative to the Word2Vec approach. The estimation error of this estimator is comparable to Word2Vec and improves upon the truncation-based method proposed by Levy and Goldberg (Adv. Neural Inf. Process. Syst., 27, 2177–2185 2014). The resulting estimator also is comparable to Word2Vec in a benchmark sentiment analysis task on the IMDb Movie Reviews data set and a part-of-speech tagging task on the OntoNotes data set. [ABSTRACT FROM AUTHOR]
- Published
- 2024
5. Preservation of emotional context in tweet embeddings on social networking sites.
- Author
-
Maruyama, Osamu, Yoshinaga, Asato, and Sawai, Ken-ichi
- Abstract
In communication, emotional information is crucial, yet its preservation in tweet embeddings remains a challenge. This study aims to address this gap by exploring three distinct methods for generating embedding vectors of tweets: word2vec models, pre-trained BERT models, and fine-tuned BERT models. We conducted an analysis to assess the degree to which emotional information is conserved in the resulting embedding vectors. Our findings indicate that the fine-tuned BERT model exhibits a higher level of preservation of emotional information compared to other methods. These results underscore the importance of utilizing advanced natural language processing techniques for preserving emotional context in text data, with potential implications for enhancing sentiment analysis and understanding human communication in social media contexts. [ABSTRACT FROM AUTHOR]
- Published
- 2024
6. RecommendRift: a leap forward in user experience with transfer learning on netflix recommendations.
- Author
-
Anuradha, Surabhi, Jyothi, Pothabathula Naga, Sivakumar, Surabhi, and Sheshikala, Martha
- Subjects
WORD frequency, USER experience, RECREATION, RECOMMENDER systems
- Abstract
In today's fast-paced lifestyle, streaming movies and series on platforms like Netflix is a valued recreational activity. However, users often spend considerable time searching for the right content and receive irrelevant recommendations, particularly when facing the "cold start problem" for new users. This challenge arises from existing recommender systems relying on factors like casting, title, and genre, using term frequency-inverse document frequency (TF-IDF) for vectorization, which prioritizes word frequency over semantic meaning. To address this, an innovative recommender system considering not only casting, title, and genre but also the short description of movies or shows is proposed in this study. Leveraging Word2Vec embedding for semantic relationships, this system offers recommendations aligning better with user preferences. Evaluation metrics including precision, mean average precision (MAP), discounted cumulative gain (DCG), and ideal cumulative gain (IDCG) demonstrate the system's effectiveness, achieving a normalized DCG (NDCG)@10 of 0.956. A/B testing shows an improved click-through rate (CTR) of recommendations, showcasing enhanced streaming experience. [ABSTRACT FROM AUTHOR]
- Published
- 2024
7. TECRR: a benchmark dataset of radiological reports for BI-RADS classification with machine learning, deep learning, and large language model baselines.
- Author
-
Hussain, Sadam, Naseem, Usman, Ali, Mansoor, Avendaño Avalos, Daly Betzabeth, Cardona-Huerta, Servando, Bosques Palomo, Beatriz Alejandra, and Tamez-Peña, Jose Gerardo
- Subjects
*LANGUAGE models, *GENERATIVE pre-trained transformers, *TRANSLATING & interpreting, *MACHINE learning, *SUPPORT vector machines, *DEEP learning, *NATURAL language processing
- Abstract
Background: Recently, machine learning (ML), deep learning (DL), and natural language processing (NLP) have provided promising results in the free-form radiological reports' classification in the respective medical domain. In order to classify radiological reports properly, a high-quality annotated and curated dataset is required. Currently, no publicly available breast imaging-based radiological dataset exists for the classification of Breast Imaging Reporting and Data System (BI-RADS) categories and breast density scores, as characterized by the American College of Radiology (ACR). To tackle this problem, we construct and annotate a breast imaging-based radiological reports dataset and its benchmark results. The dataset was originally in Spanish. Board-certified radiologists collected and annotated it according to the BI-RADS lexicon and categories at the Breast Radiology department, TecSalud Hospitals Monterrey, Mexico. Initially, it was translated into English language using Google Translate. Afterwards, it was preprocessed by removing duplicates and missing values. After preprocessing, the final dataset consists of 5046 unique reports from 5046 patients with an average age of 53 years and 100% women. Furthermore, we used word-level NLP-based embedding techniques, term frequency-inverse document frequency (TF-IDF) and word2vec to extract semantic and syntactic information. We also compared the performance of ML, DL and large language models (LLMs) classifiers for BI-RADS category classification. Results: The final breast imaging-based radiological reports dataset contains 5046 unique reports. We compared K-Nearest Neighbour (KNN), Support Vector Machine (SVM), Naive Bayes (NB), Random Forest (RF), Adaptive Boosting (AdaBoost), Gradient-Boosting (GB), Extreme Gradient Boosting (XGB), Long Short-Term Memory (LSTM), Bidirectional Encoder Representations from Transformers (BERT) and Biomedical Generative Pre-trained Transformer (BioGPT) classifiers. 
It is observed that the BioGPT classifier with preprocessed data performed 6% better with a mean sensitivity of 0.60 (95% confidence interval (CI), 0.391-0.812) compared to the second best performing classifier BERT, which achieved mean sensitivity of 0.54 (95% CI, 0.477-0.607). Conclusion: In this work, we propose a curated and annotated benchmark dataset that can be used for BI-RADS and breast density category classification. We also provide baseline results of most ML, DL and LLMs models for BI-RADS classification that can be used as a starting point for future investigation. The main objective of this investigation is to provide a repository for the investigators who wish to enter the field to push the boundaries further. [ABSTRACT FROM AUTHOR]
- Published
- 2024
8. Unveiling Similarities in the Code of Life: A Detailed Exploration of DNA Sequence Matching Algorithm.
- Author
-
Shams, Mahmoud Y., Farag, Romany M., Aldawody, Dalia A., Khalid, Huda E., Essa, Ahmed K., El-Bakry, Hazem M., and Salama, A. A.
- Subjects
*NUCLEOTIDE sequence, *ALGORITHMS, *BIOINFORMATICS, *DNA, *COSINE function
- Abstract
Identifying similar DNA sequences is crucial in various biological research endeavors. This paper delves into the intricate workings of a specific algorithm designed for this purpose. We provide a systematic explanation, exploring how the algorithm handles user input, reads stored DNA sequences, utilizes the Word2Vec model for vector representation, and calculates sequence similarity using diverse metrics like Cosine Similarity and Neutrosophic Distance. Additionally, the paper explores the incorporation of neutrosophic values to account for uncertainty in the comparisons. Finally, we discuss the extraction of results, including matched sequences, similarity scores, and accuracy measures. This in-depth exploration provides a clear understanding of the algorithm's capabilities and fosters its effective application in DNA sequence analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2024
9. ONTO-TDM domain ontology population for a specific discipline.
- Author
-
Abdoune, Rosana, Lazib, Lydia, Dahmani-Bouarab, Farida, and Fernández-Breis, Jesualdo Tomás
- Abstract
Ontologies play a vital role in organizing and constructing knowledge across various domains, enabling effective knowledge management and sharing. The development of domain-specific ontologies, such as the ONTO-TDM ontology for teaching domain modeling, is essential for providing a comprehensive and standardized representation of knowledge within a given discipline. However, to maximize the usefulness and relevance of such ontologies, it is crucial to automate their population with domain-specific information, reducing manual work and ensuring scalability. This paper presents a novel method for ontology population by extracting and integrating relevant information from diverse sources. The method combines the TextRank algorithm with Word2Vec to enhance keyword extraction, capturing both semantic meaning and textual importance. Keywords are then annotated and used to train a machine learning classifier, which aids in integrating new instances into the ontology. Experiments show that the proposed method achieves a precision of 63.33%, a recall of 61.29% and an F1-score of 62.28%, significantly improving keyword extraction and ontology population accuracy compared to existing methods. This validates the method's effectiveness in semi-automatically extracting relevant instances from diverse data sources, enhancing the efficiency and accuracy of ontology population, and advancing automated knowledge management in domain-specific contexts. [ABSTRACT FROM AUTHOR]
- Published
- 2024
10. Enhancing emotion detection with synergistic combination of word embeddings and convolutional neural networks.
- Author
-
Jadon, Anil Kumar and Kumar, Suresh
- Subjects
CONVOLUTIONAL neural networks, EMOTION recognition, DEEP learning, PSYCHIATRIC research, CONSUMER research
- Abstract
Recognizing emotions in textual data is crucial in a wide range of natural language processing (NLP) applications, from consumer sentiment research to mental health evaluation. The word embedding techniques play a pivotal role in text processing. In this paper, the performance of several well-known word embedding methods is evaluated in the context of emotion recognition. The classification of emotions is further enhanced using a convolutional neural network (CNN) model because of its propensity to capture local patterns and its recent triumphs in text-related tasks. The integration of CNN with word embedding techniques introduced an additional layer to the landscape of emotion detection from text. The synergy between word embedding techniques and CNN harnesses the strengths of both approaches. CNNs extract local patterns and features from sequential data, making them well-suited for capturing relevant information within the embeddings. The results obtained with various embeddings highlight the significance of choosing synergistic combinations for optimum performance. The combination of CNNs and word embeddings proved a versatile and effective approach. [ABSTRACT FROM AUTHOR]
- Published
- 2024
11. TECRR: a benchmark dataset of radiological reports for BI-RADS classification with machine learning, deep learning, and large language model baselines
- Author
-
Sadam Hussain, Usman Naseem, Mansoor Ali, Daly Betzabeth Avendaño Avalos, Servando Cardona-Huerta, Beatriz Alejandra Bosques Palomo, and Jose Gerardo Tamez-Peña
- Subjects
BI-RADS classification, Breast radiological reports, TF-IDF, Word2vec, NLP, ML, Computer applications to medicine. Medical informatics, R858-859.7
- Abstract
Abstract Background Recently, machine learning (ML), deep learning (DL), and natural language processing (NLP) have provided promising results in the free-form radiological reports’ classification in the respective medical domain. In order to classify radiological reports properly, a high-quality annotated and curated dataset is required. Currently, no publicly available breast imaging-based radiological dataset exists for the classification of Breast Imaging Reporting and Data System (BI-RADS) categories and breast density scores, as characterized by the American College of Radiology (ACR). To tackle this problem, we construct and annotate a breast imaging-based radiological reports dataset and its benchmark results. The dataset was originally in Spanish. Board-certified radiologists collected and annotated it according to the BI-RADS lexicon and categories at the Breast Radiology department, TecSalud Hospitals Monterrey, Mexico. Initially, it was translated into English language using Google Translate. Afterwards, it was preprocessed by removing duplicates and missing values. After preprocessing, the final dataset consists of 5046 unique reports from 5046 patients with an average age of 53 years and 100% women. Furthermore, we used word-level NLP-based embedding techniques, term frequency-inverse document frequency (TF-IDF) and word2vec to extract semantic and syntactic information. We also compared the performance of ML, DL and large language models (LLMs) classifiers for BI-RADS category classification. Results The final breast imaging-based radiological reports dataset contains 5046 unique reports. We compared K-Nearest Neighbour (KNN), Support Vector Machine (SVM), Naive Bayes (NB), Random Forest (RF), Adaptive Boosting (AdaBoost), Gradient-Boosting (GB), Extreme Gradient Boosting (XGB), Long Short-Term Memory (LSTM), Bidirectional Encoder Representations from Transformers (BERT) and Biomedical Generative Pre-trained Transformer (BioGPT) classifiers. 
It is observed that the BioGPT classifier with preprocessed data performed 6% better with a mean sensitivity of 0.60 (95% confidence interval (CI), 0.391-0.812) compared to the second best performing classifier BERT, which achieved mean sensitivity of 0.54 (95% CI, 0.477-0.607). Conclusion In this work, we propose a curated and annotated benchmark dataset that can be used for BI-RADS and breast density category classification. We also provide baseline results of most ML, DL and LLMs models for BI-RADS classification that can be used as a starting point for future investigation. The main objective of this investigation is to provide a repository for the investigators who wish to enter the field to push the boundaries further.
- Published
- 2024
12. Developing and testing the efficacy of a novel forecasting methodology: Theory and evidence from China.
- Author
-
Yang, Yuhong, Dogru, Tarik, Liang, Chao, Wang, Jianqiong, and Xu, Pengfei
- Subjects
FORECASTING methodology, DEMAND forecasting, PRINCIPAL components analysis, PREDICTION theory, TOURIST attractions
- Abstract
Numerous methodologies have been offered to forecast tourism demand; however, accurate forecasting has been a major challenge for policymakers despite its critical importance for tourism planning. Therefore, we propose and test a novel forecasting methodology that combines principal component analysis (PCA) and long short-term memory (LSTM) network, along with the Baidu index, to forecast daily tourist arrivals for a popular tourist attraction in China. Word2Vec, a software tool launched by Google, is used to improve the coverage and accuracy of search keywords in the construction of the Baidu indexes. Before training the LSTM network, PCA is used to reduce noise and optimize the data. Considering the study's timeframe, the impact of COVID-19 pandemic has also been assessed. The efficacy of the proposed forecasting methodology is verified, and the results show that the PCA-LSTM model outperforms other models in terms of prediction accuracy and stability. Theoretical and practical implications are discussed. [ABSTRACT FROM AUTHOR]
- Published
- 2024
13. Analysis of the impact of the contextual embeddings usage on the text classification accuracy
- Author
-
Olesia Barkovska, Anton Havrashenko, Vitalii Serdechnyi, Vladyslav Kholiev, and Patrik Rusnak
- Subjects
classification, nlp, context, model, neural network, word2vec, glove, embedding, bert, gpt, Computer engineering. Computer hardware, TK7885-7895, Electronic computers. Computer science, QA75.5-76.95
- Abstract
Analysis of the impact of the contextual embeddings usage on the text classification accuracy
- Published
- 2024
14. Sentiment-semantic word vectors: A new method to estimate management sentiment
- Author
-
Tri Minh Phan
- Subjects
Knowledge distillation, MD&A, Stock return predictability, Word2Vec, Statistics, HA1-4737, Economics as a science, HB71-74
- Abstract
This paper introduces a novel method to extract the sentiment embedded in the Management’s Discussion and Analysis (MD&A) section of 10-K filings. The proposed method outperforms traditional approaches in terms of sentiment classification accuracy. Utilizing this method, the MD&A sentiment is found to be a strong negative predictor of future stock returns, demonstrating consistency in both in-sample and out-of-sample settings. By contrast, if traditional sentiment extraction methods are used, the MD&A sentiment exhibits no predictive ability for stock markets. Additionally, the MD&A sentiment is associated with dividend-related macroeconomic channels regarding future stock return prediction.
- Published
- 2024
15. Discrimination of semantically similar verbal memory traces is affected in healthy aging
- Author
-
Alex Ilyés, Borbála Paulik, and Attila Keresztes
- Subjects
Mnemonic discrimination, Aging, Semantic similarity, Pattern separation, word2vec, Medicine, Science
- Abstract
Abstract Mnemonic discrimination of highly similar memory traces is affected in healthy aging via changes in hippocampal pattern separation—i.e., the ability of the hippocampus to orthogonalize highly similar neural inputs. The decline of this process leads to a loss of episodic specificity. Because previous studies have almost exclusively tested mnemonic discrimination of visuospatial stimuli (e.g., objects or scenes), less is known about age-related effects on the episodic specificity of semantically similar traces. To address this gap, we designed a task to assess mnemonic discrimination of verbal stimuli as a function of semantic similarity based on word embeddings. Forty young (Mage = 21.7 years) and 40 old adults (Mage = 69.8 years) first incidentally encoded adjective-noun phrases, then performed a surprise recognition test involving exactly repeated and highly similar lure phrases. We found that increasing semantic similarity negatively affected mnemonic discrimination in both age groups, and that compared to young adults, older adults showed worse discrimination at medium levels of semantic similarity. These results indicate that episodic specificity of semantically similar memory traces is affected in aging via less efficient mnemonic operations and strengthen the notion that mnemonic discrimination is a modality-independent process supporting memory specificity across representational domains.
- Published
- 2024
16. Optimizing word embeddings for small datasets: a case study on patient portal messages from breast cancer patients
- Author
-
Qingyuan Song, Congning Ni, Jeremy L. Warner, Qingxia Chen, Lijun Song, S. Trent Rosenbloom, Bradley A. Malin, and Zhijun Yin
- Subjects
Breast cancer, Hormonal therapy, Natural language processing, Patient portal messages, Word embedding models, Word2vec, Medicine, Science
- Abstract
Abstract Patient portal messages often relate to specific clinical phenomena (e.g., patients undergoing treatment for breast cancer) and, as a result, have received increasing attention in biomedical research. These messages require natural language processing and, while word embedding models, such as word2vec, have the potential to extract meaningful signals from text, they are not readily applicable to patient portal messages. This is because embedding models typically require millions of training samples to sufficiently represent semantics, while the volume of patient portal messages associated with a particular clinical phenomenon is often relatively small. We introduce a novel adaptation of the word2vec model, PK-word2vec (where PK stands for prior knowledge), for small-scale messages. PK-word2vec incorporates the most similar terms for medical words (including problems, treatments, and tests) and non-medical words from two pre-trained embedding models as prior knowledge to improve the training process. We applied PK-word2vec in a case study of patient portal messages in the Vanderbilt University Medical Center electric health record system sent by patients diagnosed with breast cancer from December 2004 to November 2017. We evaluated the model through a set of 1000 tasks, each of which compared the relevance of a given word to a group of the five most similar words generated by PK-word2vec and a group of the five most similar words generated by the standard word2vec model. We recruited 200 Amazon Mechanical Turk (AMT) workers and 7 medical students to perform the tasks. The dataset was composed of 1389 patient records and included 137,554 messages with 10,683 unique words. Prior knowledge was available for 7981 non-medical and 1116 medical words. 
In over 90% of the tasks, both reviewers indicated PK-word2vec generated more similar words than standard word2vec (p = 0.01). The difference in the evaluation by AMT workers versus medical students was negligible for all comparisons of tasks’ choices between the two groups of reviewers (p = 0.774 under a paired t-test). PK-word2vec can effectively learn word representations from a small message corpus, marking a significant advancement in processing patient portal messages.
- Published
- 2024
17. Sentiment analysis of the Hamas-Israel war on YouTube comments using deep learning
- Author
-
Ashagrew Liyih, Shegaw Anagaw, Minichel Yibeyin, and Yitayal Tehone
- Subjects
Deep learning approach, Recurrent neural network, Sentiment analysis, Word2vec, FastText, GloVe, Medicine, Science
- Abstract
Abstract Sentiment analysis aims to classify text based on the opinion or mentality expressed in a situation, which can be positive, negative, or neutral. Therefore, in the world, a lot of opinions are available on various social media sites, which must be gathered and analyzed to assess the general public’s opinion. Finding and monitoring comments, as well as manually extracting the information contained in them, is a difficult task due to the vast diversity of ideas on YouTube. Identifying public opinion on war topics is crucial for offering insights to opposing sides based on popular opinion and emotions about the ongoing war. To address the gap, we build a model on YouTube comment sentiment analysis of the Hamas-Israel war to determine public opinion. In this study, we address the gaps by developing a deep learning-based approach for sentiment analysis. We have collected 24,360 comments from popular YouTube News Channels including BBC, WION, Aljazeera, and others about the Hamas-Israel War using YouTube API and Google spreadsheet and labeled them by linguistic experts into three classes: positive, negative, and neutral. Then, textual comments were preprocessed using natural language processing (NLP) techniques, and features were extracted using Word2vec, FastText, and GloVe. Moreover, we have used the SMOTE data balancing technique and used different data splits, but the 80/20 train-test split ratio has the highest accuracy. For classification model building, commonly used classification algorithms LSTM, Bi-LSTM, GRU, and Hybrid of CNN and Bi-LSTM were applied, and their performance is compared. As a result, the Hybrid of CNN and Bi-LSTM with Word2vec achieved the highest performance with 95.73% accuracy for comments classifications.
- Published
- 2024
18. Protection of Guizhou Miao batik culture based on knowledge graph and deep learning
- Author
-
Huafeng Quan, Yiting Li, Dashuai Liu, and Yue Zhou
- Subjects
Guizhou Miao batik, Cultural heritage, Knowledge graph, ResNet34, Word2vec, Fine Arts, Analytical chemistry, QD71-142
- Abstract
Abstract In the globalization trend, China’s cultural heritage is in danger of gradually disappearing. The protection and inheritance of these precious cultural resources has become a critical task. This paper focuses on the Miao batik culture in Guizhou Province, China, and explores the application of knowledge graphs, natural language processing, and deep learning techniques in the promotion and protection of batik culture. We propose a dual-channel mechanism that integrates semantic and visual information, aiming to connect batik pattern features with cultural connotations. First, we use natural language processing techniques to automatically extract batik-related entities and relationships from the literature, and construct and visualize a structured batik pattern knowledge graph. Based on this knowledge graph, users can textually search and understand the images, meanings, taboos, and other cultural information of specific patterns. Second, for the batik pattern classification, we propose an improved ResNet34 model. By embedding average pooling and convolutional operations into the residual blocks and introducing long-range residual connections, the classification performance is enhanced. By inputting pattern images into this model, their categories can be accurately identified, and then the underlying cultural connotations can be understood. Experimental results show that our model outperforms other mainstream models in evaluation metrics such as accuracy, precision, recall, and F1-score, achieving 94.46%, 94.47%, 93.62%, and 93.8%, respectively. This research provides new ideas for the digital protection of batik culture and demonstrates the great potential of artificial intelligence technology in cultural heritage protection.
- Published
- 2024
19. An end-to-end method for predicting compound-protein interactions based on simplified homogeneous graph convolutional network and pre-trained language model
- Author
-
Yufang Zhang, Jiayi Li, Shenggeng Lin, Jianwei Zhao, Yi Xiong, and Dong-Qing Wei
- Subjects
Compound-protein interactions, Graph convolutional network, End-to-end learning, word2vec, Information technology, T58.5-58.64, Chemistry, QD1-999
- Abstract
Abstract Identification of interactions between chemical compounds and proteins is crucial for various applications, including drug discovery, target identification, network pharmacology, and elucidation of protein functions. Deep neural network-based approaches are becoming increasingly popular in efficiently identifying compound-protein interactions with high-throughput capabilities, narrowing down the scope of candidates for traditional labor-intensive, time-consuming and expensive experimental techniques. In this study, we proposed an end-to-end approach termed SPVec-SGCN-CPI, which utilized simplified graph convolutional network (SGCN) model with low-dimensional and continuous features generated from our previously developed model SPVec and graph topology information to predict compound-protein interactions. The SGCN technique, dividing the local neighborhood aggregation and nonlinearity layer-wise propagation steps, effectively aggregates K-order neighbor information while avoiding neighbor explosion and expediting training. The performance of the SPVec-SGCN-CPI method was assessed across three datasets and compared against four machine learning- and deep learning-based methods, as well as six state-of-the-art methods. Experimental results revealed that SPVec-SGCN-CPI outperformed all these competing methods, particularly excelling in unbalanced data scenarios. By propagating node features and topological information to the feature space, SPVec-SGCN-CPI effectively incorporates interactions between compounds and proteins, enabling the fusion of heterogeneity. Furthermore, our method scored all unlabeled data in ChEMBL, confirming the top five ranked compound-protein interactions through molecular docking and existing evidence. These findings suggest that our model can reliably uncover compound-protein interactions within unlabeled compound-protein pairs, carrying substantial implications for drug re-profiling and discovery. 
In summary, SPVec-SGCN demonstrates its efficacy in accurately predicting compound-protein interactions, showcasing potential to enhance target identification and streamline drug discovery processes. Scientific contributions: The methodology presented in this work not only enables the comparatively accurate prediction of compound-protein interactions but also, for the first time, takes sample imbalance, which is very common in the real world, and computation efficiency into consideration simultaneously, accelerating the target identification and drug discovery process.
- Published
- 2024
- Full Text
- View/download PDF
20. Sentiment-semantic word vectors: A new method to estimate management sentiment.
- Author
-
Phan, Tri Minh
- Subjects
RATE of return on stocks ,NEW words ,STOCKS (Finance) ,CLASSIFICATION ,FORECASTING - Abstract
This paper introduces a novel method to extract the sentiment embedded in the Management's Discussion and Analysis (MD&A) section of 10-K filings. The proposed method outperforms traditional approaches in terms of sentiment classification accuracy. Utilizing this method, the MD&A sentiment is found to be a strong negative predictor of future stock returns, demonstrating consistency in both in-sample and out-of-sample settings. By contrast, if traditional sentiment extraction methods are used, the MD&A sentiment exhibits no predictive ability for stock markets. Additionally, the MD&A sentiment is associated with dividend-related macroeconomic channels regarding future stock return prediction. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
21. Discrimination of semantically similar verbal memory traces is affected in healthy aging.
- Author
-
Ilyés, Alex, Paulik, Borbála, and Keresztes, Attila
- Subjects
- *
VERBAL memory , *OLDER people , *VERBAL learning , *AGE discrimination , *AGE groups , *YOUNG adults - Abstract
Mnemonic discrimination of highly similar memory traces is affected in healthy aging via changes in hippocampal pattern separation—i.e., the ability of the hippocampus to orthogonalize highly similar neural inputs. The decline of this process leads to a loss of episodic specificity. Because previous studies have almost exclusively tested mnemonic discrimination of visuospatial stimuli (e.g., objects or scenes), less is known about age-related effects on the episodic specificity of semantically similar traces. To address this gap, we designed a task to assess mnemonic discrimination of verbal stimuli as a function of semantic similarity based on word embeddings. Forty young (Mage = 21.7 years) and 40 old adults (Mage = 69.8 years) first incidentally encoded adjective-noun phrases, then performed a surprise recognition test involving exactly repeated and highly similar lure phrases. We found that increasing semantic similarity negatively affected mnemonic discrimination in both age groups, and that compared to young adults, older adults showed worse discrimination at medium levels of semantic similarity. These results indicate that episodic specificity of semantically similar memory traces is affected in aging via less efficient mnemonic operations and strengthen the notion that mnemonic discrimination is a modality-independent process supporting memory specificity across representational domains. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
22. Construction and Validation of a Computerized Creativity Assessment Tool With Automated Scoring Based on Deep-Learning Techniques.
- Author
-
Sung, Yao-Ting, Cheng, Hao-Hsin, Tseng, Hou-Chiang, Chang, Kuo-En, and Lin, Shu-Yen
- Subjects
- *
DIVERGENT thinking , *CREATIVE ability , *STATISTICAL reliability , *CHINESE language , *TEST validity - Abstract
Based on the divergent thinking (DT) framework of creativity assessment, this study constructed the Computerized Creativity Assessment with Figure Test (C-CRAFT) that is equipped with an automated scoring system and built around a deep-learning-based semantic space model called Word2Vec. A subject pool of 493 undergraduates completed the C-CRAFT as well as a conventional paper-and-pencil DT test that required manual scoring. We found moderately high to high coefficients for the correlations between the two tests, which suggested that the C-CRAFT has strong criterion-related validity. The results of the pre- and posttests also demonstrated the high test–retest reliability of the C-CRAFT. Good discriminant validity was evidenced by highly significant differences in the C-CRAFT scores between college students from art and design-related fields and students from other majors. These research findings indicate that the C-CRAFT is a valid and reliable assessment tool for DT, while the automated nature of the C-CRAFT makes it easier to implement the DT test compared with traditional approaches. Moreover, by applying the C-CRAFT to the Chinese language, this study contributes to the cross-linguistic research of semantic models in creativity assessment. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
23. An innovative framework for supporting content-based authorship identification and analysis in social media networks.
- Author
-
Puerta, José Gaviria de la, Pastor-López, Iker, Tellaeche, Alberto, Sanz, Borja, Sanjurjo-González, Hugo, Cuzzocrea, Alfredo, and Bringas, Pablo G
- Subjects
SOCIAL media ,SOCIAL network analysis ,ONLINE social networks ,AUTHORSHIP ,SCIENTIFIC community ,SOCIAL networks ,USER-generated content - Abstract
Content-based authorship identification is an emerging research problem in online social media networks, due to a wide collection of issues ranging from security to privacy preservation, from radicalization to defamation detection, and so forth. Indeed, this research has attracted a relevant amount of attention from the research community during the past years. The general problem becomes harder when we consider the additional constraint of identifying the same false profile over different social media networks, under obvious considerations. Inspired by this emerging research challenge, in this paper we propose and experimentally assess an innovative framework for supporting content-based authorship identification and analysis in social media networks. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
24. A proposed semantic keywords search engine for Indonesian Qur'an translation based on word embedding.
- Author
-
Trisnawati, Liza, Binti Samsudin, Noor Azah, Bin Ahmad Khalid, Shamsul Kamal, Bin Ahmad Shaubari, Ezak Fadzrin, Sukri, and Indra, Zul
- Subjects
KEYWORD searching ,MACHINE translating ,SEARCH engines ,TRANSLATING & interpreting - Abstract
Obtaining relevant information from the Holy Qur'an can be really challenging for people who cannot speak Arabic, such as the Indonesian people. One technology implementation which is commonly used to tackle this problem is to develop a search engine application for Al-Qur'an verses. This paper proposes a search engine based on semantic representation keywords for the Indonesian translation of the Al-Qur'an which consists of 3 phases i.e., data preparation, document representation, and search engine development. In the first stage, the Al-Qur'an dataset was built using the official translation of the Al-Qur'an from the Ministry of Religion and then enriched with the Wikipedia corpus. The second phase is document representation which produces feature vectors by utilizing the Word2Vec algorithm. Finally, the development of a search engine that can find the most relevant verses by calculating the cosine similarity between the document and the keywords. It was found that the proposed search engine succeeded in exceeding the performance of ordinary search engines by finding wider information due to the use of semantic keywords. Apart from that, the proposed search engine succeeded in maintaining the relevance of search results by achieving precision and recall levels of 98.7% and 97.3% respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
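The retrieval step described in entry 24 (rank verses by cosine similarity between a keyword query and each document vector) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the three-dimensional vectors in TOY_VECTORS, the averaging of word vectors into a document vector, and the function names embed, cosine, and search are all assumptions made for the example. A real system would load trained Word2Vec vectors instead.

```python
import math

# Hypothetical 3-dimensional word vectors (assumption for illustration only);
# a real system would use vectors trained with Word2Vec.
TOY_VECTORS = {
    "prayer": [0.9, 0.1, 0.0],
    "worship": [0.8, 0.2, 0.1],
    "charity": [0.1, 0.9, 0.1],
    "alms": [0.2, 0.8, 0.0],
    "journey": [0.0, 0.1, 0.9],
}


def embed(text, vectors):
    """Average the vectors of the known words; return None if none is known."""
    known = [vectors[w] for w in text.lower().split() if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, documents, vectors, top_k=3):
    """Return up to top_k (score, document) pairs, highest similarity first."""
    query_vec = embed(query, vectors)
    if query_vec is None:
        return []
    scored = []
    for doc in documents:
        doc_vec = embed(doc, vectors)
        if doc_vec is not None:
            scored.append((cosine(query_vec, doc_vec), doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    docs = ["give alms", "long journey", "worship daily"]
    for score, doc in search("charity", docs, TOY_VECTORS):
        print(f"{score:.3f}  {doc}")
```

Because the query and the documents are compared in vector space, a query such as "charity" can match a document containing only "alms", which is the kind of semantic widening the abstract attributes to the proposed engine.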
25. The evolution of China's English education policy and challenges in higher education: analysis based on LDA and Word2Vec.
- Author
-
Haiyang Hu, Fan Li, and Zhengying Luo
- Subjects
EDUCATION policy ,ENGLISH language ,ONLINE education ,BELT & Road Initiative ,CREATIVE ability ,HIGHER education ,LANGUAGE ability - Abstract
China proposed the Belt and Road Initiative to strengthen regional connectivity so as to embrace a brighter future together. Since the Initiative was put forward, it has brought many challenges to China's English education policy. By employing Latent Dirichlet Allocation (LDA) and Word2Vec, this study analyzes the evolution of topics and challenges in China's English education policy under the Belt and Road Initiative. The results indicate that after the initiative, the policy focus has changed. English education has shifted from testing abilities to cultivating students' intercultural communication skills in order to meet the needs with countries alongside the "Belt and Road". Moreover, teaching strategies that were examination-oriented have also changed to emphasizing teaching methods and feedback. The focus and teaching strategies have also undergone great changes. China's English education policy has shifted from focusing on improving students' writing skills, English proficiency, and creativity to conducting in-depth research and addressing specific issues, including challenges in linguistics, media influence, educational institutions and programs, online courses, attitudes and self-efficacy, use of multiple languages and globalization, teaching issues, and curriculum design. These findings shed light on how the Belt and Road Initiative changed China's English education policy and provide further directions for future research. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
26. Code‐mixed Hindi‐English text correction using fuzzy graph and word embedding.
- Author
-
Jain, Minni, Jindal, Rajni, and Jain, Amita
- Subjects
- *
NATURAL language processing , *SOCIAL media , *FUZZY graphs , *SENTIMENT analysis , *SPELLING errors - Abstract
Interaction via social media involves frequent code‐mixed text, spelling errors and noisy elements, which creates a bottleneck in the performance of natural language processing applications. This proposed work is the first approach for code‐mixed Hindi‐English social media text that comprises language identification, detection and correction of non‐word (Out of Vocabulary) errors as well as real‐word errors occurring simultaneously. Each identified language (Devanagari Hindi, Roman Hindi, and English) has its own complexities and challenges. Errors are detected individually for each language and a suggestive list of the erroneous words is created. After this, a fuzzy graph between different words of the suggestive lists is generated using various semantic relations in Hindi WordNet. Word embeddings and Fuzzy graph‐based centrality measures are used to find the correct word. Several experiments are performed on different social media datasets taken from Instagram, Twitter, YouTube comments, Blogs, and WhatsApp. The experimental results demonstrate that the proposed system corrects out‐of‐vocabulary words as well as real‐word errors with a maximum recall of 0.90 and 0.67, respectively for Dev_Hindi and 0.87 and 0.66, respectively for Rom_Hindi. The proposed method is also applied for state‐of‐art sentiment analysis approaches where the F1‐score has been visibly improved. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
27. Effectiveness of ELMo embeddings, and semantic models in predicting review helpfulness.
- Author
-
Malik, Muhammad Shahid Iqbal, Nawaz, Aftab, Jamjoom, Mona Mamdouh, and Ignatov, Dmitry I.
- Subjects
- *
LANGUAGE models , *PRODUCT reviews , *CONSUMERS' reviews , *ONLINE shopping , *FEATURE selection , *VIDEO games - Abstract
Online product reviews (OPR) are a commonly used medium for consumers to communicate their experiences with products during online shopping. Previous studies have investigated the helpfulness of OPRs using frequency-based, linguistic, meta-data, readability, and reviewer attributes. In this study, we explored the impact of robust contextual word embeddings, topic, and language models in predicting the helpfulness of OPRs. In addition, the wrapper-based feature selection technique is employed to select effective subsets from each type of features. Five feature generation techniques including word2vec, FastText, Global Vectors for Word Representation (GloVe), Latent Dirichlet Allocation (LDA), and Embeddings from Language Models (ELMo), were employed. The proposed framework is evaluated on two Amazon datasets (Video games and Health & personal care). The results showed that the ELMo model outperformed the six standard baselines, including the fine-tuned Bidirectional Encoder Representations from Transformers (BERT) model. In addition, ELMo achieved Mean Square Error (MSE) of 0.0887 and 0.0786 respectively on two datasets and MSE of 0.0791 and 0.0708 with the wrapper method. This results in the reduction of 1.43% and 1.63% in MSE as compared to the fine-tuned BERT model on respective datasets. However, the LDA model has a comparable performance with the fine-tuned BERT model but outperforms the other five baselines. The proposed framework demonstrated good generalization abilities by uncovering important factors of product reviews and can be evaluated on other voting platforms. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
28. Malware Classification Using Dynamically Extracted API Call Embeddings.
- Author
-
Aggarwal, Sahil and Di Troia, Fabio
- Subjects
HIDDEN Markov models ,COMPUTER security ,MALWARE ,CONVOLUTIONAL neural networks ,SUPPORT vector machines ,NATURAL language processing - Abstract
Malware classification stands as a crucial element in establishing robust computer security protocols, encompassing the segmentation of malware into discrete groupings. Recently, the emergence of machine learning has presented itself as an apt approach for addressing this challenge. Models can undergo training employing diverse malware attributes, such as opcodes and API calls, to distill valuable insights for effective classification. Within the realm of natural language processing, word embeddings assume a pivotal role by representing text in a manner that aligns closely with the proximity of similar words. These embeddings facilitate the quantification of word resemblances. This research embarks on a series of experiments that harness hybrid machine learning methodologies. We derive word vectors from dynamic API call logs associated with malware and integrate them as features in collaboration with diverse classifiers. Our methodology involves the utilization of Hidden Markov Models and Word2Vec to generate embeddings from API call logs. Additionally, we amalgamate renowned models like BERT and ELMo, noted for their capacity to yield contextualized embeddings. The resultant vectors are channeled into our classifiers, namely Support Vector Machines (SVMs), Random Forest (RF), k-Nearest Neighbors (kNNs), and Convolutional Neural Networks (CNNs). Through two distinct sets of experiments, our objective revolves around the classification of both malware families and categories. The outcomes achieved illuminate the efficacy of API call embeddings as a potent instrument in the domain of malware classification, particularly in the realm of identifying malware families. The best combination was RF and word embeddings generated by Word2Vec, ELMo, and BERT, achieving an accuracy between 0.91 and 0.93. This result underscores the potential of our approach in effectively classifying malware. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
29. Using word embedding to detect keywords in texts modeled as complex networks.
- Author
-
Tohalino, Jorge A. V., Silva, Thiago C., and Amancio, Diego R.
- Abstract
Detecting keywords in texts is a task of paramount importance for many text mining applications. Graph-based techniques have been commonly used to automatically find the key concepts in texts. However, the integration of valuable information provided by embeddings to enrich the graph structure has not been widely used. In this context, this paper aims to address the following question: can the quality of extracted keywords from a co-occurrence network be enhanced by integrating embeddings to enrich the network structure? In the adopted model, texts are represented as co-occurrence networks, where nodes are words and edges are established either by contextual or semantical similarity. Two embedding approaches were used: Word2vec and Bidirectional Encoder Representations from Transformers (BERT). The results indicate that using virtual edges can effectively enhance the discriminative capacity of co-occurrence networks. The best performance was achieved by incorporating a limited proportion of virtual (embedding) edges. A comparison of the structural and dynamical network metrics demonstrated that the degree, PageRank, and accessibility metrics exhibited superior performance in the proposed model. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
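The model in entry 29 (a co-occurrence network enriched with "virtual" edges between semantically similar words) can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the window size, the similarity threshold, the two-dimensional toy vectors, and the function names are hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations


def cooccurrence_edges(tokens, window=1):
    """Link each word to the words that follow it within `window` positions."""
    edges = set()
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            if tokens[j] != word:
                edges.add(frozenset((word, tokens[j])))
    return edges


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def virtual_edges(vectors, existing, threshold=0.9):
    """Add an edge between any unlinked word pair whose vectors are similar."""
    added = set()
    for a, b in combinations(sorted(vectors), 2):
        pair = frozenset((a, b))
        if pair not in existing and cosine(vectors[a], vectors[b]) >= threshold:
            added.add(pair)
    return added


def degree(edges):
    """Count how many edges touch each word."""
    counts = {}
    for edge in edges:
        for word in edge:
            counts[word] = counts.get(word, 0) + 1
    return counts


if __name__ == "__main__":
    tokens = ["graph", "keyword", "network", "text"]
    # Hypothetical 2-dimensional embeddings (assumption for illustration).
    vectors = {
        "graph": [1.0, 0.0],
        "network": [0.9, 0.1],
        "keyword": [0.0, 1.0],
        "text": [0.1, 0.9],
    }
    base = cooccurrence_edges(tokens, window=1)
    extra = virtual_edges(vectors, base, threshold=0.9)
    print(degree(base | extra))
```

Raising the threshold limits the proportion of virtual edges, which matches the abstract's observation that the best performance came from adding only a limited share of embedding-based edges.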
30. A text study of "cultural policy" in China's provincial five-year plans based on Word2Vec and the LDA topic model.
- Author
-
高娜 and 东梅
- Abstract
Copyright of Cyber Security & Data Governance is the property of Editorial Office of Information Technology & Network Security and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2024
- Full Text
- View/download PDF
31. Cloud-based machine learning algorithms for anomalies detection.
- Author
-
Amarnath, Raveendra N. and Gurulakshmanan, Gurumoorthi
- Subjects
MACHINE learning ,NATURAL language processing ,RECURRENT neural networks ,BAYESIAN analysis ,COMPUTER passwords ,MACHINE translating ,RECOMMENDER systems - Abstract
Gradient boosting machines harness the inherent capabilities of decision trees and meticulously correct their errors in a sequential fashion, culminating in remarkably precise predictions. Word2Vec, a prominent word embedding technique, occupies a pivotal role in natural language processing (NLP) tasks. Its proficiency lies in capturing intricate semantic relationships among words, thereby facilitating applications such as sentiment analysis, document classification, and machine translation to discern subtle nuances present in textual data. Bayesian networks introduce probabilistic modeling capabilities, predominantly in contexts marked by uncertainty. Their versatile applications encompass risk assessment, fault diagnosis, and recommendation systems. Gated recurrent units (GRUs), a variant of recurrent neural networks, emerge as a formidable asset in modeling sequential data. Both training and testing are crucial to the success of an intrusion detection system (IDS). During the training phase, several models are created, each of which can distinguish typical from anomalous patterns within a given dataset. To acquire passwords and credit card details, "phishing" usually entails impersonating a trusted company. Predictions of student performance on academic tasks are improved by hyperparameter optimization of the gradient boosting regression tree using the grid search approach. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
32. Protection of Guizhou Miao batik culture based on knowledge graph and deep learning.
- Author
-
Quan, Huafeng, Li, Yiting, Liu, Dashuai, and Zhou, Yue
- Subjects
- *
KNOWLEDGE graphs , *DEEP learning , *NATURAL language processing , *TABOO , *BATIK , *ARTIFICIAL intelligence , *AUTOMATIC classification , *CULTURAL intelligence - Abstract
In the globalization trend, China's cultural heritage is in danger of gradually disappearing. The protection and inheritance of these precious cultural resources has become a critical task. This paper focuses on the Miao batik culture in Guizhou Province, China, and explores the application of knowledge graphs, natural language processing, and deep learning techniques in the promotion and protection of batik culture. We propose a dual-channel mechanism that integrates semantic and visual information, aiming to connect batik pattern features with cultural connotations. First, we use natural language processing techniques to automatically extract batik-related entities and relationships from the literature, and construct and visualize a structured batik pattern knowledge graph. Based on this knowledge graph, users can textually search and understand the images, meanings, taboos, and other cultural information of specific patterns. Second, for the batik pattern classification, we propose an improved ResNet34 model. By embedding average pooling and convolutional operations into the residual blocks and introducing long-range residual connections, the classification performance is enhanced. By inputting pattern images into this model, their categories can be accurately identified, and then the underlying cultural connotations can be understood. Experimental results show that our model outperforms other mainstream models in evaluation metrics such as accuracy, precision, recall, and F1-score, achieving 94.46%, 94.47%, 93.62%, and 93.8%, respectively. This research provides new ideas for the digital protection of batik culture and demonstrates the great potential of artificial intelligence technology in cultural heritage protection. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
33. Sentiment analysis of the Hamas-Israel war on YouTube comments using deep learning.
- Author
-
Liyih, Ashagrew, Anagaw, Shegaw, Yibeyin, Minichel, and Tehone, Yitayal
- Abstract
Sentiment analysis aims to classify text based on the opinion or mentality expressed in a situation, which can be positive, negative, or neutral. A large number of opinions are available on various social media sites, and these must be gathered and analyzed to assess the general public’s opinion. Finding and monitoring comments, as well as manually extracting the information contained in them, is a difficult task due to the vast diversity of ideas on YouTube. Identifying public opinion on war topics is crucial for offering insights to opposing sides based on popular opinion and emotions about the ongoing war. To address this gap, we build a deep learning-based model for sentiment analysis of YouTube comments on the Hamas-Israel war to determine public opinion. We collected 24,360 comments about the Hamas-Israel War from popular YouTube News Channels including BBC, WION, Aljazeera, and others using the YouTube API and Google spreadsheet, and linguistic experts labeled them into three classes: positive, negative, and neutral. Then, textual comments were preprocessed using natural language processing (NLP) techniques, and features were extracted using Word2vec, FastText, and GloVe. Moreover, we used the SMOTE data balancing technique and tried different data splits; the 80/20 train-test split ratio gave the highest accuracy. For classification model building, the commonly used classification algorithms LSTM, Bi-LSTM, GRU, and a hybrid of CNN and Bi-LSTM were applied, and their performance was compared. As a result, the hybrid of CNN and Bi-LSTM with Word2vec achieved the highest performance, with 95.73% accuracy for comment classification. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
34. An end-to-end method for predicting compound-protein interactions based on simplified homogeneous graph convolutional network and pre-trained language model.
- Author
-
Zhang, Yufang, Li, Jiayi, Lin, Shenggeng, Zhao, Jianwei, Xiong, Yi, and Wei, Dong-Qing
- Subjects
- *
LANGUAGE models , *DRUG discovery , *DEEP learning , *SCIENTIFIC method , *END-to-end delay , *MOLECULAR docking , *FORECASTING - Abstract
Identification of interactions between chemical compounds and proteins is crucial for various applications, including drug discovery, target identification, network pharmacology, and elucidation of protein functions. Deep neural network-based approaches are becoming increasingly popular in efficiently identifying compound-protein interactions with high-throughput capabilities, narrowing down the scope of candidates for traditional labor-intensive, time-consuming and expensive experimental techniques. In this study, we proposed an end-to-end approach termed SPVec-SGCN-CPI, which utilized simplified graph convolutional network (SGCN) model with low-dimensional and continuous features generated from our previously developed model SPVec and graph topology information to predict compound-protein interactions. The SGCN technique, dividing the local neighborhood aggregation and nonlinearity layer-wise propagation steps, effectively aggregates K-order neighbor information while avoiding neighbor explosion and expediting training. The performance of the SPVec-SGCN-CPI method was assessed across three datasets and compared against four machine learning- and deep learning-based methods, as well as six state-of-the-art methods. Experimental results revealed that SPVec-SGCN-CPI outperformed all these competing methods, particularly excelling in unbalanced data scenarios. By propagating node features and topological information to the feature space, SPVec-SGCN-CPI effectively incorporates interactions between compounds and proteins, enabling the fusion of heterogeneity. Furthermore, our method scored all unlabeled data in ChEMBL, confirming the top five ranked compound-protein interactions through molecular docking and existing evidence. These findings suggest that our model can reliably uncover compound-protein interactions within unlabeled compound-protein pairs, carrying substantial implications for drug re-profiling and discovery. 
In summary, SPVec-SGCN demonstrates its efficacy in accurately predicting compound-protein interactions, showcasing potential to enhance target identification and streamline drug discovery processes. Scientific contributions: The methodology presented in this work not only enables comparatively accurate prediction of compound-protein interactions but also, for the first time, takes sample imbalance, which is very common in the real world, and computational efficiency into consideration simultaneously, accelerating the target identification and drug discovery process. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
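Entry 34 describes a simplified graph convolution that separates neighborhood aggregation from the nonlinear layers and aggregates K-order neighbor information. A common way to do this, assumed here and not confirmed by the abstract, is to apply a symmetrically normalized adjacency matrix with self-loops to the node features K times before any classifier is trained. The sketch below shows only that precomputation step on a hypothetical three-node path graph; the function names and toy features are illustrative.

```python
import math


def normalized_adjacency(n, edges):
    """Build D^(-1/2) (A + I) D^(-1/2) for an undirected graph with n nodes."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 1.0  # self-loop so each node keeps part of its own features
    for u, v in edges:
        a[u][v] = 1.0
        a[v][u] = 1.0
    deg = [sum(row) for row in a]
    return [
        [a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
        for i in range(n)
    ]


def propagate(s, x, k):
    """Apply the propagation matrix s to the feature matrix x a total of k times."""
    for _ in range(k):
        x = [
            [sum(s[i][m] * x[m][c] for m in range(len(x))) for c in range(len(x[0]))]
            for i in range(len(s))
        ]
    return x


if __name__ == "__main__":
    # Hypothetical path graph 0 - 1 - 2 with a single feature per node.
    s = normalized_adjacency(3, [(0, 1), (1, 2)])
    features = [[1.0], [0.0], [0.0]]
    for k in (1, 2):
        print(k, propagate(s, features, k))
```

With K = 1 the feature of node 0 reaches only its direct neighbor; with K = 2 it also reaches node 2. Since the propagated features are computed once, a simple linear classifier can then be trained on them without repeated neighbor sampling, which is one reading of the abstract's claim about avoiding neighbor explosion and speeding up training.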
35. The Impact of Input Types on Smart Contract Vulnerability Detection Performance Based on Deep Learning: A Preliminary Study.
- Author
-
Aldyaflah, Izdehar M., Zhao, Wenbing, Yang, Shunkun, and Luo, Xiong
- Subjects
- *
DEEP learning , *CONVOLUTIONAL neural networks , *CONTRACTS - Abstract
Stemming vulnerabilities out of a smart contract prior to its deployment is essential to ensure the security of decentralized applications. As such, numerous tools and machine-learning-based methods have been proposed to help detect vulnerabilities in smart contracts. Furthermore, various ways of encoding the smart contracts for analysis have also been proposed. However, the impact of these input methods has not been systematically studied, which is the primary goal of this paper. In this preliminary study, we experimented with four common types of input, including Word2Vec, FastText, Bag-of-Words (BoW), and Term Frequency–Inverse Document Frequency (TF-IDF). To focus on the comparison of these input types, we used the same deep-learning model, i.e., convolutional neural networks, in all experiments. Using a public dataset, we compared the vulnerability detection performance of the four input types both in the binary classification scenarios and the multiclass classification scenario. Our findings show that TF-IDF is the best overall input type among the four. TF-IDF has excellent detection performance in all scenarios: (1) it has the best F1 score and accuracy in binary classifications for all vulnerability types except for the delegate vulnerability where TF-IDF comes in a close second, and (2) it comes in a very close second behind BoW (within 0.8%) in the multiclass classification. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
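Two of the input types compared in entry 35, Bag-of-Words and TF-IDF, differ only in how token counts are weighted. The sketch below illustrates that difference on hypothetical contract-like tokens. The tokenization, the function names, and the specific TF-IDF variant (term frequency divided by document length, multiplied by log(N / df)) are assumptions; the paper may use a different variant or a library implementation.

```python
import math
from collections import Counter


def bag_of_words(doc_tokens, vocab):
    """Raw count of each vocabulary token in one document."""
    counts = Counter(doc_tokens)
    return [counts[t] for t in vocab]


def tf_idf(docs_tokens):
    """Return (vocab, rows) where rows[d][i] is the TF-IDF weight of vocab[i]."""
    vocab = sorted({t for doc in docs_tokens for t in doc})
    n_docs = len(docs_tokens)
    # Document frequency: number of documents that contain each token.
    df = Counter(t for doc in docs_tokens for t in set(doc))
    idf = {t: math.log(n_docs / df[t]) for t in vocab}
    rows = []
    for doc in docs_tokens:
        counts = Counter(doc)
        total = len(doc)
        rows.append(
            [(counts[t] / total) * idf[t] if total else 0.0 for t in vocab]
        )
    return vocab, rows


if __name__ == "__main__":
    # Hypothetical token sequences standing in for two smart contracts.
    docs = [["call", "transfer", "call"], ["call", "delegatecall"]]
    vocab, rows = tf_idf(docs)
    print(vocab)
    print([bag_of_words(d, vocab) for d in docs])
    print(rows)
```

A token that appears in every document, such as "call" here, receives a TF-IDF weight of zero, while Bag-of-Words keeps its raw count. That down-weighting of ubiquitous tokens is the main behavioral difference between the two encodings.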
36. Measuring Corporate Human Capital Disclosures: Lexicon, Data, Code, and Research Opportunities.
- Author
-
Demers, Elizabeth, Wang, Victor Xiaoqi, and Wu, Kean
- Subjects
MACHINE learning ,LANGUAGE models ,HUMAN capital ,LEXICON ,VALUE creation ,INDUSTRIAL relations - Abstract
Human capital (HC) is increasingly important to corporate value creation. Unlike other assets, however, HC is not currently subject to well-defined measurement or disclosure rules. We use a machine learning algorithm (word2vec) trained on a confirmed set of HC disclosures to develop a comprehensive list of HC-related keywords classified into five subcategories (DEI; health and safety; labor relations and culture; compensation and benefits; and demographics and other) that capture the multidimensional nature of HC management. We share our lexicon, corporate HC disclosures, and the Python code used to develop the lexicon, and we provide detailed examples of using our data and code, including for fine-tuning a BERT model. Researchers can use our HC lexicon (or modify the code to capture another construct of interest) with their samples of corporate communications to address pertinent HC questions. We close with a discussion of future research opportunities related to HC management and disclosure. Data Availability: Data are available from the public sources cited in the text. JEL Classifications: B40; C80; M14; M41; M54. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
37. Ethics Incognito: Detecting Ethically Relevant Courses Across Curricula in Higher Education.
- Author
-
Ongis, Martino, Kidd, David, and Miner, Jess
- Subjects
COLLEGE curriculum ,NATURAL language processing ,ETHICS education ,MORAL education ,RESEARCH ethics ,UNIVERSITY rankings - Abstract
As colleges and universities seek to invigorate ethics education, they need methods to identify where and describe how ethics is already present across their curricula. Meeting this need is complicated by the fact that much ethics education occurs in courses not explicitly focused on ethics or morality. In this paper, we review recent methodological advances before presenting a new Ethics Course Identification Tool (ECIT) that combines application of an expert-derived weighted dictionary and natural language processing methods to identify ethics-related courses based on their titles and course catalog descriptions, even when the terms "ethic" or "moral" are not present. Two studies, the second a pre-registered replication, revealed considerable interrater reliability among experts in ethics education regarding the ethical relevance of courses. Critically, both studies revealed strong correlations between expert judgments and ECIT scores. This empirical evidence points to a shared understanding of ethics education among experts, and it supports the valid use of the ECIT to rapidly and reliably identify ethics-related courses. Based on these findings, we propose that the ECIT can be used both to advance research on trends in ethics education and to help target interventions to improve ethics education at colleges and universities. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
38. Genie: Enhancing information management in the restaurant industry through AI-powered chatbot
- Author
-
Megha Gupta, Venkatasai Dheekonda, and Mohammad Masum
- Subjects
Chatbot ,Restaurant industry ,Genie ,NLP ,ANN ,Word2Vec ,Information technology ,T58.5-58.64 - Abstract
For the dynamic restaurant industry, we introduce "Genie," an AI-powered chatbot that represents an advancement in customer service efficiency through technological innovation. Designed to enhance restaurant operations including order processing, reservations, and FAQs management, Genie leverages advanced Natural Language Processing (NLP) techniques. By converting input queries into word embeddings and applying a sophisticated tag classification system, Genie precisely interprets customer intents and generates accurate responses, thereby markedly improving the dining experience. Our thorough examination of various word embeddings and classifiers (Word2Vec, GloVe, BERT, Gaussian Naive Bayes, XGB, Artificial Neural Networks (ANN), and Recurrent Neural Networks) revealed that the combination of Word2Vec and ANN is the most effective, achieving an accuracy rate of 88.9%. This discovery highlights Genie's capability to not only streamline restaurant operations but also enhance customer satisfaction by minimizing wait times and facilitating contactless service options. Additionally, this study enriches the understanding of AI's application in service industries and explores the potential future impact of generative AI technologies on chatbot interactions. As AI technology advances, its integration is essential for Genie to deliver increasingly personalized and dynamic customer experiences, aligning with the evolving demands of the digital era. This research emphasizes the transformative impact of AI in the restaurant industry, providing valuable insights into its practical applications and future prospects for automated customer service solutions.
- Published
- 2024
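39. Sentiment Labeling Based on Semi-Supervised Learning Using the LSTM and GRU Algorithms (original title: Pelabelan Sentimen Berbasis Semi-Supervised Learning menggunakan Algoritma LSTM dan GRU)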
- Author
-
Puji Ayuningtyas, Siti Khomsah, and Sudianto Sudianto
- Subjects
Annotation, Deep Learning, GRU, LSTM, Semi-Supervised Learning, Word2Vec, Information technology, T58.5-58.64
- Abstract
In sentiment analysis research, manual labeling by humans (expert annotation) is subjective, slow, and expensive. Labeling by computer (machine annotation) is an alternative, but machine annotators cannot detect sarcastic sentences. The researchers therefore propose a sentiment labeling method based on semi-supervised learning, which combines human labeling (expert annotation) with machine labeling (machine annotation). The machine annotators in this research are deep learning algorithms, namely Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU). The word representation method used is Word2Vec Continuous Bag of Words (CBOW). The results show that GRU tends to be more accurate than LSTM. The average training accuracy of the LSTM and GRU models is 0.904 and 0.913, respectively. In contrast, the average labeling accuracy of LSTM and GRU is 0.569 and 0.592, respectively.
- Published
- 2024
40. Automation Assemblages in the Internet of Things: Discovering Qualitative Practices at the Boundaries of Quantitative Change.
- Author
-
Novak, Thomas P and Hoffman, Donna L
- Subjects
CUSTOMER experience, INTERNET of things, AUTOMATION, NATURAL language processing, HUMAN-computer interaction, CONSUMER behavior
- Abstract
We examine consumers' interactions with smart objects using a novel mixed-method approach, guided by assemblage theory, to discover the emergence of automation practices. We use a unique text data set from the web service IFTTT ("If This Then That"), comprising hundreds of thousands of applets that represent "if–then" connections between pairs of Internet services. Consumers use these applets to automate events in their daily lives. We quantitatively identify and qualitatively interpret automation assemblages that emerge bottom-up as different consumers create similar applets within unique social contexts. Our data discovery approach combines word embeddings, density-based clustering, and nonlinear dimensionality reduction with an inductive approach to thematic analysis. We uncover 127 nested automation assemblages that correspond to automation practices. Practices are interpreted in terms of four higher-order categories: social expression, social connectedness, extended mind, and relational AI. To investigate the future trajectories of automation practices, we use the concept of the possibility space, a fundamental theoretical idea from assemblage theory. Using our empirical approach, we translate this theoretical possibility space of automation assemblages into a data visualization to predict how existing practices can grow and new practices can emerge. Our new approach makes conceptual, methodological, and empirical contributions with implications for consumer research and marketing strategy. [ABSTRACT FROM AUTHOR]
- Published
- 2023
41. DOES IT MATTER TO ACQUISITIONS? THE IMPACTS OF IT DISTANCE ON POST-ACQUISITION PERFORMANCE.
- Author
-
Kyunghee Lee, Kunsoo Han, Animesh, Animesh, and Pinsonneault, Alain
- Abstract
Although researchers have examined the role of dyadic dynamics (i.e., interactions between the acquirer and the target firm) in the success of acquisitions, little attention has been devoted to the role of information technology (IT). In this study, we extend this literature by examining how pre-acquisition IT distance (i.e., the difference between the enterprise IT systems of the two firms, which reflects system incompatibility and the resulting costs of system integration) affects the acquirer’s post-acquisition performance. To measure IT distance, we used a word-embedding technique to map each firm’s IT systems portfolio to a low-dimensional embedding space and calculated the distance between the firms in that space. Using data on U.S. firms’ acquisition activities over seven years, we found that IT distance is negatively associated with the acquirer’s post-acquisition performance. Also, the adverse effect of IT distance is stronger for acquisitions motivated by operational synergies than for those seeking nonoperational synergies. This finding supports our fundamental premise that IT distance disrupts post-acquisition synergy creation, and more so when the combined firm has a greater need for tight integration to create acquisition synergies. This research contributes to the merger and acquisition (M&A) literature in management and IS by introducing a novel concept of IT distance and by theorizing and empirically examining its performance implications in acquisitions. The findings of this study can inform practitioners on how to devise IT strategies in corporate acquisitions to mitigate IT risks and achieve greater post-acquisition performance. [ABSTRACT FROM AUTHOR]
- Published
- 2022
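The abstract of result 41 describes mapping each firm's IT systems portfolio into an embedding space and measuring the distance between firms there. The sketch below illustrates that general idea only. The system names, the 3-dimensional toy vectors, the averaging step, and the choice of cosine distance are all assumptions made for illustration; the abstract does not state which embedding model, aggregation, or distance metric the authors used.

```python
import math

# Hypothetical toy embeddings for IT systems (illustrative values, not from the paper).
SYSTEM_VECTORS = {
    "erp_a": [1.0, 0.0, 0.0],
    "crm_a": [0.0, 1.0, 0.0],
    "erp_b": [0.0, 0.0, 1.0],
}


def portfolio_vector(systems):
    """Average the embeddings of the systems in one firm's portfolio (assumed aggregation)."""
    vectors = [SYSTEM_VECTORS[name] for name in systems]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


acquirer = portfolio_vector(["erp_a", "crm_a"])
similar_target = portfolio_vector(["erp_a", "crm_a"])
different_target = portfolio_vector(["erp_b"])

# Identical portfolios give a distance near 0; portfolios with no overlap
# in this toy space give a distance near 1.
print(cosine_distance(acquirer, similar_target))
print(cosine_distance(acquirer, different_target))
```

In practice the toy dictionary would be replaced by vectors learned from data, and other distance measures (for example Euclidean distance) could be substituted without changing the structure of the sketch.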
42. Enhancing Automatic Speech Recognition in Air Traffic Communication Through Semantic Analysis and Error Correction
- Author
-
Srinivasan, Narayanan, Balasundaram, S. R., Angrisani, Leopoldo, Series Editor, Arteaga, Marco, Series Editor, Chakraborty, Samarjit, Series Editor, Chen, Shanben, Series Editor, Chen, Tan Kay, Series Editor, Dillmann, Rüdiger, Series Editor, Duan, Haibin, Series Editor, Ferrari, Gianluigi, Series Editor, Ferre, Manuel, Series Editor, Jabbari, Faryar, Series Editor, Jia, Limin, Series Editor, Kacprzyk, Janusz, Series Editor, Khamis, Alaa, Series Editor, Kroeger, Torsten, Series Editor, Li, Yong, Series Editor, Liang, Qilian, Series Editor, Martín, Ferran, Series Editor, Ming, Tan Cher, Series Editor, Minker, Wolfgang, Series Editor, Misra, Pradeep, Series Editor, Mukhopadhyay, Subhas, Series Editor, Ning, Cun-Zheng, Series Editor, Nishida, Toyoaki, Series Editor, Oneto, Luca, Series Editor, Panigrahi, Bijaya Ketan, Series Editor, Pascucci, Federica, Series Editor, Qin, Yong, Series Editor, Seng, Gan Woon, Series Editor, Speidel, Joachim, Series Editor, Veiga, Germano, Series Editor, Wu, Haitao, Series Editor, Zamboni, Walter, Series Editor, Tan, Kay Chen, Series Editor, Illés, Zoltán, editor, Verma, Chaman, editor, Gonçalves, Paulo J. Sequeira, editor, and Singh, Pradeep Kumar, editor
- Published
- 2024
43. Exploring Synonym Generation for Lexical Simplification: A Comparative Analysis of Static and Contextualized Word Embeddings
- Author
-
RajyaLakshmi, Tamma, Kuppusamy, K. S., Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Swaroop, Abhishek, editor, Kansal, Vineet, editor, Fortino, Giancarlo, editor, and Hassanien, Aboul Ella, editor
- Published
- 2024
44. Fundamentals of Vector-Based Text Representation and Word Embeddings
- Author
-
Malik, Nidhi, Singh, Sanjeet, Biswas, Payal, Sharan, Aditi, Chakrabarti, Amlan, Series Editor, Becker, Jürgen, Editorial Board Member, Hu, Yu-Chen, Editorial Board Member, Chattopadhyay, Anupam, Editorial Board Member, Tribedi, Gaurav, Editorial Board Member, Saha, Sriparna, Editorial Board Member, Goswami, Saptarsi, Editorial Board Member, Sharan, Aditi, editor, Malik, Nidhi, editor, Imran, Hazra, editor, and Ghosh, Indira, editor
- Published
- 2024
45. Word Embedding-based Topic Modeling
- Author
-
Bellaouar, Slimane, Itbirene, Ahmed, Chihani, Brahim, Luo, Xun, Editor-in-Chief, Almohammedi, Akram A., Series Editor, Chen, Chi-Hua, Series Editor, Guan, Steven, Series Editor, Pamucar, Dragan, Series Editor, Kerrache, Chaker Abdelaziz, editor, Tahari, Abdou El Karim, editor, Kassimi, Dounya, editor, and Chakraborty, Chinmay, editor
- Published
- 2024
46. Temporal Sentiment Analysis (TSMFPMSM) Model for Multimodal Social Media Fake Profile Detection
- Author
-
Aditya, Bhrugumalla L. V. S., Mohanty, Sachi Nandan, Salini, Yalamanchili, Akan, Ozgur, Editorial Board Member, Bellavista, Paolo, Editorial Board Member, Cao, Jiannong, Editorial Board Member, Coulson, Geoffrey, Editorial Board Member, Dressler, Falko, Editorial Board Member, Ferrari, Domenico, Editorial Board Member, Gerla, Mario, Editorial Board Member, Kobayashi, Hisashi, Editorial Board Member, Palazzo, Sergio, Editorial Board Member, Sahni, Sartaj, Editorial Board Member, Shen, Xuemin, Editorial Board Member, Stan, Mircea, Editorial Board Member, Jia, Xiaohua, Editorial Board Member, Zomaya, Albert Y., Editorial Board Member, Castillo, Oscar, editor, Sudhakar Babu, Thanikanti, editor, and Aluvalu, Rajanikanth, editor
- Published
- 2024
47. Categorization of Arabic Medical Questions Using a Deep Learning Approach
- Author
-
Bahbib, Mohammed, Tamym, Lahcen, Yakhlef, Majid Ben, Benyoucef, Lyes, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Daimi, Kevin, editor, and Al Sadoon, Abeer, editor
- Published
- 2024
48. Emotional Analysis Based on Text Using X MBTI Data
- Author
-
Lee, Irene Songyeon, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, and Arai, Kohei, editor
- Published
- 2024
49. Word2Vec-GloVe-BERT Embeddings for Query Expansion
- Author
-
Gabsi, Imen, Kammoun, Hager, Mtar, Rawed, Amous, Ikram, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Abraham, Ajith, editor, Bajaj, Anu, editor, Hanne, Thomas, editor, and Hong, Tzung-Pei, editor
- Published
- 2024
50. Twitter Trolling Detection Using Machine Learning
- Author
-
Ghosh, Shubhra Bhunia, Kumar, Horesh, Joshi, Aditya, Kumar, Anshul, Jain, Tarun, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Kumar, Rajesh, editor, Verma, Ajit Kumar, editor, Verma, Om Prakash, editor, and Wadehra, Tanu, editor
- Published
- 2024