70 results on '"Topic model"'
Search Results
2. TG-SMR: A Text Summarization Algorithm Based on Topic and Graph Models.
- Author
-
Rakrouki, Mohamed Ali, Alharbe, Nawaf, Khayyat, Mashael, and Aljohani, Abeer
- Subjects
AUTOMATION, GRAPH theory, NATURAL language processing, COMPUTER algorithms, METHODOLOGY
- Abstract
Recently, automation is considered vital in most fields since computing methods have a significant role in facilitating work such as automatic text summarization. However, most of the computing methods that are used in real systems are based on graph models, which are characterized by their simplicity and stability. Thus, this paper proposes an improved extractive text summarization algorithm based on both topic and graph models. The methodology of this work consists of two stages. First, the well-known TextRank algorithm is analyzed and its shortcomings are investigated. Then, an improved method is proposed with a new computational model of sentence weights. The experimental results were carried out on standard DUC2004 and DUC2006 datasets and compared to four text summarization methods. Finally, through experiments on the DUC2004 and DUC2006 datasets, our proposed improved graph model algorithm TG-SMR (Topic Graph-Summarizer) is compared to other text summarization systems. The experimental results prove that the proposed TG-SMR algorithm achieves higher ROUGE scores. It is foreseen that the TG-SMR algorithm will open a new horizon that concerns the performance of ROUGE evaluation indicators. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
3. T-LBERT with Domain Adaptation for Cross-Domain Sentiment Classification.
- Author
-
Hongye Cao, Qianru Wei, and Jiangbin Zheng
- Published
- 2023
- Full Text
- View/download PDF
4. Can machine understand public administration literature? Applying text mining for systematic review.
- Author
-
Mao, Hanjin and Li, Huafang
- Abstract
Systematic reviews summarize the progress of studies and pave roads for future research in an academic field. However, conducting a systematic literature review can be burdensome and time-consuming. Computer-assisted methods such as text mining techniques have been increasingly applied to improve systematic reviews in public administration. To test the reliability of using text mining for systematic literature reviews, this study uses clustering, topic modeling, automatic multi-term extraction, and text network to systematically review articles published in Chinese Public Administration Review from 2002 to 2019. By comparing machine-produced topics with existing human-coded themes, findings show that applying text mining methods for systematic reviews can be reliable and effective with cautions. The study also offers practical suggestions for researchers to apply text mining methods for systematic literature reviews. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
5. Keywords Extraction and Thesaurus Construction for Domain News.
- Author
-
Meng, Fan, Zhou, Kaile, Bu, Yi, Huang, Win-Bin, Zhang, Pengyi, Long, Fei, and Li, Yan
- Subjects
INFORMATION storage & retrieval systems
- Abstract
In modern information retrieval systems, the thesaurus is playing an increasingly important role. In order to better describe and analyze the domain news, this paper proposes a method of domain keyword extraction, and further constructs an effective domain thesaurus. Compared with the previous research, this paper grasps the core information in the field by extracting and combining domain keywords, and improves the domain effectiveness of the thesaurus. In addition, this paper conducts both manual analysis and automated processing to construct high-quality thesaurus, which has practical application value. The final results provide support for the process of indexing, organizing, retrieving and recommending news. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
6. Topic Modeling with Transformers for Sentence-Level Using Coronavirus Corpus.
- Author
-
Mifrah, Sara and Benlahmar, El Habib
- Subjects
TEXT mining, PROBABILISTIC generative models, SCIENTIFIC computing, COVID-19, CORONAVIRUSES, CORPORA
- Abstract
A Topic Model is a class of generative probabilistic models which has gained widespread use in computer science in recent years, especially in the field of text mining and information retrieval. Since it was first proposed, it has received a large amount of attention and general interest among scientists in many research areas. It allows us to discover the mix of hidden or "latent" subjects that differs from one document to another in a given corpus. But since topic modeling usually requires the prior definition of some parameters - above all the number of topics k to be discovered -, model evaluation is decisive to identify an "optimal" set of parameters for the specific data. Latent Dirichlet allocation (LDA) and Bidirectional Encoder Representations from Transformers Topic (BerTopic) are the two most popular topic modeling techniques. LDA uses a probabilistic approach whereas BerTopic uses transformers (BERT embeddings) and class-based TF-IDF to create dense clusters. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
7. Empower Keywords Generation for Short Texts with Graph-to-Sequence Learning.
- Author
-
Yang, Ying, Sun, Yaru, Yang, Dawei, Mei, Guang, and Wang, Zhihong
- Subjects
NATURAL languages
- Abstract
Keywords are useful in natural language tasks. However, it is a challenging task to extract keywords from short texts, in which the model may be subject to the impact of topic dependence and poor text organization structure. To resolve this limitation, we propose ADGCN, a keywords generation model for short texts based on graph-to-sequence learning. The model to jointly short texts contextual feature and positional feature based adaptation for this task. We learn domain-invariant feature representations by using graph-building feature and node topic feature space, and jointly perform linear generate feature in framework of keywords decoding. Experimental results on real social datasets demonstrate that our proposed model achieves impressive empirical performance on relevance, information and coherence. Besides, the proposed ADGCN also outperforms the state of the art on the public KP20k dataset. The experiments testify that the model can generate the topic keywords of short texts and effectively alleviate the influence of data disturbance. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
8. Deconstructing the organizational resilience of construction firms in major emergencies: A text mining analysis of listed construction companies in China.
- Author
-
Zhang, Yuguo, Wang, Wenshun, Mi, Lingyun, Liu, Ying, Qiao, Lijie, Ni, Guodong, and Wang, Xiangyang
- Abstract
The COVID-19 pandemic has resulted in unprecedented huge losses for construction companies in terms of capital, labor, and project construction, highlighting a significant lack of organizational resilience (OR) within the construction industry. How to improve the OR of construction companies has become the key to resolve the crisis. However, there is a lack of systematic insights into the structure and dimensions of OR, as well as a gap in empirical evidence to explain how construction firms systematically construct OR. Therefore, this paper systematically identifies 19 resilience topics and their language descriptions by mining the resilience-related information in 1572 annual reports and expert interview data of listed companies in the Chinese construction industry during the COVID-19 pandemic, using a combination of the topic model and language model. Following the basic concept of OR, a framework of OR dimensions in the construction firms that integrates actions, resources, and capabilities is developed to uncover the complex resilience characteristics of construction firms. The results show that OR sought by listed companies in the construction industry consists of resilient actions, resilience resources, and resilience capabilities. Resilient actions stem from motivating, restraining, protecting, and exploring actions. The resilience resources include the resource reserves of organization, technology, and knowledge, while the resilience capabilities are dynamic capabilities that integrate prevention, response, adaptation, monitoring, perception, and recovery. The findings not only deconstruct the OR framework for construction companies to cope with crises, but also provide new paths for construction managers to cultivate the OR of companies in practice.
• The organizational resilience of construction firms is systematically deconstructed.
• Text mining was applied to explore resilience information of listed construction firms.
• The language model is embedded in the topic model to identify resilient topics.
• Primary dimensions are resilient actions, resilient resources, and resilient capabilities.
• Organizational resilience of construction firms is composed of 18 sub-dimensions. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
9. Mining Syndrome Differentiating Principles from Traditional Chinese Medicine Clinical Data.
- Author
-
Jialin Ma, Zhaojun Wang, Hai Guo, Qian Xie, Tao Wang, and Bolun Chen
- Subjects
MINERAL industries, CHINESE medicine, DATA mining, COMPUTER algorithms, DATA analysis
- Abstract
Syndrome differentiation-based treatment is one of the key characteristics of Traditional Chinese Medicine (TCM). The process of syndrome differentiation is difficult and challenging due to its complexity, diversity and vagueness. Analyzing syndrome principles from historical records of TCM using data mining (DM) technology has been of high interest in recent years. Nevertheless, in most relevant studies, existing DM algorithms have been simply developed for TCM mining, while the combination of TCM theories or its characteristics with DM algorithms has rarely been reported. This paper presents a novel Symptom-Syndrome Topic Model (SSTM), which is a supervised probabilistic topic model with three-tier Bayesian structure. In the SSTM, syndromes are considered as observed topic labels to distinguish certain symptoms from possible symptoms according to their different positions. The generation of our model is in full compliance with the syndrome differentiation theory of TCM. Experimental results show that the SSTM is more effective than other models for syndrome differentiating. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
10. TRGM: Generating Informative Responses for Open Domain Dialogue Systems.
- Author
-
WANG GAO, HONGTAO DENG, XUN ZHU, and YUWEI WANG
- Subjects
ARTIFICIAL neural networks
- Abstract
Sequence-to-sequence (seq2seq) neural network models are able to generate natural sounding conversational responses for open domain dialogue systems. However, these models tend to produce safe, universal responses (e.g., I don't know) regardless of the input, which carry little information and can easily lead to the end of a conversation. In this paper, we propose a new Topic-driven Response Generation Model (TRGM). The proposed model leverages topic information to generate interesting and informative responses. Firstly, we design a topic generation model based on BERT to learn the topic information of the input. Then a response generation model utilizes a gate mechanism and a mixed probability model to integrate topic knowledge into a seq2seq model. We implement the two components using an end-to-end neural network and jointly train each component as a sub-task. Experimental results on a public dataset demonstrate that our method significantly outperforms state-of-the-art baselines on both automatic evaluation metrics and human judgment. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
11. Hot Topic Discovery across Social Networks Based on Improved LDA Model.
- Author
-
Chang Liu and RuiLin Hu
- Subjects
ONLINE social networks, SOCIAL networks, VIRTUAL communities, WORD frequency, BIG data
- Abstract
With the rapid development of Internet and big data technology, various online social network platforms have been established, producing massive information every day. Hot topic discovery aims to dig out meaningful content that users commonly concern about from the massive information on the Internet. Most of the existing hot topic discovery methods focus on a single network data source, and can hardly grasp hot spots as a whole, nor meet the challenges of text sparsity and topic hotness evaluation in cross-network scenarios. This paper proposes a novel hot topic discovery method across social network based on an improved LDA model, which first integrates the text information from multiple social network platforms into a unified data set, then obtains the potential topic distribution in the text through the improved LDA model. Finally, it adopts a heat evaluation method based on the word frequency of topic label words to take the latent topic with the highest heat value as a hot topic. This paper obtains data from the online social networks and constructs a cross-network topic discovery data set. The experimental results demonstrate the superiority of the proposed method compared to baseline methods. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
12. Dynamic hierarchical Dirichlet processes topic model using the power prior approach.
- Author
-
Jeong, Kuhwan and Kim, Yongdai
- Abstract
The hierarchical Dirichlet processes (HDP) topic model is a Bayesian nonparametric model that provides a flexible mixed-membership to documents through topic allocation to each word. In this paper, we consider dynamic HDP topic models, in which the generative model changes in time, and develop a novel algorithm to update the posterior distribution dynamically by combining the variational inference algorithm and the power prior approach. An important advantage of the proposed algorithm is that it updates the posterior distribution by reusing a given batch algorithm without specifying a complicated dynamic generative model. Thus the proposed algorithm is conceptually and computationally simpler. By analyzing real datasets, we show that the proposed algorithm is a useful alternative approach to dynamic HDP topic identification. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
13. A Latent Topic Analysis Framework for Category-Level Target Promotion in the Supermarket.
- Author
-
Sun, Yi, Hayashi, Teruaki, and Ohsawa, Yukio
- Subjects
SUPERMARKETS, CONSUMER behavior, POINT-of-sale systems, ENTROPY (Information theory)
- Abstract
When and which products to recommend to whom has been the essential issue for retailers. In this field, the topic model is attracting researchers' attention for extracting customers' purchase behavior instead of association rules or K-means. However, the optimal number of topics is chosen manually, and there are some limitations to use topic models. In this study, we developed the model by Koltcov et al. for point of sales (POS) data in the supermarket. To grasp the change of topics over time, we divided five-month POS data into ten datasets into two-week intervals and applied the topic model with Renyi entropy separately. The results suggest that splitting data might be a better way to understand customer's behavior. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
14. A thousand words tell more than just numbers: Financial crises and historical headlines.
- Author
-
Ristolainen, Kim, Roukka, Tomi, and Nyberg, Henri
- Abstract
We show that financial crises are preceded by changes in specific types of narrative information contained in newspaper article titles. Our novel international dataset and the resulting empirical evidence are gathered by integrating information from a large panel of economic news articles in global newspapers between the years 1870 and 2016 with conventional macroeconomic and financial indicators. We find that the predictive information of newspaper article titles that signals coming crisis episodes is substantial over and above the macroeconomic and financial indicators. Feature contribution analysis and crisis case studies reveal that the new indicators capture more detailed, but still generalizable information on the buildup of crises. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
15. A Sparse Topic Model for Bursty Topic Discovery in Social Networks.
- Author
-
Lei Shi, Junping Du, and Feifei Kou
- Published
- 2020
- Full Text
- View/download PDF
16. A Study of a Method to Understand the Intention of Taste Expressions through Text Mining.
- Author
-
Tachibana, Shinichi and Tsuda, Kazuhiko
- Subjects
TASTE, INTENTION, WINE tasting, WEBSITES
- Abstract
The purpose of this study is to evidence a method of understanding the intentions of taste expressions from word-of-mouth data of cooking recipe websites using text mining. This study aims to clarify the use of the word "KOKU" as an example to verify the method. KOKU is one of the taste expressions used like richness experienced in various dishes such as in the taste of wine. In order to clarify the relationship between the features of KOKU and cooking categories, they were clustered using the latent Dirichlet allocation. The categories were classified into groups of foods using similar ingredients, sweetness, oils, and seasonings. Through the analysis mentioned above, the features of KOKU were defined. In the past, there has been no attempt to clarify the features of KOKU using word-of-mouth data from cooking recipe websites. The success in defining "KOKU" is evidence that this method has potential to be extended and applied to expressions other than KOKU. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
17. Introduction to Text Analysis with R [R によるテキスト分析入門].
- Author
-
三村喬生, 松村杏子, 松村優哉, and 関家友子
- Abstract
Copyright of Journal of Information Science & Technology Association/Joho no Kagaku to Gijutsu is the property of Information Science & Technology Association and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2020
18. A New Vector Representation of Short Texts for Classification.
- Author
-
Yangyang Li and Bo Liu
- Published
- 2020
- Full Text
- View/download PDF
19. Presenting a Subject Classification Model for Scientific Products ... [ارائة مدل دسته بندی موضوعی تولیدات علمی ح...]
- Author
-
محبوبه شکوهیان, عاصفه عاصمی, احمد شعبانی, and مظفر چشم هسهرابی
- Subjects
SUPPORT vector machines, BLENDED learning, DATA mining, TEXT mining, MACHINE learning, TEXT processing (Computer science)
- Abstract
With the proliferation of the Internet and the rapid growth of electronic articles, text classification has become one of the key and important tools for data organization and management. In text classification a set of basic knowledge is provided to the system by learning. Then, new input documents enter to one of the subject groups. In health literature due to wide variety of topics, preparing such a set of early education is a very time consuming and costly task. The purpose of this article is to present a hybrid model of learning (supervised and unsupervised) for the subject classification of health scientific products that performs the classification operation without the need for an initial labeled set. To extract the thematic model of health science texts from 2009 to 2019 at PubMed database, data mining and text mining were performed using machine learning. Based on Latent Dirichlet Allocation model, the data were analyzed and then the Support Vector Machine was used to classify the texts. In the findings of this study, the model was introduced in three main steps. In data preprocessing, the unnecessary words were eliminated from the data set and the accuracy of the proposed model increased. In the second step, the themes in the texts were extracted using the Latent Dirichlet Allocation method, and as a basic training set in step 3, the data were backed up by the Support Vector Machine algorithm and the classifier learning was performed with the help of these topics. Finally, with the help of the classification, the subject of each document was identified. The results showed that the proposed model can build a better classification by combining unsupervised clustering properties and prior knowledge of the samples. Clustering on labeled samples with a specific similarity criterion merges related texts with prior knowledge, and the learning algorithm teaches classification by supervisory method. Combining classification and clustering can increase the accuracy of classification of health texts. [ABSTRACT FROM AUTHOR]
- Published
- 2020
20. Topic Clustering and Classification on Final Project Reports: a Comparison of Traditional and Modern Approaches.
- Author
-
Bunyamin, Hendra, Heriyanto, Novianti, Stevani, and Sulistiani, Lisan
- Abstract
Text clustering and classification have been studied at large in machine learning literature. For clustering text, topic modeling algorithms are statistical methods to discover unseen structures in archives of documents. Equally important, Convolutional Neural Networks (ConvNets) have been successfully applied for classifying text without knowing information about syntactic and semantic aspects of a language. In this paper, we utilize both clustering and classification algorithms to organize and classify topics from final project reports. In the clustering task, we examine two techniques, namely Latent Dirichlet Allocation (LDA) functioning as a unigram model and LDA supported by a Skip-gram model. Our results show that each topical distribution of words found by the techniques truly represents keywords from every topic; to elaborate, a skip-gram model that works hand in hand with LDA is suitable to acquire topical words from the final report topics. For our classification task, we analyze the application of ConvNets, artificial neural nets with ReLU activation functions, and traditional algorithms. Concretely, our findings suggest that selecting parts of a report that contain essential information is very important for ConvNets to learn. Additionally, traditional algorithms are preferable to neural nets-based algorithms if the size of the dataset is less than 20,000; as a result, our traditional algorithms, specifically Ridge classifier, Passive-Aggressive, and Support Vector Machines, outperform neural nets-based algorithms significantly. [ABSTRACT FROM AUTHOR]
- Published
- 2019
22. SRTM: A Sparse RNN-Topic Model for Discovering Bursty Topics in Big Data of Social Networks.
- Author
-
LEI SHI, JUN-PING DU, MEI-YU LIANG, and FEI-FEI KOU
- Subjects
SOCIAL networks, BIG data, PUBLIC opinion, ONLINE social networks
- Abstract
Social networks such as Twitter, Facebook, and Sina microblog have become major sources for generating big data and bursty topics. As bursty topics discovery is helpful to guide public opinion and control network rumors, it is necessary to design an effective method to detect the quickly-updated bursty topics. However, bursty topics discovery is challenging. The main reason is that big data is both high dimensional and sparse in social networks. In this study, we propose a Sparse RNN-Topic Model (SRTM) to deal with the task. First, we leverage RNN to learn the inside relationship between words and IDF to measure high-frequency words. Second, the model distinguishes modeling between the bursty topic and the common topic to detect the variety of word in time. Third, we introduce a "Spike and Slab" prior to decouple the sparsity and smoothness of the topic distribution. The burstiness of word pair is leveraged to achieve automatic bursty topics discovery. Finally, to verify the effectiveness of the proposed SRTM method, we collect a Sina microblog dataset to conduct various experiments. Both qualitative and quantitative evaluations demonstrate that our proposed SRTM method outperforms several state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
23. Topics of the Finnish Parliament's plenary sessions 1999-2014: how to handle LDA topic models? [Eduskunnan täysistunnon puheenaiheet 1999-2014: miten käsitellä LDA-aihemalleja?]
- Author
-
LOUKASMÄKI, PETRI and MAKKONEN, KIMMO
- Abstract
Copyright of Politiikka is the property of Finnish Political Science Association and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2019
24. Information Organization Patterns from Online Users in a Social Network.
- Author
-
Chengzhi Zhang, Hua Zhao, Xuehua Chi, and Shuitian Ma
- Subjects
SOCIAL networks, SOCIAL media, INFORMATION sharing, ALGORITHMS, INTERNET users
- Abstract
Recent years have seen the rise of user-generated contents (UGCs) in online social media. Diverse UGC sources and information overload are making it increasingly difficult to satisfy personalized information needs. To organize UGCs in a user-centered way, we should not only map them based on textual topics but also link them with users and even user communities. We propose a multi-dimensional framework to organize information by connecting UGCs, users, and user communities. First, we use a topic model to generate a topic hierarchy from UGCs. Second, an author-topic model is applied to learn user interests. Third, user communities are detected through a label propagation algorithm. Finally, a multi-dimensional information organization pattern is formulated based on similarities among the topic hierarchies of UGCs, user interests, and user communities. The results reveal that: 1) our proposed framework can organize information from multiple sources in a user-centered way; 2) hierarchical topic structures can provide comprehensive and in-depth topics for users; and, 3) user communities are efficient in helping people to connect with others who have similar interests. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
25. A systematic review highlights that there are multiple benefits of urban agriculture besides food.
- Author
-
Pradhan, Prajal, Callaghan, Max, Hu, Yuanchao, Dahal, Kshitij, Hunecke, Claudia, Reusswig, Fritz, Lotze-Campen, Hermann, and Kropp, Jürgen P.
- Abstract
Urban agriculture, including peri-urban farming, can nourish around one billion city dwellers and provide multiple social, economic, and environmental benefits. However, these benefits depend on various factors and are debated. Therefore, we used machine learning to semi-automate a systematic review of the existing literature on urban agriculture. It started with around 76,000 records for initial screening based on a broad keyword search strategy. We applied the topic modeling approach to systematically understand various aspects of urban agriculture based on the full text of around 1,450 relevant publications. Urban agriculture literature covers 14 topics, clustered into 11 themes related to urban agriculture forms, their multi-functionalities, and their underlying challenges. These forms are small-scale ground-based and building-integrated systems. The multi-functionalities include food, livelihoods, health benefits, social space, green infrastructure, biodiversity, and ecosystem services. Therefore, promoting urban agriculture requires accounting for its multi-functionalities, besides food provisioning, and encouraging efficient and sustainable practices.
• We identify 14 topics on urban agriculture, which vary spatially and temporally.
• Urban agriculture provides socioeconomic and environmental benefits, besides food.
• Urban agriculture faces challenges of inefficient practices and health risks.
• Sustainable practices can reduce health risks and input needs for urban agriculture.
• Promoting urban agriculture requires accounting for its multi-functionalities. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
26. On predicting elections with hybrid topic based sentiment analysis of tweets.
- Author
-
Bansal, Barkha and Srivastava, Sangeet
- Subjects
SENTIMENT analysis, LEXICAL access, MICROBLOGS, ELECTION forecasting, PROGRAMMING language semantics
- Abstract
Twitter sentiment analysis is a quick and inexpensive way for real-time election monitoring and modern-day election predictions. Recent research relies on explicit mining of public sentiment using lexical and syntactic features in tweets. However, underlying implicit word relations and co-occurrences are overlooked. This task of capturing semantic relations and word co-occurrences further becomes challenging in the case of short-length tweets where words are limited. In this paper, we introduce a novel method, Hybrid Topic Based Sentiment Analysis (HTBSA), with the aim of capturing word relations and co-occurrences in short-length tweets for election prediction using tweets. First, we extract latent topics from a rich corpus of short texts using the Biterm Topic Model (BTM), then sentiments for each topic are learnt from pre-existing lexical resources. Finally, the sentiment score of each tweet is calculated using the sentiment orientation and weight of each topic contained in it. We use more than 300,000 tweets, collected from 1st-20th February, 2017, to predict the Uttar Pradesh (U.P.) legislative elections. Geo tagging is employed for key words which are not exclusive to the elections. Results show that HTBSA has outperformed existing Twitter-based election prediction techniques with a decrease of 3.5% in MAE. Our study can be easily and efficiently extended for real-time election monitoring and future election predictions. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
27. Fuzzy Approach Topic Discovery in Health and Medical Corpora.
- Author
-
Karami, Amir, Gangopadhyay, Aryya, Zhou, Bin, and Kharrazi, Hadi
- Subjects
TEXT mining ,MEDICAL databases ,FUZZY logic ,AUTOMATION ,CORPORA ,ELECTRONIC health records - Abstract
The majority of medical documents and electronic health records are in text format that poses a challenge for data processing and finding relevant documents. Looking for ways to automatically retrieve the enormous amount of health and medical knowledge has always been an intriguing topic. Powerful methods have been developed in recent years to make the text processing automatic. One of the popular approaches to retrieve information based on discovering the themes in health and medical corpora is topic modeling; however, this approach still needs new perspectives. In this research, we describe fuzzy latent semantic analysis (FLSA), a novel approach in topic modeling using fuzzy perspective. FLSA can handle health and medical corpora redundancy issue and provides a new method to estimate the number of topics. The quantitative evaluations show that FLSA produces superior performance and features to latent Dirichlet allocation, the most popular topic model. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
28. A Study on New Business Planning in the AR and Educational Development Fields.
- Author
-
西田彩子, 大久保三四朗, 大森照夫, 木下光博, 酒本裕明, 杉山典正, 都築涼香, and 法宗布美子
- Abstract
Copyright of Journal of Information Science & Technology Association/Joho no Kagaku to Gijutsu is the property of Information Science & Technology Association and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2018
29. A New Fine-grain SMS Corpus and Its Corresponding Classifier Using Probabilistic Topic Model.
- Author
-
Jialin Ma, Yongjun Zhang, Zhijian Wang, and Bolun Chen
- Subjects
TEXT messages ,SPAM filtering (Email) ,NETWORK performance ,SUPPORT vector machines ,K-nearest neighbor classification - Abstract
Nowadays, SMS spam is overflowing in many countries. In fact, the standards for filtering SMS spam differ from country to country. However, current technologies and research on SMS spam filtering all focus on dividing SMS messages into two classes: legitimate and illegitimate. This does not conform to the actual situation and need. Furthermore, they face several difficulties, such as: (1) High-quality, large-scale SMS spam corpora are very scarce, and finely categorized SMS spam corpora do not exist at all. This seriously handicaps researchers' studies. (2) The limited length of SMS messages leads to a lack of sufficient features. These factors seriously degrade the performance of traditional classifiers (such as SVM, K-NN, and Bayes). In this paper, we present a new finely categorized SMS spam corpus which is unique and the largest one as far as we know. In addition, we propose a classifier based on the probabilistic topic model. The classifier can alleviate the feature sparsity problem in the task of SMS spam filtering. Moreover, we compare the approach with three typical classifiers on the new SMS spam corpus. The experimental results show that the proposed approach is more effective for the task of SMS spam filtering. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
30. Green housing on social media in China: A text mining analysis.
- Author
-
Shen, Chen and Li, Ping
- Subjects
TEXT mining ,SOCIAL media ,SUSTAINABLE buildings ,CITIZENS ,PUBLIC opinion ,COEXISTENCE of species ,VIRTUAL communities - Abstract
Reducing carbon emissions and promoting energy efficiency are imperative to the harmonious coexistence between humans and nature. Green buildings can help minimize energy drain and become an effective way to realize Sustainable Development Goals. However, promoting green housing (GH) faces greater challenges than implementing other green buildings. Although governments vigorously promote GH, the development of GH in China is still stuck in the rut of excessive authority intervention, and public response remains limited. Therefore, this research crawled massive online textual data and applied text mining to explore dynamic public opinions, temporal patterns of GH-related public concerns with different sentiment tendencies and driving factors of different sentiments. The results reveal that positive public sentiments mainly focus on ecological, environmental, social, and individual benefits, but the topic of individual-level benefits gradually decreased. As for negative sentiment on GH, price-related issues are the prominent reason, and quality-related issues have been discussed extensively in recent years and have become the most concerning issues. Moreover, regulations, technical level, education level, and incentives have significant positive impacts on citizens' positive sentiments toward GH, while these factors have significant negative impacts on citizens' negative sentiments toward GH. The findings can help planners, engineers, and governmental officers develop a systematic understanding of micro-level opinions and offer new insights for GH policies and governance. • Diverse green housing-related topics are identified by the LDA topic model. • Price-related issues are the prominent reason for negative attitudes toward green housing. • Positive attitudes mainly focus on ecological and environmental benefits of green housing. • Regulations, technical level, and incentives have significant negative impacts on negative sentiments. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
31. Exploring topics and trends in Chinese ATC incident reports using a domain-knowledge driven topic model.
- Author
-
Bao, Jie, Chen, Yixuan, Yin, Jianan, Chen, Xinyuan, and Zhu, Dan
- Subjects
RADAR equipment ,AIR traffic ,AERONAUTICAL safety measures ,TRAFFIC safety ,AIRCRAFT accidents ,PROCESS optimization - Abstract
The primary objective of this study is to discover hidden topics and trends from historical ATC incident reports. A novel domain-knowledge driven topic (DDT) model is proposed to explore the embedded patterns and hidden connections in ATC incident reports. Seventeen-year ATC incident records are collected from Local Air Traffic Management Branch of Civil Aviation Administration of China to illustrate the procedure. First, a total of twenty topics are identified from the collected reports, including aircraft-flight-operation related topics, crew-activities related topics, airspace-control-transfer related topics and others. Then, the topic evolution trend over years is explored, which identifies four hot topics and four cold topics over the study period. The results reveal that in general the contributing factors of ATC incidents are gradually shifting from external factors (e.g., radar equipment or aircraft components) to human related factors (e.g., instruction communication or handover of airspace control) due to the improved quality of communication equipment and some adjustments of ATC rules over the past two decades. Finally, the topic evolution analyses across different ATC areas and flight phases are further conducted. The findings indicate that the potential causes of ATC incidents are different across various ATC areas and flight phases due to the variation in geographical environment and local policies. The results of this research can help local ATC authorities conduct efficient safety performance assessment, implement proactive countermeasures for specific areas to enhance air traffic safety, and provide aviation authorities with insightful suggestions for ATC process optimization and design. • A domain-knowledge driven topic model is developed to discover topics in ATC incident reports. • The identified topics reveal that factors are shifting from external factors to human related factors in China. • The dynamic topic trends of ATC incidents across different control areas and flight phases are explored. • The results could benefit local ATC authorities for safety performance assessment and ATC process optimization. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
32. TSAKE: A topical and structural automatic keyphrase extractor.
- Author
-
Rafiei-Asl, Javad and Nickabadi, Ahmad
- Subjects
TERMS & phrases ,DATA extraction ,SET theory ,TASK performance ,INFORMATION retrieval ,NATURAL language processing - Abstract
The keyphrases of a text entity are a set of words or phrases that concisely describe the main content of that text. Automatic keyphrase extraction plays an important role in natural language processing and information retrieval tasks such as text summarization, text categorization, full-text indexing, and cross-lingual text reuse. However, automatic keyphrase extraction is still a complicated task and the performance of the current keyphrase extraction methods is low. Automatic discovery of high-quality and meaningful keyphrases requires the application of useful information and suitable mining techniques. This paper proposes Topical and Structural Keyphrase Extractor (TSAKE) for the task of automatic keyphrase extraction. TSAKE combines the prior knowledge about the input language learned by an N-gram topical model (TNG) with the co-occurrence graph of the input text to form some topical graphs. Different from most of the recent keyphrase extraction models, TSAKE uses the topic model to weight the edges instead of the nodes of the co-occurrence graph. Moreover, while TNG represents the general topics of the language, TSAKE applies network analysis techniques to each topical graph to detect finer grained sub-topics and extract more important words of each sub-topic. The use of these informative words in the ranking process of the candidate keyphrases improves the quality of the final keyphrases proposed by TSAKE. The results of our experimental studies conducted on three manually annotated datasets show the superiority of the proposed model over three baseline techniques and six state-of-the-art models. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
33. Diverse reports recommendation system based on latent Dirichlet allocation.
- Author
-
Uto, Masaki, Louvigné, Sébastien, Kato, Yoshihiro, Ishii, Takatoshi, and Miyazawa, Yoshimitsu
- Abstract
This paper presents a proposal for a system supporting learners in improving their report-writing skills by recommending reports from previous learners. The proposed system recommends reports that share similar subjects but which have different structures, expressions, and originality based on the distributions of words and subjects within the reports, as estimated using latent Dirichlet allocation (LDA). An important assumption made for this study is that reports with different word distributions tend to include different structures, expressions, and originality when they share similar subjects. Based on that assumption, the system selects and recommends reports that have dissimilar word distributions but which share similar subject distributions with a learner's report. The proposed system is expected to enhance learning of various writing skills from other learners. Finally, this paper demonstrates the effectiveness of the proposed system through actual data experiments. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
34. Recognizing Actions from Different Views by Topic Transfer.
- Author
-
Jia Liu
- Subjects
PATTERN recognition systems ,MACHINE learning ,COMPUTER vision ,DATA transmission systems ,FEATURE extraction - Abstract
In this paper, we describe a novel method for recognizing human actions from different views via view knowledge transfer. Our approach is characterized by two aspects: 1) We propose an unsupervised topic transfer model (TTM) to model two view-dependent vocabularies, where the original bag of visual words (BoVW) representation can be transferred into a bag of topics (BoT) representation. The higher-level BoT features, which can be shared across views, can connect action models for different views. 2) Our features make it possible to obtain a discriminative model of action under one view and categorize actions in another view. We tested our approach on the IXMAS data set, and the results are promising, given such a simple approach. In addition, we also demonstrate a supervised topic transfer model (STTM), which can combine transfer feature learning and discriminative classifier learning into one framework. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
35. A semantic modeling method for social network short text based on spatial and temporal characteristics.
- Author
-
Kou, Feifei, Du, Junping, Lin, Zijian, Liang, Meiyu, Shi, Lei, Yang, Congxian, and Li, Haisheng
- Subjects
SPATIOTEMPORAL processes ,SEMANTICS ,SOCIAL networks ,TEXT messages - Abstract
Highlights • Proposing a semantic modeling method for social network short text. • Proposing the concept of spatiotemporal region to generate reasonable topic proportions. • Overcoming the semantic sparsity of social network short text. • Experimental results of similar document search task over four real social media datasets demonstrate the effectiveness of the proposed STTM. Abstract Given the native sparsity of social network short text, semantic inference becomes an infeasible task for conventional topic models. By exploiting the spatial and temporal characteristics of social network data, we propose a social network short text semantic modeling method, named the Spatial and Temporal Topic Model (STTM). To further overcome short text sparsity, STTM leverages co-occurring word–word pairs to reduce the sparsity problem, and moreover, it incorporates time information into the process of topic modeling in order to generate topics with higher quality. Experimental results over four real social media datasets verify the effectiveness of STTM. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
36. Identifying potential breakthrough research: A machine learning method using scientific papers and Twitter data.
- Author
-
Li, Xin, Wen, Yang, Jiang, Jiaojiao, Daim, Tugrul, and Huang, Lucheng
- Subjects
MACHINE learning ,RESOURCE allocation ,DATA mining ,DISRUPTIVE innovations ,GREEN technology - Abstract
Breakthrough research may signal shifts in science, technology, and innovation systems. Early identification of breakthrough research is important not only for scientists, but also for policy makers and R&D experts in developing R&D strategies and allocating R&D resources. Researchers mostly use scientific papers data to identify potential breakthrough research, but they rarely make use of Twitter data related to scientific research and machine learning methods. Analysis of Twitter data is of great significance for us to understand the public's perception of potential breakthrough research and to identify potential breakthrough research. Machine learning methods can assist us in predicting the trend of events by utilizing prior knowledge and experience. Therefore, this paper proposes a framework for identifying potential breakthrough research using machine learning methods with scientific papers and Twitter data. We select solar cells as a case study to verify the validity and flexibility of this framework. In this case, we use a machine learning method to discover potential breakthrough research from scientific papers, and we use Twitter data mining to analyze Twitter users' sense of and response to the discovered potential breakthrough research, which aims to achieve a more extensive and diverse assessment of the discovered potential breakthrough research. This paper contributes to identifying potential breakthrough research, as well as understanding the emergence and development of breakthrough research. It will be of interest to R&D experts in the field of solar cell technology. • We proposed a framework for identifying potential breakthrough research using machine learning method. • We found 8 potential breakthrough researches in the field of solar cell technology in 2015. • Twitter data mining could be used to assist in identifying potential breakthrough research. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
37. Tracking the impact of COVID-19 and lockdown policy on public mental health using social media: an infoveillance study.
- Author
-
Li, Minghui, Hua, Yining, Liao, Yanhui, Zhou, Li, Li, Xue, Wang, Ling, and Yang, Jie
- Abstract
Background: The COVID-19 pandemic and its corresponding preventive and control measures have increased the mental burden on the public. Understanding and tracking changes in public mental status can facilitate optimizing public mental health intervention and control strategies. Objective: To build a social media-based pipeline that tracks public mental changes and use it to understand public mental health status regarding the pandemic. Methods: This study used COVID-19-related tweets posted between February 2020 and April 2022. The tweets were downloaded using unique identifiers through the Twitter application programming interface. We created a lexicon of four mental health problems (depression, anxiety, insomnia, and addiction) to identify mental health-related tweets and developed a dictionary for identifying healthcare workers. We analyzed temporal and geographical distributions of public mental health status during the pandemic and further compared distributions among healthcare workers versus the general public, supplemented by topic modeling on their underlying foci. Finally, we used interrupted time series to examine the state-wide impact of lockdown policy on public mental health in 12 states. Results: We extracted 4,213,005 tweets related to mental health and COVID-19 from 2,316,817 users. 2,161,357 (51.30%) of the tweets were related to "depression", while 1,923,635 (45.66%), 225,205 (5.35%) and 150,006 (3.56%) were related to "anxiety", "insomnia", and "addiction", respectively. Compared to the general public, healthcare workers had higher risks of all four types of problems (all P<.001), and they were more concerned about clinical topics than about everyday issues (e.g., "students' pressure", "panic buying" and "fuel problems"). Finally, the lockdown policy had significant associations with public mental health in 4 out of the 12 states we studied, among which Pennsylvania showed a positive association, while Michigan, North Carolina, and Ohio showed the opposite (all P<.05). Conclusions: The impact of COVID-19 and corresponding control measures on the public's mental status is dynamic and shows variability among different cohorts regarding disease types, occupations, and regional groups. Health agencies and policymakers should primarily focus on depression (reported by 51.30% of the tweets) and insomnia (which has shown an ever-increasing trend since the beginning of the pandemic), especially among healthcare workers. Our pipeline tracks and analyzes public mental health changes in a timely manner, especially when primary studies and large-scale surveys are hard to conduct. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
38. Language Model Adaptation Based on Topic Probability of Latent Dirichlet Allocation.
- Author
-
Hyung-Bae Jeon and Soo-Young Lee
- Subjects
DOMAIN-specific programming languages ,TRANSCRIPTION (Linguistics) ,DIRICHLET principle ,PROBABILITY theory ,CLUSTER analysis (Statistics) - Abstract
Two new methods are proposed for an unsupervised adaptation of a language model (LM) with a single sentence for automatic transcription tasks. At the training phase, training documents are clustered by a method known as Latent Dirichlet allocation (LDA), and then a domain-specific LM is trained for each cluster. At the test phase, an adapted LM is presented as a linear mixture of the now trained domain-specific LMs. Unlike previous adaptation methods, the proposed methods fully utilize a trained LDA model for the estimation of weight values, which are then to be assigned to the now trained domain-specific LMs; therefore, the clustering and weight-estimation algorithms of the trained LDA model are reliable. For the continuous speech recognition benchmark tests, the proposed methods outperform other unsupervised LM adaptation methods based on latent semantic analysis, non-negative matrix factorization, and LDA with n-gram counting. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
39. Thirty Years of Themes and Thirty Years of Methods in Sociological Theory and Methods (Riron to Hoho).
- Author
-
大林真也 and 瀧川裕貴
- Abstract
Copyright of Sociological Theory & Methods / Riron to Hoho is the property of Japanese Association for Mathematical Sociology and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2016
40. Discovering treatment pattern in Traditional Chinese Medicine clinical cases by exploiting supervised topic model and domain knowledge.
- Author
-
Yao, Liang, Zhang, Yin, Wei, Baogang, Wang, Wei, Zhang, Yuejiao, Ren, Xiaolin, and Bian, Yali
- Abstract
In Traditional Chinese Medicine (TCM), the prescription is the crystallization of the clinical experience of doctors and has been the main way to cure diseases in China for thousands of years. Clinical cases, on the other hand, describe how doctors diagnose and prescribe. In this paper, we propose a framework which mines treatment patterns in TCM clinical cases by exploiting a supervised topic model and TCM domain knowledge. The framework can reflect principle rules in TCM and improve function prediction of a new prescription. We evaluate our method on 3090 real world TCM clinical cases. The experiment validates the effectiveness of our method. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
41. User Intent Estimation from Access logs with Topic Model.
- Author
-
Uetsuji, Keisuke, Yanagimoto, Hidekazu, and Yoshioka, Michifumi
- Subjects
ONLINE shopping ,DECISION making ,ESTIMATION theory ,ACCESS to information ,RECOMMENDER systems - Abstract
As the Internet is widespread and there are many online shops on the Internet, many people buy products in online shops. Customer's behavior in the online shops is a sequence of customer driven activities intrinsically because his/her movement in an online shop occurs according to only his/her decision. Hence, to achieve satisfactory purchase experiments it is important how the shop supports them. Online shops have to predict visitors’ intents correctly to support them effectively. One of the information resources the shops can use is an access log including information on customer's movement in the online shop. If they are histories of customer's behaviors in online shops and the behaviors depend on customer's intents, we can extract new knowledge on them from the access logs. Speaking concretely, we can predict customers’ intents from the access logs since their internal intents affect their activities. We can realize a more appropriate recommendation service by changing the recommendation strategy depending on customer's intents. In this paper, we propose a method to predict customer's intents from access logs in a real online shop. We adopt a Topic Tracking Model (TTM) to analyze the access logs. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
42. Predicting the pattern of technology convergence using big-data technology on large-scale triadic patents.
- Author
-
Lee, Won Sang, Han, Eun Jin, and Sohn, So Young
- Subjects
TECHNOLOGY convergence ,BIG data ,TECHNOLOGICAL innovations ,ECONOMIC development ,ASSOCIATION rule mining ,GENETIC engineering ,PATTERN recognition systems - Abstract
Understanding technology convergence became crucial for pursuing innovation and economic growth. This paper attempts to predict the pattern of technology convergence by jointly applying the Association Rule and Link Prediction to entire IPCs related to triadic patents filed during the period from 1955 to 2011. We further use a topic model to discover emerging areas of the predicted technology convergence. The results show that the medical area is in the center of convergence, and we predict that technologies for treating respiratory system/blood/sense disorders are associated with the technologies of genetic engineering/peptide/heterocyclic compounds. After eliminating the majority of convergence, we found the convergence pattern among activating catalysts, printing, advanced networking, controlling devices, secured communication with in-memory system, television system with pattern recognition, and image processing and analyzing technologies. The results of our study are expected to contribute to firms that seek new innovative technological domain. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
43. Extracting Relevant Terms from Mashup Descriptions for Service Recommendation.
- Author
-
Yang Zhong and Yushun Fan
- Subjects
WEB services ,MASHUPS (Internet) ,RECOMMENDER systems ,WEB-based user interfaces ,DISCRIMINANT analysis - Abstract
Due to the exploding growth in the number of web services, mashup has emerged as a service composition technique to reuse existing services and create new applications with the least amount of effort. Service recommendation is essential to facilitate mashup developers locating desired component services among a large collection of candidates. However, the majority of existing methods utilize service profiles for content matching, not mashup descriptions. This makes them suffer from vocabulary gap and cold-start problem when recommending components for new mashups. In this paper, we propose a two-step approach to generate high-quality service representation from mashup descriptions. The first step employs a linear discriminant function to assign each term with a component service such that a coarse-grained service representation can be derived. In the second step, a novel probabilistic topic model is proposed to extract relevant terms from coarse-grained service representation. Finally, a score function is designed based on the final high-quality representation to determine recommendations. Experiments on a data set from ProgrammableWeb.com show that the proposed model significantly outperforms state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
44. Intrusion detection technique based on flow aggregation and latent semantic analysis.
- Author
-
Wu, Junrui, Wang, Wenyong, Huang, Lisheng, and Zhang, Fengjun
- Subjects
LATENT semantic analysis ,INTRUSION detection systems (Computer security) ,RECEIVER operating characteristic curves ,JUDGMENT (Psychology) - Abstract
Traditional network intrusion detection systems cannot identify new burgeoning invasive activities due to the inconspicuous features of malicious behaviors and the enormous increase of data transmitted via different devices. For the inconspicuous features, a novel aggregated flow-based inspection is suggested to amplify features of malicious behaviors. With regards to the enormous amount of data, a new data analysis method is introduced for efficiently classifying network traffic in this paper, which utilized the topic model to construct a doc-word matrix from statistical features and then analyzes latent semantic information to determine whether an aggregated flow is malicious. The performance of the proposed technique is evaluated using CIC-IDS2017, UNSW-NB15, and NSL-KDD datasets, with the results indicating that our technique achieves higher performance than other competing methods. Additionally, the ROC curves demonstrate that the proposed technique is capable of accurate classification even at a low sample rate. • An aggregated flow-based inspection to re-organize network data into aggregated flows. • A K-means based method is proposed to implement feature-word mapping and generate doc-word matrix. • PLSA is used for constructing the topic model. • A judgment metric is defined to elaborate the relationship between selected words-corresponded feature volumes and flow behavior. • This technique results in high classification ability. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
45. Research on product recommendation based on matrix factorization models fusing user reviews.
- Author
-
Wang, Heyong, Hong, Zhenqin, and Hong, Ming
- Subjects
MATRIX decomposition ,USER-generated content ,RECOMMENDER systems - Abstract
Nowadays, recommendation models based on matrix factorization (MF) suffer from the problem of rating sparsity because user-product rating matrix is usually sparse. To address the problem, it is significant to fuse some contextual data or side information on basic MF models. According to this core idea, this paper proposes a modified recommendation model, MFFR (matrix factorization fusing reviews) which recommend products by considering the fusing information on user reviews and user ratings. First, MFFR constructs user-product preference matrix from user reviews by using Latent Dirichlet Allocation (LDA) topic model. Then MFFR predicts ratings and generates personalized top-n recommendation products by using MF model to learn comprehensive latent factors of user-product rating matrix and user-product preference matrix simultaneously. The experimental results of three published datasets demonstrate that our model MFFR can achieve more accurate predicted ratings and hits more correct products of top-n recommendation than the comparative traditional models. MFFR can effectively raise the quality of recommendation, especially in the high level of rating sparsity. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
46. Visual Topic Network: Building better image representations for images in social media.
- Author
-
Niu, Zhenxing, Hua, Gang, Tian, Qi, and Gao, Xinbo
- Subjects
IMAGE representation ,IMAGE processing ,ONLINE social networks ,VIRTUAL reality ,NUMERICAL analysis - Abstract
Topic models have been demonstrated to be effective in building image representations for general images. Recently, how to build better image representations for images in social media has become an interesting problem, where one key issue is how to leverage images’ social contextual cues, e.g., user tags associated with images. Nevertheless, most previous methods either just exploited image content and neglected user tags, or assumed there are exact correspondences between image content and tags, i.e., tags are closely related to image content. Thus, they cannot be applied to the realistic scenarios where the images are only weakly annotated with tags, i.e., tags are only loosely related to image content as already manifested in real-world social media data. In this paper, we address the problem of building better image representations in social media, where the images are weakly annotated with user tags. In particular, we organize a collection of images as an image network where the relations between images are modeled by user tags. To model such an image network and build image representations, we further propose a network structured topic model, namely Visual Topic Network (VTN), where the image content and their relations are simultaneously modeled. In this way, the weakly annotated tags can be effectively leveraged in building image representations. The proposed VTN model is inspired by the Relational Topic Model (RTM) recently introduced in the document analysis literature. Different from the binary article relations in RTM, the proposed VTN can model the multiple-level image relations. Our extensive experiments on two social media datasets demonstrated the advantage of the proposed VTN model. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
47. A Comprehensive Clustering Method of Social Tags Based on LDA.
- Author
-
Li Huizong, Hu Xuegang, Yang Hengyu, Lin Yaojin, and He Wei
- Published
- 2015
- Full Text
- View/download PDF
48. Link-topic model for biomedical abbreviation disambiguation.
- Author
-
Kim, Seonho and Yoon, Juntae
- Abstract
Introduction: The ambiguity of biomedical abbreviations is one of the challenges in biomedical text mining systems. In particular, the handling of term variants and abbreviations without nearby definitions is a critical issue. In this study, we adopt the concepts of document topic and word link to disambiguate biomedical abbreviations.
Methods: We propose the link topic model, inspired by the latent Dirichlet allocation model, in which each document is perceived as a random mixture of topics, where each topic is characterized by a distribution over words. Thus, the most probable expansions with respect to abbreviations of a given abstract are determined by word-topic, document-topic, and word-link distributions estimated from a document collection through the link topic model. The model allows two distinct modes of word generation to incorporate semantic dependencies among words, particularly long-form words of abbreviations and their sentential co-occurring words; a word can be generated either dependently on the long form of the abbreviation or independently. The semantic dependency between two words is defined as a link, and a new random parameter for the link is assigned to each word in addition to a topic parameter. Because the link status indicates whether the word constitutes a link with a given specific long form, it has the effect of determining whether a word forms a unigram or a skipping/consecutive bigram with respect to the long form. Furthermore, we place a constraint on the model so that a word has the same topic as a specific long form if it is generated in reference to the long form. Consequently, documents are generated from the two hidden parameters, i.e., topic and link, and the most probable expansion of a specific abbreviation is estimated from the parameters.
Results: Our model relaxes the bag-of-words assumption of the standard topic model, in which the word order is neglected, and it captures a richer structure of text than does the standard topic model by considering unigrams and semantically associated bigrams simultaneously. The addition of semantic links improves the disambiguation accuracy without removing irrelevant contextual words and reduces the parameter space of massive skipping or consecutive bigrams. The link topic model achieves 98.42% disambiguation accuracy on 73,505 MEDLINE abstracts with respect to 21 three-letter abbreviations and their 139 distinct long forms. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
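The abstract of record 48 describes each document as a random mixture of topics, with each topic a distribution over words. As a rough illustration of that idea only, the sketch below samples one document from the generative process of standard latent Dirichlet allocation. It is not the authors' link topic model (there is no link variable), and the vocabulary, topic distributions, and function names are hypothetical toy values.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_categorical(probs, rng):
    """Draw one index with probability proportional to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_document(topics, vocab, doc_len, alpha, rng):
    """Standard LDA: draw topic proportions, then a topic and a word per token."""
    theta = sample_dirichlet([alpha] * len(topics), rng)  # document-topic mixture
    words = []
    for _ in range(doc_len):
        z = sample_categorical(theta, rng)       # topic for this token
        w = sample_categorical(topics[z], rng)   # word drawn from that topic
        words.append(vocab[w])
    return theta, words

# Hypothetical toy vocabulary and two hand-written topic-word distributions.
vocab = ["protein", "receptor", "gene", "patient", "dose", "trial"]
topics = [
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],  # a "molecular" topic
    [0.0, 0.0, 0.0, 0.4, 0.3, 0.3],  # a "clinical" topic
]

rng = random.Random(0)
theta, words = generate_document(topics, vocab, doc_len=8, alpha=0.5, rng=rng)
print(theta, words)
```

Inference runs this process in reverse, estimating the topic proportions and topic-word distributions from observed documents.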
49. A Knowledge Service-oriented Domain Knowledge Discovery Process.
- Author
-
Wang Liwei, Li Mei, Mu Dongmei, and Bi Qiang
- Published
- 2015
- Full Text
- View/download PDF
50. Employers’ Expectations: A Probabilistic Text Mining Model.
- Author
-
Gao, Lu and Eldin, Neil
- Subjects
EMPLOYER attitudes, TEXT mining, MATHEMATICAL models, SEARCH engines, INTERNET
- Abstract
This study uses text mining techniques to analyze employment data posted on the internet. The objective is to identify knowledge areas, skills, and expertise relevant to jobs in the construction industry. We utilized fast-growing online job search engines to understand the construction job market and employer expectations. Over 20,000 job advertisements were downloaded from various websites between Oct 14th 2012 and March 15th 2013. We developed a text mining method to identify job qualification information in the downloaded pages. The developed algorithm is capable of deriving rules by automatically extracting statistically significant patterns present in preselected qualifications. The selection rules can then be used to detect the presence of these qualifications in new pages. Once the qualifications were identified, we used the Latent Dirichlet Allocation (LDA) model to identify groups of skills that are required by employers. One of the major advantages of the LDA model is that it is an unsupervised approach, so no labeled training data is needed. The algorithm was applied to a case study as an illustrative example. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
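The abstract of record 50 applies LDA to group skills without labeled data. The sketch below shows one common way to fit LDA, a collapsed Gibbs sampler, on a handful of made-up tokenized qualification lists. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which inference method they used, and the data, hyperparameters, and function name are hypothetical.

```python
import random

def lda_gibbs(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over lists of tokens."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]             # tokens per (doc, topic)
    topic_word = [[0] * n_words for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                           # tokens per topic
    assignments = []

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(zs)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Conditional probability of each topic given all other tokens.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights, k=1)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    return doc_topic, topic_word, vocab

# Hypothetical tokenized qualification lists, not data from the study.
docs = [
    ["scheduling", "primavera", "estimating", "scheduling"],
    ["estimating", "budget", "primavera", "budget"],
    ["osha", "safety", "inspection", "safety"],
    ["inspection", "osha", "safety", "compliance"],
]

doc_topic, topic_word, vocab = lda_gibbs(docs, n_topics=2)
for k, counts in enumerate(topic_word):
    top = sorted(range(len(vocab)), key=lambda v: -counts[v])[:3]
    print(k, [vocab[v] for v in top])
```

The words with the highest counts in each row of `topic_word` are what one would read as a skill group. On a corpus this small the grouping varies with the seed and hyperparameters.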