140 results on '"Topic model"'
Search Results
2. Keywords Extraction and Thesaurus Construction for Domain News.
- Author
-
Meng, Fan, Zhou, Kaile, Bu, Yi, Huang, Win-Bin, Zhang, Pengyi, Long, Fei, and Li, Yan
- Subjects
- INFORMATION storage & retrieval systems
- Abstract
In modern information retrieval systems, the thesaurus plays an increasingly important role. To better describe and analyze domain news, this paper proposes a method of domain keyword extraction and uses it to construct an effective domain thesaurus. Compared with previous research, this paper captures the core information in the field by extracting and combining domain keywords, and improves the domain effectiveness of the thesaurus. In addition, this paper combines manual analysis and automated processing to construct a high-quality thesaurus, which has practical application value. The final results provide support for the process of indexing, organizing, retrieving and recommending news. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
3. Using topic modeling to unravel the nuanced effects of built environment on bicycle-metro integrated usage.
- Author
-
Bi, Hui, Gao, Hui, Li, Aoyong, and Ye, Zhirui
- Abstract
Bicycle-metro integration, in which bicycling is used as a flexible feeder mode to connect with public transport nodes presents new opportunities for sustainable transportation. It is known that the built environment can influence travel attitudes and choice, yet the empirical evidence for the role of built environment features in shaping the bicycle-metro integration remains rare. Inspired by the idea of text mining, this article is an attempt to demonstrate a data-driven semantic framework to capture key topic-based features of land use and bicycle-metro integrated usage in the vicinity of metro stations as well as their interactions. Latent Dirichlet Allocation topic modeling is analogously implemented here to generate a range of probability-based land use patterns and mobility patterns, and the associations between them are investigated by multivariate linear regression. A case study from Shanghai shows that the mixed land use and diversification of urban functions in the catchment areas of the metro stations can be detected effectively by 11 identified land use patterns. Based on 7 derived mobility patterns, this paper gives a probabilistic explanation to the time-varying properties of the bicycle-metro usage. All of the above thematic topics exhibit notably heterogeneous patterns in spatial distribution. The topic compositions in terms of land use pattern and mobility pattern at the station level reveal the current performance of station areas. Plus, results from the regression analysis confirm that most of the land use patterns that are related to various mixed use have close relationships with mobility patterns of bicycle-metro integration. Yet it is noteworthy that the effects of land use patterns often differ and change over time, namely affecting different mobility patterns. This study gives rise to alternative insights into the synergy between bike sharing and metro, which may help policymakers to develop more targeted TOD strategies. 
[ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
4. An unsupervised semantic text similarity measurement model in resource-limited scenes.
- Author
-
Xiao, Qi, Qin, Yunchuan, Li, Kenli, Tang, Zhuo, Wu, Fan, and Liu, Zhizhong
- Subjects
- *ARTIFICIAL intelligence, *INTERNET of things, *MEASUREMENT
- Abstract
As the basis of many artificial intelligence tasks, text similarity measurement has received extensive attention in current studies. However, few of them focus on the resource-limited scenes (i.e., limited computational resources and few training datasets), which are becoming increasingly popular and challenging with the development of the Internet of Things. Worse still, popular methods such as the deep-neural-network-based methods may lose their power in such scenes, since they typically require considerable computational resources. As for most current traditional methods, they also have issues of not effectively exploiting the semantic information in the sentences. As an alternative, this paper proposes a lightweight and semantically rich text similarity measurement model named the TES-TK model. In this model, a sentence is first transformed into a tree structure called TES-Tree with the integration of syntactic information, semantic knowledge, and topic distribution, aiming to comprehensively represent the multidimensional semantics of sentences. Afterward, a modified tree kernel model is designed to calculate the similarity between each pair of TES-Trees. In this way, the similarity score between the two related sentences can be retrieved. Experiments on 19 public benchmark datasets (STS2012–2015) demonstrate that the proposed approach exhibits significantly better performance than the compared eight peer methods on most datasets. Especially in resource-limited scenes, our approach achieved highly competitive results compared with the latest methods, such as BERT. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
5. Empower Keywords Generation for Short Texts with Graph-to-Sequence Learning.
- Author
-
Yang, Ying, Sun, Yaru, Yang, Dawei, Mei, Guang, and Wang, Zhihong
- Subjects
- NATURAL languages
- Abstract
Keywords are useful in natural language tasks. However, it is challenging to extract keywords from short texts, where the model may be affected by topic dependence and poor text organization structure. To resolve this limitation, we propose ADGCN, a keyword generation model for short texts based on graph-to-sequence learning. The model jointly adapts the contextual features and positional features of short texts for this task. We learn domain-invariant feature representations by using graph-building features and the node topic feature space, and jointly perform linear feature generation in the keyword decoding framework. Experimental results on real social datasets demonstrate that our proposed model achieves impressive empirical performance on relevance, informativeness and coherence. Besides, the proposed ADGCN also outperforms the state of the art on the public KP20k dataset. The experiments show that the model can generate the topic keywords of short texts and effectively alleviate the influence of data disturbance. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
6. Revealing the supply-demand relationship of urban cultural ecosystem services: The combination of open-source spatial model and topic model.
- Author
-
Dang, Hui, Li, Jing, and Zhou, Zixiang
- Subjects
- *URBAN ecology, *ECOSYSTEM services, *NATURAL language processing, *SUSTAINABLE urban development, *SUPPLY & demand, *CITIES & towns
- Abstract
Cultural ecosystem services (CES) are of significant interest due to their effectiveness in assessing the impact of ecosystem degradation on human well-being. The evaluation of spatial coordination between the supply and demand of urban CES is crucial for guiding the sustainable development of urban areas. However, current research and survey methods are time-consuming and laborious, lacking in full integration of beneficiary opinions, and insufficient in examining multiple CES in an integrated manner. In this study, we have developed a novel framework that integrates an investigation-driven approach with a natural language processing topic model to assess the supply and demand of CES and their spatial correspondence. Using the main urban area of Xi'an as a case study, we analyzed the spatial coherence of supply and demand including aesthetic, cultural, educational, recreational, and spiritual services. Our findings demonstrate that the fusion of multiple models can effectively evaluate the CES supply from the perspective of nature and the demand for CES from a beneficiary perspective. High-supply areas for CES are concentrated in the central region of the study area and in proximity to roads and rivers. Aesthetic service supply was higher than that of other service types, followed by recreational and spiritual services supply. Demand is notably high in the city's hinterland, with substantial interest in recreational and aesthetic services. We observed mismatches between supply and demand across service categories ranging from 33.58% to 67.64%, with supply exceeding demand in most regions. Therefore, tailored measures should be implemented to address the supply and demand mismatch of CES based on local conditions. Our research introduces a new technical tool for analyzing the supply-demand matching relationship of CES and establishes a theoretical foundation for sustainable urban management.
• Topic model can effectively study cultural ecosystem services (CES) demand.
• The distribution of supply and demand for CES are highly localized.
• A framework for supply and demand assessment by urban CES.
• The framework addresses the difficulty of measuring services at the city scale.
[ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
7. A sustainable development benchmarking framework for energy companies based on topic mining and knowledge graph: The case of oil and gas industry.
- Author
-
Xu, Xiaofeng, Liu, Zhiting, Liu, Wenzhi, Pei, Chuantao, Wu, Xiangfan, and Nie, Zhengya
- Subjects
- *KNOWLEDGE graphs, *GAS industry, *ENERGY industries, *PETROLEUM industry, *BENCHMARKING (Management), *SUSTAINABLE development
- Abstract
Benchmarking is one of the key management approaches to support companies to continuously improve and gain competitive advantage. However, previous benchmarking management suffered from subjectivity and one-sidedness, and the benchmarking evaluation only identified the benchmark and the company's own deficiencies, with a high degree of independence between the sub-processes. The proposed framework is the first to jointly apply topic mining and relation extraction techniques to establish an objective index system and mine the mechanistic relations among the indexes, and it provides a concise and interactive display of the results based on knowledge graph visualization. In this study, 50 oil and gas companies are taken as examples for a case study of sustainable development benchmarking management, and a sustainable development program with strong logic and systematicity is proposed at both macro and micro levels. The results verify that the framework can effectively improve the scientific, comprehensive and systematic nature of sustainable development benchmarking management. At the same time, the framework extends the role of the index system so that it can provide the necessary mechanistic guidance for subsequent sustainable development programs to achieve a high degree of synergy. In addition, the framework has a high degree of coherence with the United Nations Sustainable Development Goals (SDGs), and the proposals put forward can not only be used to improve the sustainable development capability of companies themselves, but also contribute to the realization of the SDGs.
• A novel sustainable development benchmarking framework for energy companies.
• A first joint application of topic mining and relation extraction for benchmarking.
• Energy companies should promote localization and diversification of the workforce.
• Energy companies should expand joint venture projects with developing countries.
[ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
8. Deconstructing the organizational resilience of construction firms in major emergencies: A text mining analysis of listed construction companies in China.
- Author
-
Zhang, Yuguo, Wang, Wenshun, Mi, Lingyun, Liu, Ying, Qiao, Lijie, Ni, Guodong, and Wang, Xiangyang
- Abstract
The COVID-19 pandemic has resulted in unprecedented huge losses for construction companies in terms of capital, labor, and project construction, highlighting a significant lack of organizational resilience (OR) within the construction industry. How to improve the OR of construction companies has become the key to resolve the crisis. However, there is a lack of systematic insights into the structure and dimensions of OR, as well as a gap in empirical evidence to explain how construction firms systematically construct OR. Therefore, this paper systematically identifies 19 resilience topics and their language descriptions by mining the resilience-related information in 1572 annual reports and expert interview data of listed companies in the Chinese construction industry during the COVID-19 pandemic, using a combination of the topic model and language model. Following the basic concept of OR, a framework of OR dimensions in the construction firms that integrates actions, resources, and capabilities is developed to uncover the complex resilience characteristics of construction firms. The results show that OR sought by listed companies in the construction industry consists of resilient actions, resilience resources, and resilience capabilities. Resilient actions stem from motivating, restraining, protecting, and exploring actions. The resilience resources include the resource reserves of organization, technology, and knowledge, while the resilience capabilities are dynamic capabilities that integrate prevention, response, adaptation, monitoring, perception, and recovery. The findings not only deconstruct the OR framework for construction companies to cope with crises, but also provide new paths for construction managers to cultivate the OR of companies in practice.
• The organizational resilience of construction firms is systematically deconstructed.
• Text mining was applied to explore resilience information of listed construction firms.
• The language model is embedded in the topic model to identify resilient topics.
• Primary dimensions are resilient actions, resilient resources, and resilient capabilities.
• Organizational resilience of construction firms is composed of 18 sub-dimensions.
[ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
9. Automatic topic labeling using graph-based pre-trained neural embedding.
- Author
-
He, Dongbin, Ren, Yanzhao, Mateen Khattak, Abdul, Liu, Xinliang, Tao, Sha, and Gao, Wanlin
- Subjects
- *REDUNDANCY in engineering, *FOOD labeling, *TASKS
- Abstract
It is necessary to reduce the cognitive overhead of interpreting the native topic term list of the Latent Dirichlet Allocation (LDA) style topic model. In this regard, automatic topic labeling has become an effective approach to generate meaningful alternative representations of topics discovered for end-users. In this study, we introduced a novel two-phase neural embedding framework with the redundancy-aware graph-based ranking process. It demonstrated how pre-trained neural embedding could be usefully applied in topic terms, sentence presentations, and automatic topic labeling tasks. Moreover, reranking the topic terms optimized the discovered topics with fewer yet more representative terms while retaining the topic information integrality and fidelity. It further decreased the burden of computation caused by neural embedding and improved the overall effectiveness of the labeling system. Compared with the prevailing state-of-the-art and classical labeling systems, our efficient model boosted the quality of the topic labels generated and discovered more meaningful topic labels. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
10. A Latent Topic Analysis Framework for Category-Level Target Promotion in the Supermarket.
- Author
-
Sun, Yi, Hayashi, Teruaki, and Ohsawa, Yukio
- Subjects
- SUPERMARKETS, CONSUMER behavior, POINT-of-sale systems, ENTROPY (Information theory)
- Abstract
When and which products to recommend to whom has been the essential issue for retailers. In this field, the topic model is attracting researchers' attention for extracting customers' purchase behavior instead of association rules or K-means. However, the optimal number of topics is chosen manually, and there are some limitations in using topic models. In this study, we developed the model by Koltcov et al. for point of sales (POS) data in the supermarket. To grasp the change of topics over time, we divided five months of POS data into ten datasets of two-week intervals and applied the topic model with Renyi entropy to each separately. The results suggest that splitting data might be a better way to understand customers' behavior. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
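The abstract above mentions Renyi entropy in connection with choosing the number of topics. As a rough illustration only, the following Python sketch computes the standard Rényi entropy of a discrete distribution. It is not the procedure of Koltcov et al.; the function name `renyi_entropy` and the two sample distributions are assumptions made for this example.

```python
import math

def renyi_entropy(probs, alpha):
    """Renyi entropy (in nats) of a discrete distribution, for alpha > 0."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    # Zero-probability outcomes do not contribute to the entropy.
    probs = [p for p in probs if p > 0]
    if abs(alpha - 1.0) < 1e-12:
        # The limit alpha -> 1 is the Shannon entropy.
        return -sum(p * math.log(p) for p in probs)
    return math.log(sum(p ** alpha for p in probs)) / (1.0 - alpha)

# Hypothetical topic-word distributions over a four-word vocabulary.
flat_topic = [0.25, 0.25, 0.25, 0.25]
peaked_topic = [0.85, 0.05, 0.05, 0.05]

flat_h = renyi_entropy(flat_topic, alpha=2)
peaked_h = renyi_entropy(peaked_topic, alpha=2)
```

A flat distribution gives the largest value (the logarithm of the vocabulary size), while a distribution concentrated on a few words gives a smaller one.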
11. A Web service clustering method based on topic enhanced Gibbs sampling algorithm for the Dirichlet Multinomial Mixture model and service collaboration graph.
- Author
-
Hu, Qiang, Shen, Jiaji, Wang, Kun, Du, Junwei, and Du, Yuyue
- Subjects
- *GIBBS sampling, *WEB services, *DOCUMENT clustering, *ALGORITHMS, *QUALITY of service, *FEATURE extraction
- Abstract
A method to enhance Web service clustering is proposed in this paper. Since current service clustering methods usually face low quality of service representation vectors and lack consideration of service collaboration, we try to provide an improved topic model to generate high-quality service representation vectors and design a service clustering method to integrate function similarity and collaboration similarity. First, by introducing feature word extraction and probability distribution correction into GSDMM, we present a model called TE-GSDMM (topic enhanced Gibbs sampling algorithm for the Dirichlet Multinomial Mixture model). Then, a service collaboration graph is put forward to model cooperation relationships and generate service collaboration vectors. Collaboration similarity is assessed by the similarity of service collaboration vectors. Finally, the K-means++ algorithm is employed to cluster Web services by evaluating service function similarity and collaboration similarity. Experiments show that TE-GSDMM outperforms other topic models in generating high-quality service representation vectors for service clustering. Moreover, service clustering performance is further improved by integrating collaboration similarity. Thus, the proposed method effectively enhances Web service clustering by improving the quality of service representation vectors and integrating service collaboration similarity. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
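The abstract above names K-means++ as the clustering step. The Python sketch below shows only the k-means++ seeding rule, in which each new center is drawn with probability proportional to its squared distance from the nearest center already chosen. The 2-D `points` are hypothetical stand-ins for service vectors; how the paper builds or combines its function and collaboration vectors is not shown here.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_pp_seeds(points, k, seed=0):
    """Pick k initial centers with the k-means++ rule (distinct points assumed)."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Weight each point by its squared distance to the closest center so far;
        # points already chosen get weight 0 and cannot be picked again.
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2)[0])
    return centers

# Hypothetical service vectors forming two loose groups.
points = [(0.0, 0.1), (0.1, 0.0), (0.9, 1.0), (1.0, 0.9), (0.5, 0.5)]
seeds = kmeans_pp_seeds(points, k=2)
```

After seeding, ordinary k-means iterations (assign each point to its nearest center, then recompute centers) would follow.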
12. Topic model for long document extractive summarization with sentence-level features and dynamic memory unit.
- Author
-
Han, Chunlong, Feng, Jianzhou, and Qi, Haotian
- Subjects
- *TEXT summarization, *LANGUAGE models, *CHATGPT, *MEMORY
- Abstract
The Transformer-based summarization models rely solely on the attention mechanism for document encoding, making it difficult to accurately capture long-range dependencies in long documents due to the presence of attention redundancy. To address this issue, we propose an extractive summarization framework guided by a topic model (TopicSum) that utilizes a heterogeneous graph neural network to leverage the topic information as document-level features during the sentence selection process, thereby capturing the long-range dependencies among sentences. The sentence-level features in this topic model align with the basic unit of the extractive summarization task. Additionally, a memory mechanism is employed to dynamically store and update the memory module, reducing the potential of repetitive information guiding sentence selection. We evaluated the model on three large document datasets, namely Pubmed, arXiv, and GovReport, and achieved significantly higher Rouge scores than previous works, including extractive and abstractive models. Furthermore, our experiments demonstrate that recent highly regarded large language models such as ChatGPT are insufficient to handle the long document summarization task directly. The proposed approach in this paper exhibits sufficient competitiveness in terms of both generation quality and deployment conditions. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
13. Android malware detection method based on graph attention networks and deep fusion of multimodal features.
- Author
-
Chen, Shaojie, Lang, Bo, Liu, Hongyu, Chen, Yikai, and Song, Yucai
- Subjects
- *MALWARE, *MULTIMODAL user interfaces, *SOURCE code
- Abstract
Currently, Android malware detection methods always focus on one kind of app feature, such as structural, semantic, or other statistical features. This paper proposes a novel Android malware detection method that integrates multiple features of Android applications. First, to effectively extract the structural and semantic features, we propose a new type of call graph named the class-set call graph (CSCG) that uses the sets of Java classes as nodes and the call relationships between class sets as edges, and we design a dynamic adaptive CSCG construction method that can automatically determine the node size for applications with different scales. The topic model is used to mine the source code semantics from the class sets as the node features. Then, we use a graph attention network (GAT) with max pooling to extract the CSCG features that encompass both the semantic and structural features of the Android application. Furthermore, we construct a deep multimodal feature fusion network to fuse the CSCG features with permission features. Experimental results show that our method achieves a detection accuracy of 97.28%–99.54% on the three constructed datasets, which is better than the existing methods.
• Proposing multimodal features deep fusion method for Android malware detection.
• Designing class-set call graph to integrate structural and semantic features.
• Adaptive class merging method for class-set call graph construction.
• Introducing GAT and max pooling to extract salient graph features.
• Proposing fusion network for graph features and permission features.
[ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
14. A thousand words tell more than just numbers: Financial crises and historical headlines.
- Author
-
Ristolainen, Kim, Roukka, Tomi, and Nyberg, Henri
- Abstract
We show that financial crises are preceded by changes in specific types of narrative information contained in newspaper article titles. Our novel international dataset and the resulting empirical evidence are gathered by integrating information from a large panel of economic news articles in global newspapers between the years 1870 and 2016 with conventional macroeconomic and financial indicators. We find that the predictive information of newspaper article titles that signals coming crisis episodes is substantial over and above the macroeconomic and financial indicators. Feature contribution analysis and crisis case studies reveal that the new indicators capture more detailed, but still generalizable information on the buildup of crises. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
15. Author topic model for co-occurring normal documents and short texts to explore individual user preferences.
- Author
-
Yang, Yang and Wang, Feifei
- Subjects
- *DISTRIBUTION (Probability theory), *AUTHORSHIP collaboration, *AUTHORSHIP, *AUTHORS
- Abstract
The investigation of user preferences through user comments has attracted significant attention. Although topic models have been verified as useful tools to facilitate the understanding of textual contents, they cannot be directly applied to accomplish this task because of two problems. The first problem is the severe data sparsity suffered by user comments because they are generally short. The second problem is the mixture of opinions from both user comments and the original documents the users commented on. To simultaneously solve the data sparsity problem and explore clean user preferences, we propose an author co-occurring topic model (AOTM) for normal documents and their short user comments. By considering authorship, AOTM allows each author of short texts to have a probability distribution over a set of topics represented only by short texts. Accordingly, the individual user preferences can be investigated based on these author-level distributions. We verify the performance of AOTM using two news article datasets and one e-commerce dataset. Extensive experiments demonstrate that the AOTM outperforms several state-of-the-art methods in topic learning and topic representation of documents. The potential usage of AOTM in exploring individual user preferences is further illustrated by drawing user portraits and predicting user posting behaviors. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
16. A user-based aggregation topic model for understanding user's preference and intention in social network.
- Author
-
Shi, Lei, Song, Guangjia, Cheng, Gang, and Liu, Xia
- Subjects
- *SOCIAL networks, *RECURRENT neural networks, *VIRTUAL communities, *GIBBS sampling, *INTENTION
- Abstract
In this study, we focus on understanding and mining user's preferences and intentions via user-based aggregation in the context of a social network. Understanding preference and intention in microblog texts is more difficult and challenging than understanding such characteristics in the context of standard text. The main reason is that search history and click history are difficult to obtain due to data privacy in social networks. Meanwhile, the text is sparse, and the number of background topics in social networks is enormous. To overcome the above challenges, we explore an indirect method of user's preference and intention understanding by leveraging a user-based aggregation topic model (UATM). Our UATM aims to mine the distributions of user's preferences and intentions by utilizing user's preference and intention distributions and followees' preference and intention distributions. Furthermore, to alleviate the sparsity problem, we discriminatively model common words and topic words and incorporate a user factor into our model. We combine the recurrent neural network (RNN) and inverse document frequency (IDF) as the weight prior to learn word relationships. Moreover, to further weaken the sparsity of context, we leverage word pairs to model topics for all documents. We also propose a collapsed Gibbs sampling algorithm to infer preference and intention in our UATM. To verify the effectiveness of the proposed method, we collect a Sina Weibo dataset consisting of microblog users and their pushed content to conduct various experiments. Both qualitative and quantitative evaluations demonstrate that our proposed UATM model outperforms several state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
17. CTEA: Context and Topic Enhanced Entity Alignment for knowledge graphs.
- Author
-
Yan, Zhihuan, Peng, Rong, Wang, Yaqian, and Li, Weidong
- Subjects
- *CONVOLUTIONAL neural networks, *VECTOR spaces
- Abstract
We study the problem of finding entities referring to the same real-world object in multilingual knowledge graphs (KGs), i.e., entity alignment for multilingual KGs. Recently, embedding-based entity alignment methods have received extensive attention in this area. Most of them first embed the entities in a low-dimensional vector space via the relation structure of entities, and then align entities via these learned embeddings combined with some entity similarity function. Although they achieve promising performance, these methods are deficient in utilizing entity contexts and entity topic information. In this paper, we propose a novel entity alignment framework, CTEA (Context and Topic Enhanced Entity Alignment), which integrates entity context information and entity topic information to help alignment. This framework learns entity topic distributions from their attributes with a specially designed topic model, BTM4EA, and the learned entity topic distributions are used to filter out weakly correlated entities for each entity to be aligned. Meanwhile, we embed KGs into a low-dimensional vector space via a translation-based KG embedding model and mine context information from these vectors with an attention-attached Convolutional Neural Network (CNN). The entity embeddings, entity contexts and entity topics are combined to get the final alignment results. Extensive experiments reveal that our method achieves promising performance in most cases. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
18. ARTAN: Align reviews with topics in attention network for rating prediction.
- Author
-
Liang, Yile, Qian, Tieyun, and Yu, Huilin
- Subjects
- *FORECASTING, *MATRIX decomposition, *MACHINE learning, *RECOMMENDER systems
- Abstract
Rating prediction has long been a popular research topic in the field of recommendation systems. Latent factor models, particularly matrix factorization (MF) methods, constitute the most prevalent techniques for rating prediction. However, MF-based methods suffer from the problems of data sparsity and lack of explicability. In this paper, we propose a novel model in which ratings and topic-level review information are integrated into a deep neural framework to address these problems. Specifically, we designed a topic alignment operation and a topic attention mechanism to reflect the user's preferences and an item's properties in terms of a topic in the reviews. We present a neural prediction layer that extends user and item representations by including both the latent factors from ratings and the textual information from reviews. The results of extensive experiments on four real-world datasets demonstrate that our proposed method consistently outperforms the state-of-the-art recommendation approaches that follow this direction in terms of rating prediction. Furthermore, our model can categorize representative reviews of users/items and group reviews into topics for the users' reference. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
19. Understanding user's travel behavior and city region functions from station-free shared bike usage data.
- Author
-
Chang, Ximing, Wu, Jianjun, He, Zhengbing, Li, Daqing, Sun, Huijun, and Wang, Weiping
- Subjects
- *DISTRIBUTION (Probability theory), *CYCLING, *DATA mining, *ALGORITHMS, *TRAVEL
- Abstract
• Spatiotemporal usage patterns of station-free shared bikes are explored.
• A topic-based two-stage algorithm is proposed to discover city functional regions.
• Region functions are labeled by static POI data and dynamic mobility patterns.
Station-free shared bike (SFSB) is a new travel mode in which shared bikes may be parked in any suitable place. This implies that users are likely to park an SFSB as close as possible to their destinations. This advantage makes SFSB data an ideal source for understanding the land-use distribution. Inspired by ideas from text mining, this paper proposes a topic-based two-stage SFSB data mining algorithm to understand SFSB users' travel behavior and to discover city functional regions. A region, a function and human mobility patterns are treated as a document, a topic and words, respectively. Then, a region is represented by a distribution of functions, and a function is characterized by a distribution of mobility patterns. Point-of-interest data is combined to annotate the clustered regions to discover the real functions. Finally, the proposed method is tested using 14-day SFSB data in Beijing and the results are evaluated using satellite map data. The proposed methods and the results can be applied to infer an individual's travel purpose through functional regions and to improve land-use planning, etc. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
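The analogy in the abstract above (region as document, function as topic, mobility pattern as word) can be sketched with a small collapsed Gibbs sampler for Latent Dirichlet Allocation. This is a minimal Python illustration under stated assumptions, not the paper's two-stage algorithm: the function `lda_gibbs`, its hyperparameters, and the mobility-pattern tokens in `regions` are all hypothetical.

```python
import random

def lda_gibbs(docs, n_topics, n_iters=100, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic proportions."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]                  # document-topic counts
    nkw = [dict.fromkeys(vocab, 0) for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                                   # tokens per topic
    z = []                                                # topic of each token
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)                   # random initial topic
            zd.append(k)
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
        z.append(zd)
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token, then resample its topic from the conditional.
                ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    return [[(c + alpha) / (len(doc) + n_topics * alpha) for c in row]
            for row, doc in zip(ndk, docs)]

# Hypothetical regions, each a bag of mobility-pattern "words".
regions = [
    ["am_out", "am_out", "pm_in", "am_out", "pm_in"],
    ["night_in", "weekend", "night_in", "weekend", "weekend"],
    ["am_out", "pm_in", "weekend", "night_in", "am_out"],
]
theta = lda_gibbs(regions, n_topics=2)
```

Each row of `theta` is one region's distribution over the two latent "functions"; labeling those functions (for example with point-of-interest data, as the abstract describes) is a separate step.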
20. A Study of a Method to Understand the Intention of Taste Expressions through Text Mining.
- Author
Tachibana, Shinichi and Tsuda, Kazuhiko
- Subjects
TASTE, INTENTION, WINE tasting, WEBSITES
- Abstract
The purpose of this study is to demonstrate a method of understanding the intentions of taste expressions from word-of-mouth data of cooking recipe websites using text mining. This study aims to clarify the use of the word "KOKU" as an example to verify the method. KOKU is a taste expression, used much like richness, that is experienced in various dishes and in the taste of wine. In order to clarify the relationship between the features of KOKU and cooking categories, they were clustered using latent Dirichlet allocation. The categories were classified into groups of foods using similar ingredients, sweetness, oils, and seasonings. Through this analysis, the features of KOKU were defined. In the past, there has been no attempt to clarify the features of KOKU using word-of-mouth data from cooking recipe websites. The success in defining "KOKU" is evidence that this method has the potential to be extended and applied to expressions other than KOKU. [ABSTRACT FROM AUTHOR]
- Published
- 2020
21. Optimising topic coherence with Weighted Pólya Urn scheme.
- Author
Wang, Rui, Zhou, Deyu, and He, Yulan
- Subjects
*URNS, *PRIOR learning, *SENTIMENT analysis, *LATENT semantic analysis, *GRAPHICS processing units
- Abstract
Topic models have been widely used to mine hidden topics from documents. However, one limitation of such topic models is that they are prone to generating incoherent topics. To address this limitation, many approaches have been proposed to incorporate the prior knowledge of word semantic relatedness into the topic inference process. One example is the Generalized Pólya Urn (GPU) scheme. However, GPU-based topic models often require sophisticated algorithms to acquire domain-specific knowledge from data. Moreover, prior knowledge is incorporated into the topic inference process without considering its impact on the intermediate topic sampling results. In this paper, we propose a novel Weighted Pólya Urn scheme and incorporate it into the Latent Dirichlet Allocation framework to build a self-enhancement topic model and generate coherent topics. Specifically, semantic prior knowledge based on word embedding is employed to measure the semantic coherence of a word to different topics, which is incorporated into the Weighted Pólya Urn scheme. Moreover, semantic coherence is updated dynamically based on the semantic similarity between a word and the representative words in different topics. Experiments have been conducted on seven public corpora from different domains to evaluate the effectiveness of the proposed approach. Experimental results show that compared to the state-of-the-art baselines, the proposed approach can generate more coherent topics. [ABSTRACT FROM AUTHOR]
- Published
- 2020
22. Generation of topic evolution graphs from short text streams.
- Author
Gao, Wang, Peng, Min, Wang, Hua, Zhang, Yanchun, Han, Weiguang, Hu, Gang, and Xie, Qianqian
- Subjects
*NATURAL language processing, *BIOLOGICAL evolution, *RANDOM fields, *RIVERS
- Abstract
Topic evolution mining on short texts is an important research topic in natural language processing. Existing methods have focused either on the topic evolution of normal documents or on the evolution of topics along a timeline. In this paper, we aim to generate topic evolutionary graphs from short texts, which not only capture the main topic timeline, but also reveal the correlations between related subtopics. Firstly, we propose an Encoder-only Transformer Language Model (ETLM) to quantify the relationship between words. Then we propose a novel topic model, referred to as the weighted Conditional random field regularized Correlated Topic Model (CCTM), which leverages semantic correlations to discover meaningful topics and topic correlations. Finally, topic evolutionary graphs are generated by an Online version of CCTM (OCCTM) to capture the evolutionary patterns of main topics and related subtopics. Experimental results on real-world datasets demonstrate that our method outperforms baselines on quality of topics and presents motivated patterns for topic evolution mining. [ABSTRACT FROM AUTHOR]
- Published
- 2020
23. HGGC: A hybrid group recommendation model considering group cohesion.
- Author
Jeong, Hyun Ji and Kim, Myoung Ho
- Subjects
*COHESION, *SOCIAL cohesion, *GROUPS
- Abstract
• Group recommendation model GGC considering group cohesion is proposed.
• This is the first attempt to apply group cohesion in group recommendation.
• To alleviate the data sparsity problem, HGGC integrates content information into GGC.
• Experiments are performed on three datasets: Meetup CA, Meetup NYC and Gowalla.
• Hit ratio (HR@10) of HGGC is increased by 220%, 85% and 10% over previous methods.
In the real world, people organize and participate in many group activities. Group recommendation, which finds preferable items for a group of users, is a challenging problem since it is difficult to properly aggregate diverse preferences among the members. Group cohesion denotes the proportion of group members who actively participate in various activities relevant to the group's objectives. In a strongly cohesive group, most members actively participate in group decisions, while many members of a weakly cohesive group are bystanders who just follow the other members' decisions. Since group cohesion is an important factor in group recommendation, the concept of group cohesion needs to be properly reflected in the recommendation model. We present a new approach to group recommendation that appropriately captures group cohesion. A hybrid model that considers content information and rating data simultaneously is also proposed to alleviate the well-known data sparsity problem in group recommendation. We use a heterogeneous information network (HIN) that can contain rich information about entities and relationships from which additional content information can be extracted. Then, this information is properly incorporated into the group recommendation model. Experimental results on real datasets show that our proposed method outperforms the existing state-of-the-art methods and improves recommendation performance significantly. [ABSTRACT FROM AUTHOR]
- Published
- 2019
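The HR@10 metric quoted in the highlights above is not defined in the abstract. A minimal sketch, assuming the common leave-one-out definition (the fraction of groups whose held-out item appears among the top-k recommendations), might look like the following; the function name and the toy data are hypothetical.

```python
def hit_ratio_at_k(ranked_lists, held_out_items, k=10):
    """Fraction of groups whose held-out item appears in their top-k list."""
    if not ranked_lists:
        return 0.0
    hits = sum(
        1 for ranked, target in zip(ranked_lists, held_out_items)
        if target in ranked[:k]
    )
    return hits / len(ranked_lists)

# Hypothetical recommendations for three groups and the item each group chose.
recommendations = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
chosen = ["b", "x", "i"]
hr_at_2 = hit_ratio_at_k(recommendations, chosen, k=2)
hr_at_3 = hit_ratio_at_k(recommendations, chosen, k=3)
```

Raising k can only keep or increase the hit ratio, which is why reported figures always state the cutoff.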
24. The effects of carbon-related news on carbon emissions and carbon transfer from a global perspective: Evidence from an extended STIRPAT model.
- Author
Zhou, Wenwen, Cao, Ximeng, Dong, Xuefan, and Zhen, Xuan
- Subjects
*CARBON emissions, *GOVERNMENT policy on climate change, *GREENHOUSE gas mitigation, *CARBON in soils, *WAGE increases
- Abstract
Climate change poses an enormous challenge to countries across the world. News is a crucial channel to draw society's attention to this issue and strengthen climate governance initiatives, such as carbon emission reduction. The aim of this research was to investigate the effects of carbon-related news on carbon emissions and carbon transfer. First, we analyzed the topics of carbon-related news from media outlets such as ABC, CNN, and The Guardian from 2009 to 2019. Then, we calculated the carbon emissions from both production- and consumption-based perspectives and constructed the carbon transfer network of major countries around the world. Finally, we examined the direct and lag effects of carbon-related news attention and contents on carbon emissions and carbon transfer by using the extended population, affluence, and technology (STIRPAT) model as the analytical framework. The results showed the following. First, countries are paying increasing attention to carbon-related issues with the rise of global carbon emissions, and they mainly focus on the social aspects of carbon emissions. Second, there are differences in production- and consumption-based carbon emissions by country. The amount of embodied carbon transfer between countries has increased over time and the carbon transfer network has become tighter. Finally, carbon-related news has a positive impact on improving carbon emissions and carbon transfer worldwide. Different news topics have also shown varying effects on carbon emissions and carbon transfer. In summary, this study further revealed the impacts of carbon-related news on carbon emissions and transfer and accordingly provided policy implications in carbon emission reduction for the government.
• The evolution of carbon-related news topics is analyzed.
• The effects of carbon-related news on carbon emissions and carbon transfer are investigated.
• Carbon-related news has a positive impact on improving carbon emissions and carbon transfer worldwide.
• Different news topics have led to different effects on carbon emissions and carbon transfer. [ABSTRACT FROM AUTHOR]
- Published
- 2023
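For readers unfamiliar with the STIRPAT framework named in the abstract above, its standard form and log-linear version can be written as below. The final equation is only a hypothetical illustration of how a lagged news variable might enter an extended specification; the symbol N, the lag k, and the coefficient θ are assumptions and are not taken from the paper.

```latex
% Standard STIRPAT form and its log-linear version.
% I: environmental impact (here, carbon emissions), P: population,
% A: affluence, T: technology, e: error term; i indexes countries, t years.
\begin{align}
  I_{it} &= a \, P_{it}^{b} \, A_{it}^{c} \, T_{it}^{d} \, e_{it} \\
  \ln I_{it} &= \ln a + b \ln P_{it} + c \ln A_{it} + d \ln T_{it} + \ln e_{it}
\end{align}
% Hypothetical extension (an assumption, not the paper's specification):
% N_{i,t-k} is a news-attention measure lagged by k periods.
\begin{equation}
  \ln I_{it} = \ln a + b \ln P_{it} + c \ln A_{it} + d \ln T_{it}
             + \theta \ln N_{i,t-k} + \ln e_{it}
\end{equation}
```

In the log-linear form each coefficient reads as an elasticity, which is what makes the framework convenient for regression analysis.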
25. Three-way decisions based blocking reduction models in hierarchical classification.
- Author
Shen, Wen, Wei, Zhihua, Li, Qianwen, Zhang, Hongyun, and Miao, Duoqian
- Subjects
*ACCURACY of information, *CLASSIFICATION, *HIERARCHIES, *INFORMATION storage & retrieval systems, *OBJECT manipulation
- Abstract
• Two three-way decisions based hierarchical classification models are proposed.
• The ambiguity of hierarchy is noticed and 3WD is used to reduce the uncertainty.
• The topic model is used to learn category relations.
• The proposed models perform better than several hierarchical classification methods.
• Extend the application domain of three-way decisions to fashion image classification.
Hierarchical classification (HC) is effective when categories are organized hierarchically. However, the blocking problem greatly reduces the effectiveness of hierarchical classification. Blocking means that samples are easily misclassified by high-level classifiers, so that the samples are blocked at the high level of the hierarchy. This issue is caused by the inconsistency between the artificially defined hierarchy and the actual hierarchy of the raw data. Another issue is that it is rash to process data strictly following the hierarchy. Therefore, special treatment is required for some uncertain data. To address the first issue, we learn category relationships and modify the hierarchy. To address the second issue, we introduce three-way decisions (3WD) to deal specifically with the ambiguous data. We extend original studies and propose two HC models based on 3WD, collectively referred to as TriHC, for carefully modifying the hierarchy to alleviate the blocking problem. The proposed TriHC model learns new category hierarchies by the following three steps: (1) mining category relations; (2) modifying category hierarchies according to the latent category relations; and (3) using 3WD to divide observed objects into three regions: positive region, boundary region, and negative region, and making decisions based on different strategies. Specifically, based on different category relation mining methods, there are two versions of TriHC, cross-level blocking priori knowledge based TriHC (CLPK-TriHC) and expert classifier based TriHC (EC-TriHC). The CLPK-TriHC model defines a cross-level blocking distribution matrix to mine the category relations between the higher and lower levels. To better exploit category hierarchical relations, the EC-TriHC model builds expert classifiers using a topic model to learn latent category topics. Experimental results validate that the proposed methods can simultaneously reduce the blocking and improve the classification accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2020
26. A nonparametric model for online topic discovery with word embeddings.
- Author
Chen, Junyang, Gong, Zhiguo, and Liu, Weiwen
- Subjects
*NONPARAMETRIC estimation, *SOCIAL media, *DOCUMENT clustering, *VOCABULARY, *GIBBS sampling
- Abstract
With the explosive growth of short documents generated from streaming textual sources (e.g., Twitter), latent topic discovery has become a critical task for short text stream clustering. However, most online clustering models determine the probability of producing a new topic by manually setting some hyper-parameter/threshold, which becomes a barrier to achieving better topic discovery results. Moreover, topics generated by using existing models often involve a wide coverage of the vocabulary, which is not suitable for online social media analysis. Therefore, we propose a nonparametric model (NPMM) which exploits auxiliary word embeddings to infer the topic number and employs a "spike and slab" function to alleviate the sparsity problem of topic-word distributions in online short text analyses. NPMM can automatically decide whether a given document belongs to existing topics, measured by the squared Mahalanobis distance. Hence, the proposed model is free from tuning the hyper-parameter to obtain the probability of generating new topics. Additionally, we propose a nonparametric sampling strategy to discover representative terms for each topic. To perform inference, we introduce a one-pass Gibbs sampling algorithm based on Cholesky decomposition of covariance matrices, which can further be sped up using a Metropolis-Hastings step. Our experiments demonstrate that NPMM significantly outperforms the state-of-the-art algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2019
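The abstract above mentions both the squared Mahalanobis distance and Cholesky decomposition of covariance matrices, without saying how NPMM combines them. The sketch below shows only the standard numerical trick that links the two: with cov = L Lᵀ, the squared distance equals the squared norm of the solution z of L z = x − mean, so no matrix inverse is needed. Function names are hypothetical and this is not the paper's sampler.

```python
import math

def cholesky(a):
    """Lower-triangular L with L @ L^T == a, for a symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def squared_mahalanobis(x, mean, cov):
    """Compute (x - mean)^T cov^{-1} (x - mean) without inverting cov."""
    low = cholesky(cov)
    diff = [xi - mi for xi, mi in zip(x, mean)]
    # Forward substitution solves low @ z = diff; the answer is ||z||^2.
    z = []
    for i, d in enumerate(diff):
        z.append((d - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    return sum(v * v for v in z)
```

A document embedding whose distance to every existing topic mean exceeds some cutoff would be a candidate for a new topic; how that cutoff is chosen is outside the scope of this sketch.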
27. A topic enhanced approach to detecting multiple standpoints in web texts.
- Author
Lin, Junjie, Kong, Qingchao, Mao, Wenji, and Wang, Lei
- Subjects
*INFORMATION technology security, *MARKETING management, *INFORMATION policy, *INDUSTRIAL management, *EMERGENCY management
- Abstract
The Internet has become the most popular platform for people to exchange opinions and express stances. The stances implied in web texts indicate people's fundamental beliefs and viewpoints. Understanding the stances people take is beneficial and critical for many security and business related applications, such as policy design, emergency response and marketing management. Most previous work on stance detection focuses on identifying the supportive or unsupportive attitudes towards a specific target. However, another important type of stance detection, i.e. multiple standpoint detection, has been largely ignored. Multiple standpoint detection aims to identify the distinct standpoints people hold among multiple parties, which reflects people's intrinsic values and judgments. When expressing standpoints, people tend to discuss diverse topics, and the word usage in the topics of different standpoints often varies a lot. As topics can provide latent information for identifying various standpoints, in this paper, we propose a topic-based approach to detecting multiple standpoints in Web texts, by enhancing a generative classification model as well as the feature representation of texts. In addition, we develop an adaptive process to determine parameter values in our approach automatically. Experimental studies on several real-world datasets verify the effectiveness of our proposed approach in detecting multiple standpoints. [ABSTRACT FROM AUTHOR]
- Published
- 2019
28. ITWF: A framework to apply term weighting schemes in topic model.
- Author
Yang, Kai, Cai, Yi, Leung, Ho-fung, Lau, Raymond Y.K., and Li, Qing
- Subjects
*LATENT semantic analysis, *TERMS & phrases, *STATISTICAL models, *KNOWLEDGE acquisition (Expert systems)
- Abstract
Topic models like Latent Dirichlet Allocation (LDA) and its variants are a type of statistical model for discovering latent topics. However, as revealed by previous research, some topics generated by LDA may be uninterpretable and semantically incoherent due to the occurrence of irrelevant words in these topics. To improve the semantic quality of automatically discovered topics, we explore the distributional characteristics of words across topics to identify topic-indiscriminate words, which are blamed for the low-quality topics. The main contribution of the research reported in this paper is that we develop a novel framework named Iterative Term Weighting Framework (ITWF) which can effectively identify and filter out topic-indiscriminate words from uncovered topics. In particular, the proposed framework first applies an entropy-based term weighting scheme and adopts a novel iterative method to identify topic-indiscriminate words. To the best of our knowledge, our research is among the very few successful works that aim to enhance both the semantic coherence and the interpretability of LDA-based topic modeling methods. The experimental results show that the proposed framework improves the effectiveness of LDA as well as its variants. [ABSTRACT FROM AUTHOR]
- Published
- 2019
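The abstract above does not give the formula behind its entropy-based term weighting scheme. As a rough sketch of the general idea only, one can score each word by the Shannon entropy of its distribution over topics: a word spread evenly across topics has high entropy and is a candidate topic-indiscriminate word. The weighting `1 - H/H_max`, the function names, and the counts below are assumptions for illustration, not ITWF itself.

```python
import math

def topic_entropy(word_topic_counts):
    """Shannon entropy (in nats) of one word's distribution over topics."""
    total = sum(word_topic_counts)
    if total == 0:
        return 0.0
    probs = [c / total for c in word_topic_counts if c > 0]
    return -sum(p * math.log(p) for p in probs)

def entropy_weights(counts_by_word):
    """Map each word to 1 - H/H_max; values near 0 mean evenly spread over topics."""
    weights = {}
    for word, counts in counts_by_word.items():
        # H_max is log(number of topics); guard the single-topic case.
        h_max = math.log(len(counts)) if len(counts) > 1 else 1.0
        weights[word] = 1.0 - topic_entropy(counts) / h_max
    return weights

# Hypothetical word-by-topic assignment counts for three topics.
counts = {"said": [10, 10, 10], "goalkeeper": [30, 0, 0], "market": [2, 25, 3]}
weights = entropy_weights(counts)
```

Under this toy scoring, a word such as "said" receives a weight near zero and would be down-weighted or filtered before the next iteration of topic inference.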
29. Cooperative hierarchical Dirichlet processes: Superposition vs. maximization.
- Author
Xuan, Junyu, Lu, Jie, and Zhang, Guangquan
- Subjects
*MACHINE learning, *NONPARAMETRIC statistics, *STOCHASTIC processes, *HIERARCHICAL Bayes model, *DATA structures
- Abstract
The cooperative hierarchical structure is a common and significant data structure observed in, or adopted by, many research areas, such as: text mining (author–paper–word) and multi-label classification (label–instance–feature). Renowned Bayesian approaches for cooperative hierarchical structure modeling are mostly based on hierarchical Bayesian models. However, these approaches suffer from a serious issue in that the number of hidden topics/factors needs to be fixed in advance and an inappropriate number may lead to overfitting or underfitting. One elegant way to resolve this issue is Bayesian nonparametric learning, but existing work in this area still cannot be applied to cooperative hierarchical structure modeling. In this paper, we propose a cooperative hierarchical Dirichlet process (CHDP) to fill this gap. Each node in a cooperative hierarchical structure is assigned a Dirichlet process to model its weights on the infinite hidden factors/topics. Together with measure inheritance from hierarchical Dirichlet process, two kinds of measure cooperation, i.e., superposition and maximization, are defined to capture the many-to-many relationships in the cooperative hierarchical structure. Furthermore, two constructive representations for CHDP, i.e., stick-breaking and international restaurant process, are designed to facilitate the model inference. Experiments on synthetic and real-world data with cooperative hierarchical structures demonstrate the properties and the ability of CHDP for cooperative hierarchical structure modeling and its potential for practical application scenarios. [ABSTRACT FROM AUTHOR]
- Published
- 2019
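The stick-breaking representation mentioned in the abstract above builds on the construction for an ordinary Dirichlet process. A minimal sketch of that base construction, truncated to a finite number of sticks, is given below; it does not implement CHDP's superposition or maximization, and the function name and truncation level are assumptions.

```python
import random

def stick_breaking(alpha, num_sticks, rng):
    """Truncated stick-breaking weights for a Dirichlet process with concentration alpha."""
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to any topic
    for _ in range(num_sticks):
        frac = rng.betavariate(1.0, alpha)  # fraction broken off the remaining stick
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    return weights, remaining

weights, leftover = stick_breaking(alpha=2.0, num_sticks=20, rng=random.Random(0))
```

Because the leftover mass shrinks geometrically in expectation, a modest truncation usually captures almost all of the weight, which is why the number of topics does not have to be fixed in advance.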
30. Multi-modal multi-view Bayesian semantic embedding for community question answering.
- Author
Sang, Lei, Xu, Min, Qian, ShengSheng, and Wu, Xindong
- Subjects
*QUESTION answering systems, *BAYESIAN analysis, *GAUSSIAN mixture models, *SOCIAL structure, *EMBEDDING theorems
- Abstract
Semantic embedding has demonstrated its value in latent representation learning of data, and can be effectively adopted for many applications. However, it is difficult to propose a joint learning framework for semantic embedding in Community Question Answering (CQA), because CQA data have multi-view and sparse properties. In this paper, we propose a generic Multi-modal Multi-view Semantic Embedding (MMSE) framework via a Bayesian model for question answering. Compared with existing semantic learning methods, the proposed model mainly has two advantages: (1) To deal with the multi-view property, we utilize the Gaussian topic model to learn semantic embedding from both the local view and the global view. (2) To deal with the sparse property of question-answer pairs in CQA, social structure information is incorporated to enhance the quality of general text content semantic embedding from other answers by using the shared topic distribution to model the relationship between these two modalities (user relationship and text content). We evaluate our model on question answering and expert finding tasks, and the experimental results on two real-world datasets show the effectiveness of our MMSE model for semantic embedding learning. [ABSTRACT FROM AUTHOR]
- Published
- 2019
31. Variational-based latent generalized Dirichlet allocation model in the collapsed space and applications.
- Author
Ihou, Koffi Eddy and Bouguila, Nizar
- Subjects
*HUMAN facial recognition software, *DIRICHLET principle, *THREE-dimensional imaging, *FACIAL expression, *BAYESIAN analysis
- Abstract
In the topic modeling framework, the performance of many Dirichlet-based models has been hindered by the limitations of the conjugate prior. This led to models with more flexible priors, such as the generalized Dirichlet distribution, that tend to capture semantic relationships between topics (topic correlation). However, these extensions also suffer from incomplete generative processes that complicate performance in traditional inference schemes such as VB (Variational Bayes) and CGS (Collapsed Gibbs Sampling). As a result, the new approach, the CVB-LGDA (Collapsed Variational Bayesian inference for the Latent Generalized Dirichlet Allocation), presents a scheme that integrates a complete generative process with a robust inference technique for topic correlation and codebook analysis. Its performance in image classification, facial expression recognition, 3D object categorization, and action recognition in videos shows its merits. [ABSTRACT FROM AUTHOR]
- Published
- 2019
32. Identifying spatial interaction patterns of vehicle movements on urban road networks by topic modelling.
- Author
Liu, Kang, Gao, Song, and Lu, Feng
- Subjects
*EXAMPLE
- Abstract
The development of mobile positioning technologies makes massive individual trajectory data easily accessible, which has prompted a revisit of the spatial interaction issue in recent years. Researchers have proposed many methods to investigate the spatial interactions derived from human movements, such as the gravity model and radiation model. However, these studies have mainly focused on the interactions among areal units at an aggregated level, neglecting that in most cases human movements are carried by vehicles and constrained by the underlying road network, which causes interactions among roads. To fill this gap, we propose a novel approach to identify spatial interaction patterns of vehicle movements on urban road networks. As the topic model, originating from the domain of natural language processing, has powerful advantages in extracting semantic relations of words from a corpus, we utilize it to extract interaction relations of urban roads from massive vehicle trajectories. First, "strokes" (i.e., natural streets) are chosen as geographical units to represent the vehicle moving paths. Then, an analogy between geographical elements (i.e., stroke, moving path) and textual elements (i.e., word, document) is established, and a topic model is applied to the moving paths to identify the spatial interaction patterns on the road network. From a mass of trajectory data collected by GNSS-equipped taxis in Beijing, the "topic patterns", which can be viewed as clusters of spatially interacted strokes, are identified, visualized and verified. It is argued that our proposed approach is effective in identifying spatial interaction patterns, which provides a new perspective for spatial interaction modelling and enriches the current spatial interaction studies.
Highlights
• We originally proposed to investigate the spatial interactions among linear streets instead of areal regions.
• We proposed a method to represent vehicle moving paths, which can well reflect human driving behavior on the road network.
• We proposed an innovative approach to identify spatial interaction patterns on the road network by topic modelling.
• Our approach can better reveal the spatial interaction patterns compared to the community detection-based method. [ABSTRACT FROM AUTHOR]
- Published
- 2019
33. Topical Co-Attention Networks for hashtag recommendation on microblogs.
- Author
Li, Yang, Liu, Ting, Hu, Jingwen, and Jiang, Jing
- Subjects
*MICROBLOGS, *TAGS (Metadata), *COMPUTER networks, *NATURAL languages, *SHORT-term memory
- Abstract
Hashtags provide a simple and natural way of organizing content in microblog services. With the fast growth of microblog services, the task of recommending hashtags for microblogs has been given increasing attention in recent years. However, much of the research depends on hand-crafted features. Motivated by the successful use of neural models for many natural language processing tasks, in this paper, we adopt an attention based neural network to learn the representation of a microblog post. Unlike previous works, which only focus on content attention of microblogs, we propose a novel Topical Co-Attention Network (TCAN) that jointly models content attention and topic attention, in the sense that the content representation(s) are used to guide the topic attention and the topic representation is used to guide content attention. We conduct experiments and test with different settings of TCAN on a large real-world dataset. Experimental results show that our model significantly outperforms various competitive baseline methods. Furthermore, the incorporation of the topical co-attention mechanism gives more than 13.6% improvement in F1 score compared with the standard LSTM based methods. [ABSTRACT FROM AUTHOR]
- Published
- 2019
34. Hidden topic–emotion transition model for multi-level social emotion detection.
- Author
Tang, Donglei, Zhang, Zhikai, He, Yulan, Lin, Chao, and Zhou, Deyu
- Subjects
*EMOTION recognition, *LATENT variables, *BAYESIAN analysis, *MARKOV processes, *SENTIMENT analysis
- Abstract
With the fast development of online social platforms, social emotion detection, focusing on predicting readers' emotions evoked by news articles, has been intensively investigated. Considering emotions as latent variables, various probabilistic graphical models have been proposed for emotion detection. However, the bag-of-words assumption prohibits those models from capturing the inter-relations between sentences in a document. Moreover, existing models can only detect emotions at either the document-level or the sentence-level. In this paper, we propose an effective Bayesian model, called hidden Topic–Emotion Transition model, by assuming that words in the same sentence share the same emotion and topic and modeling the emotions and topics in successive sentences as a Markov chain. By doing so, not only the document-level emotion but also the sentence-level emotion can be detected simultaneously. Experimental results on two public corpora show that the proposed model outperforms state-of-the-art approaches on both document-level and sentence-level emotion detection. [ABSTRACT FROM AUTHOR]
- Published
- 2019
35. Recognizing driving styles based on topic models.
- Author
Qi, Geqi, Wu, Jianping, Zhou, Yang, Du, Yiman, Jia, Yuhan, Hounsell, Nick, and Stanton, Neville A.
- Subjects
*MACHINE learning, *INFORMATION needs, *ACQUISITION of data
- Abstract
Highlights
• The effective and targeted interaction between driving environment and drivers.
• Using topic models, the underlying structure of driving styles is established.
• Two topic models (mLDA and mHLDA) are presented for driving style analysis.
• The driving styles can be recognized properly using the proposed models.
With the explosion of information in our current era, senders of information increasingly need to target their messages to recipients. However, messages within transportation systems, including traffic information and commercial advertisements, tend to be transmitted to all drivers indiscriminately. This is because the information providers (such as other vehicles, roads, facilities, buildings etc.) can hardly recognize the variations among drivers, who should be treated differently as information recipients. As a result of the rapid development of data collection technologies and machine learning techniques in recent years, extraction and recognition of drivers' unique driving styles from actual driving behaviour data have become possible. In this paper, two kinds of topic models, mLDA and mHLDA, are investigated to discover distinguishable driving style information with hidden structure from real-world driving behaviour data. The results show that the proposed models can successfully recognize the differences between driving styles. The study is of great value for providing deep insight into the underlying structure of driving styles and can effectively support the recognition of drivers with different driving styles. [ABSTRACT FROM AUTHOR]
- Published
- 2019
36. Topic-enhanced emotional conversation generation with attention mechanism.
- Author
Peng, Yehong, Fang, Yizhen, Xie, Zhiwen, and Zhou, Guangyou
- Subjects
*DIRICHLET problem, *BOUNDARY value problems, *EMOTIONAL intelligence, *HUMAN-computer interaction, *EMOTIONS
- Abstract
Emotional conversation generation has elicited a wide interest in both academia and industry. However, existing emotional neural conversation systems tend to ignore the necessity to combine topic and emotion in generating responses, possibly leading to a decline in the quality of responses. This paper proposes a topic-enhanced emotional conversation generation model that incorporates emotional factors and topic information into the conversation system, by using two mechanisms. First, we use a Twitter latent Dirichlet allocation (LDA) model to obtain topic words of the input sequences as extra prior information, ensuring the consistency of content between posts and responses for emotional conversation generation. Second, the system uses a dynamic emotional attention mechanism to adaptively acquire content-related and affective information of the input texts and extra topics. The advantage of this study lies in the fact that the presented model can generate abundant emotional responses, with the contents being related and diverse. To demonstrate the effectiveness of our method, we conduct extensive experiments on large-scale Weibo post–response pairs. Experimental results show that our method achieves good performance, even outperforming some existing models.
Highlights
• We present a topic-enhanced neural emotion conversation generation model (TE-ECG) with attention mechanism.
• The topic words are obtained from a pre-trained Twitter LDA model to ensure the generated response is related to the post.
• A novel dynamic emotional attention mechanism is proposed to capture the emotional context and topic information.
• The TE-ECG model can generate responses at both the emotion- and content-related levels. [ABSTRACT FROM AUTHOR]
- Published
- 2019
37. Universal affective model for Readers’ emotion classification over short texts.
- Author
Liang, Weiming, Xie, Haoran, Rao, Yanghui, Lau, Raymond Y.K., and Wang, Fu Lee
- Subjects
*CLASSIFICATION algorithms, *WEB 2.0, *MICROBLOGS, *EMOTIONS, *BIG data
- Abstract
Highlights
• A novel universal affective model for classifying social emotions is proposed.
• ATF-IDF is developed to enhance the semantic relationships between biterms.
• A word-level emotional lexicon is established for background words by using SWAT.
• UAM is very effective in detecting emotions in both short texts and long texts.
With the rapid development of Web 2.0 communities, social media service providers offer users a convenient way to share and create their own contents such as online comments, blogs, microblogs/tweets, etc. Understanding the latent emotions of such short texts from social media via a computational model is an important issue, as such a model will help us to identify social events and make better decisions (e.g., investment in the stock market). However, it is always very challenging to detect emotions from such user-generated content due to the sparsity problem (e.g., a tweet is a short message). In this article, we propose a universal affective model (UAM) to classify readers’ emotions over unlabeled short texts. Different from conventional text classification models, the UAM structurally consists of topic-level and term-level sub-models, and detects social emotions from the perspective of readers in social media. Through evaluation on real-world data sets, the experimental results validate the proposed model in terms of effectiveness and accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2018
38. Latent association rule cluster based model to extract topics for classification and recommendation applications.
- Author
-
dos Santos, Fabiano Fernandes, Domingues, Marcos Aurélio, Sundermann, Camila Vaccari, de Carvalho, Veronica Oliveira, Moura, Maria Fernanda, and Rezende, Solange Oliveira
- Subjects
- *
VECTOR spaces , *TEXT mining , *LEGAL documents , *DIRICHLET forms , *LITERATURE - Abstract
The quality of any text mining technique is highly dependent on the features that are used to represent the document collection. A classical form of document representation is the vector space model (VSM), according to which the documents are represented as vectors of weights that correspond to the features of the documents. The bag-of-words model is the most popular VSM approach due to its simplicity and general applicability, but this model does not include term dependency and has a high dimensionality. In the literature, several models for document representation have been proposed in order to capture the dependency of terms. Among them, the topic model representation is one of the most interesting approaches, since it describes the collection of documents in a way that reveals their internal structure and the interrelationships therein, and also provides a dimensionality reduction. However, even for topic models, the efficient extraction of information concerning the relations among terms for document representation is still a major research challenge. In order to address this issue, we propose the latent association rule cluster based model (LARCM). The LARCM is a non-probabilistic topic model that makes use of association rule clustering to build a document representation with low dimensionality in such a way that each feature (i.e., topic) comprises information concerning relations among the terms. We evaluated the interpretability of the topics obtained by using our proposed model against the ones provided by the traditional latent Dirichlet allocation (LDA) model and the LDA model using a document representation that includes correlated terms (i.e., bag-of-related-words). The experimental results indicated that the LARCM provides topics with better interpretability than the LDA models. Additionally, we used the topics obtained by the LARCM in two different applications: text classification and page recommendation.
With respect to text classification, the topics were used to improve document collection representation. Concerning page recommendation, topics were used as contextual information in context-aware recommender systems. Results have shown that the topics provided by the LARCM can be used to improve both applications. [ABSTRACT FROM AUTHOR]
- Published
- 2018
39. Assembly makespan estimation using features extracted by a topic model.
- Author
-
Hu, Zheyuan, Cheng, Yi, Xiong, Hui, and Zhang, Xu
- Subjects
- *
PRODUCTION scheduling , *SUPPORT vector machines , *RANDOM forest algorithms , *PRODUCT attributes - Abstract
Accurate makespan estimation is imperative during production scheduling to increase the flexibility and efficiency of work plans. However, given the complexities of production systems and product customizations, it is challenging to estimate makespans with high accuracy. In this paper, we propose a topic model-based neural network (TM-NN) method to increase the accuracy of makespan estimation for assembly processes. First, unlike traditional methods that use influential factors as inputs, we extract assembly features using a latent Dirichlet allocation model that mines latent topic information from an assembly instruction corpus. Then, the assembly process is represented as a sequence model with both assembly topics and features of the product physical characteristics, the assembly process, the equipment, the personnel, and uncertainty. Finally, we use a structured numerical vector as the input to machine learning-based predictive models, including a neural network, a random forest, and a support vector machine, and estimate makespans. The results show that the proposed TM-NN method effectively extracts latent topics in assembly documents and significantly increases the accuracy of makespan estimation. • We propose a topic model-based neural network (TM-NN) method to increase the accuracy of makespan estimation for assembly processes. • TM-NN can extract latent topic information from an assembly instruction corpus. • We compare the average errors between the predicted and real makespans. • TM-NN exhibits superior effectiveness and accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2023
40. Discovering topics and trends in the field of Artificial Intelligence: Using LDA topic modeling.
- Author
-
Yu, Dejian and Xiang, Bo
- Subjects
- *
ARTIFICIAL intelligence , *APPROXIMATE reasoning , *MACHINE learning , *COMPUTER vision - Abstract
Artificial Intelligence (AI) has affected all aspects of social life in recent years. This study reviews 177,204 documents published in 25 journals and 16 conferences in the AI research from 1990 to 2021, and applies the Latent Dirichlet allocation (LDA) model to extract the 40 topics from the abstracts. According to the topics obtained, 7 subfields of the AI field can be discovered: Approximate Reasoning , Computational Theory , Intelligent Automation , Artificial Neural Network , Machine Learning , Natural Language Processing , and Computer Vision. This study aggregates the results of the LDA model from the perspectives of year, publication source, and country/region. The aggregated result is the topic distribution from different perspectives. Analysis of the aggregated results reveals the research characteristics of different publication sources (countries/regions) in the AI research, and which publication sources (countries/regions) have similar research content. These results provide help for scholars and research institutions to choose research directions, and new entrants to understand the dynamics of the field. [ABSTRACT FROM AUTHOR]
- Published
- 2023
41. A systematic review highlights that there are multiple benefits of urban agriculture besides food.
- Author
-
Pradhan, Prajal, Callaghan, Max, Hu, Yuanchao, Dahal, Kshitij, Hunecke, Claudia, Reusswig, Fritz, Lotze-Campen, Hermann, and Kropp, Jürgen P.
- Abstract
Urban agriculture, including peri-urban farming, can nourish around one billion city dwellers and provide multiple social, economic, and environmental benefits. However, these benefits depend on various factors and are debated. Therefore, we used machine learning to semi-automate a systematic review of the existing literature on urban agriculture. It started with around 76,000 records for initial screening based on a broad keyword search strategy. We applied the topic modeling approach to systematically understand various aspects of urban agriculture based on the full text of around 1,450 relevant publications. Urban agriculture literature covers 14 topics, clustered into 11 themes related to urban agriculture forms, their multi-functionalities, and their underlying challenges. These forms are small-scale ground-based and building-integrated systems. The multi-functionalities include food, livelihoods, health benefits, social space, green infrastructure, biodiversity, and ecosystem services. Therefore, promoting urban agriculture requires accounting for its multi-functionalities, besides food provisioning, and encouraging efficient and sustainable practices. • We identify 14 topics on urban agriculture, which vary spatially and temporally. • Urban agriculture provides socioeconomic and environmental benefits, besides food. • Urban agriculture faces challenges of inefficient practices and health risks. • Sustainable practices can reduce health risks and input needs for urban agriculture. • Promoting urban agriculture requires accounting for its multi-functionalities. [ABSTRACT FROM AUTHOR]
- Published
- 2023
42. ITMDID: An improved topic model for defect information derivation.
- Author
-
Zheng, Lu, He, Zhen, and He, Shuguang
- Subjects
- *
INFORMATION modeling , *MANUFACTURING defects , *GIBBS sampling , *PRODUCT failure , *SOCIAL media , *INFORMATION resources - Abstract
With the help of topic models, social media data have become a valuable information source for manufacturers to identify product defects. However, when absorbing information on product defects, special defect-unrelated texts that mention product failures induced by customers will be mistaken as defect-related by existing methods, leading to biased results. Furthermore, extant topic models suffer from the "topic-indiscriminative" problem, which means topics derived by topic models are similar. In order to address these problems, we first create a lexicon to differentiate whether the texts mentioning product failures are defect-related. Then we propose a topic model named Improved Topic Model for Defect Information Derivation (ITMDID) to derive product defect information from defect-related texts. To enhance the discrimination of extracted topics, we consider the word coherence when inferring variables of ITMDID via Gibbs sampling. Finally, we applied the developed approaches to analyze two case studies of the automobile and laptop industries. Experimental results prove that our methods can derive product defects from social media data more accurately and comprehensively than state-of-the-art approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2023
43. A short text sentiment-topic model for product reviews.
- Author
-
Xiong, Shufeng, Wang, Kuiyi, Ji, Donghong, and Wang, Bingkun
- Subjects
- *
SENTIMENT analysis , *COMPUTATIONAL biology , *DATA mining , *CHINESE people , *GRAPHICAL modeling (Statistics) , *ATTITUDE (Psychology) - Abstract
Joint topic and sentiment modelling has been successfully used in sentiment analysis of product reviews. However, text sparsity has become a widespread problem as smart devices proliferate and product reviews grow shorter. In this paper, we propose a joint sentiment-topic model, WSTM (Word-pair Sentiment-Topic Model), for short text reviews; it detects sentiments and topics simultaneously from the text while specifically addressing the text sparsity problem. Unlike other topic models, which model the generative process of each document, ours directly models the generation of the word-pair set from the whole corpus. In the generative process of WSTM, all of the words in a sentence have the same sentiment polarity, and the two words in a word-pair have the same topic. We apply WSTM to two real-life Chinese product review datasets to verify its performance. In three experiments, the results demonstrate that WSTM is quantitatively effective, compared with existing approaches, on both topic discovery and document-level sentiment. [ABSTRACT FROM AUTHOR]
- Published
- 2018
44. On predicting elections with hybrid topic based sentiment analysis of tweets.
- Author
-
Bansal, Barkha and Srivastava, Sangeet
- Subjects
SENTIMENT analysis ,LEXICAL access ,MICROBLOGS ,ELECTION forecasting ,PROGRAMMING language semantics - Abstract
Twitter sentiment analysis is a quick and inexpensive way to monitor elections in real time and to make modern-day election predictions. Recent research relies on explicit mining of public sentiment using lexical and syntactic features in tweets. However, underlying implicit word relations and co-occurrences are overlooked. Capturing semantic relations and word co-occurrences becomes even more challenging for short tweets, where words are limited. In this paper, we introduce a novel method, Hybrid Topic Based Sentiment Analysis (HTBSA), which aims to capture word relations and co-occurrences in short tweets for election prediction. First, we extract latent topics from a rich corpus of short texts using the Biterm Topic Model (BTM); then sentiments for each topic are learnt from pre-existing lexical resources. Finally, the sentiment score of each tweet is calculated using the sentiment orientation and weight of each topic contained in it. We use more than 300,000 tweets, collected from 1st-20th February, 2017, to predict the Uttar Pradesh (U.P.) legislative elections. Geo-tagging is employed for keywords that are not exclusive to the elections. Results show that HTBSA outperforms existing Twitter-based election prediction techniques, with a decrease of 3.5% in MAE. Our study can be easily and efficiently extended to real-time election monitoring and future election predictions. [ABSTRACT FROM AUTHOR]
- Published
- 2018
45. Weakly supervised topic sentiment joint model with word embeddings.
- Author
-
Fu, Xianghua, Sun, Xudong, Wu, Haiying, Cui, Laizhong, and Huang, Joshua Zhexue
- Subjects
- *
CORPORA , *DIRICHLET problem , *SENTIMENT analysis , *SEMANTICS , *ALGORITHMS , *VOCABULARY - Abstract
A topic sentiment joint model aims to handle the mixture of topics and sentiment in online reviews simultaneously. Most existing topic sentiment modeling algorithms are based on the state-of-the-art latent Dirichlet allocation (LDA) and probabilistic latent semantic analysis (PLSA), which infer sentiment and topic distributions from the co-occurrence of words. These methods have been proposed and successfully used for topic and sentiment analysis. However, when the training corpus is small or when the documents are short, the textual features become sparse, so the resulting sentiment and topic distributions might not be satisfactory. In this paper, we propose a novel topic sentiment joint model called weakly supervised topic sentiment joint model with word embeddings (WS-TSWE), which incorporates word embeddings and the HowNet lexicon simultaneously to improve topic identification and sentiment recognition. The main contributions of WS-TSWE include the following two aspects. (1) Existing models generate the words only from the sentiment-topic-to-word Dirichlet multinomial component, but the WS-TSWE model replaces it with a mixture of two components, a Dirichlet multinomial component and a word embeddings component. Since the word embeddings are trained on very large corpora and can be used to extend the semantic information of the words, they can partly alleviate the problem of textual sparsity. (2) Most previous models incorporate sentiment knowledge in the β priors, and the priors are usually set from a dictionary and rely completely on previous domain knowledge to identify positive and negative words. In contrast, the WS-TSWE model calculates the sentiment orientation of each word with the HowNet lexicon and automatically infers sentiment-based β priors for sentiment analysis and opinion mining. Furthermore, we implement WS-TSWE with Gibbs sampling algorithms.
The experimental results on Chinese and English data sets show that WS-TSWE achieved significant performance in the task of detecting sentiment and topics simultaneously. [ABSTRACT FROM AUTHOR]
- Published
- 2018
46. Locally weighted embedding topic modeling by markov random walk structure approximation and sparse regularization.
- Author
-
Wei, Chao, Luo, Senlin, Pan, Limin, Wu, Zhouting, Zhang, Ji, and Safi, Qamas Gul Khan
- Subjects
- *
EMBEDDINGS (Mathematics) , *MARKOV random fields , *PROBLEM solving , *APPROXIMATION algorithms , *MATHEMATICAL regularization - Abstract
Topic models are a practical method for learning interpretable models of text corpora and have become a key approach to document representation. Some recently proposed topic models incorporate the intrinsic geometrical information of the document manifold and yield a discriminative topic representation. However, the existing manifold-inspired topic models fail to provide probability weighting information for the local geometrical pattern, which limits their ability to estimate the intrinsic semantic information of the topic representation. In this paper, we consider the problem of topic modeling with the intrinsic structure of the document manifold and propose an unsupervised AutoEncoder-based topic modeling framework, named locally weighted embedding topic model (LWE-TM). Unlike existing manifold-inspired topic models, LWE-TM defines a group of probability coefficients to uncover the local geometrical pattern through the Markov random walk structure of the affinity graph, and regularizes the training of a sparse AutoEncoder (sAE) to explicitly recover this local geometrical pattern with the topic encoding. Under the regularized training framework, the encoding network becomes locally invariant around the neighborhood of the document manifold and enables ready topic inference for out-of-sample documents, improving the generalization and discrimination of the topic encoding. The experimental results on two widely used corpora demonstrate the superiority of LWE-TM over comparative models in document modeling, document clustering and classification tasks. [ABSTRACT FROM AUTHOR]
- Published
- 2018
47. GIST: A generative model with individual and subgroup-based topics for group recommendation.
- Author
-
Ji, Ke, Chen, Zhenxiang, Sun, Runyuan, Ma, Kun, Yuan, Zhongjie, and Xu, Guandong
- Subjects
- *
RECOMMENDER systems , *PROBABILISTIC generative models , *GRAPH theory , *TOPOLOGY , *DECISION making - Abstract
In this paper, a topic-based probabilistic model named GIST is proposed to infer group activities and make group recommendations. Compared with existing individual-based aggregation methods, it considers not only individual members’ interests but also the interests of subgroups. The intuition is that when a group of users wants to take part in an activity, not every group member is decisive; instead, subgroups of members with close relationships are more likely to drive the final activity decision. This motivates our study of jointly considering individual members’ choices and subgroups’ choices for group recommendations. Based on this, our model uses two kinds of unshared topics to model individual members’ interests and subgroups’ interests separately, and then makes final recommendations according to the choices from the two aspects with a weight-based scheme. Moreover, the link information in the graph topology of the groups can be used to optimize the weights of our model. The experimental results on real-life data show that GIST significantly improves recommendation accuracy compared with state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2018
48. Latent Dirichlet mixture model.
- Author
-
Chien, Jen-Tzung, Lee, Chao-Hsi, and Tan, Zheng-Hua
- Subjects
- *
DIRICHLET forms , *SEMANTICS , *LATENT variables , *MATHEMATICAL variables , *BAYESIAN analysis - Abstract
Text representation based on latent topic model is seen as a non-Gaussian problem where the observed words and latent topics are multinomial variables and the topic proportionals are Dirichlet variables. Traditional topic model is established by introducing a single Dirichlet prior to characterize the topic proportionals. The words in a text document are represented by a random mixture of semantic topics. However, in real world, a single Dirichlet distribution may not faithfully reflect the variations of topic proportionals estimated from the heterogeneous documents. To address these variations, we propose a new latent variable model where latent topics and their proportionals are learned by incorporating the prior based on Dirichlet mixture model . The resulting latent Dirichlet mixture model (LDMM) is constructed for topic clustering as well as document clustering. Multiple Dirichlets provide a solution to build structural latent variables in learning representation over a variety of topics. This study carries out the inference for LDMM according to the variational Bayes and the collapsed variational Bayes. Such an unsupervised LDMM is further extended to a supervised LDMM for text classification. Experiments on document representation, summarization and classification show the merit of structural prior in LDMM topic models. [ABSTRACT FROM AUTHOR]
- Published
- 2018
49. A Topic Drift Model for authorship attribution.
- Author
-
Yang, Min, Chen, Xiaojun, Tu, Wenting, Lu, Ziyu, Zhu, Jia, and Qu, Qiang
- Subjects
- *
GLACIAL drift , *AUTHORSHIP attribution laws , *IDENTIFICATION , *ARTIFICIAL neural networks ,WRITING - Abstract
Authorship attribution is an active research direction due to its legal and financial importance. Its goal is to identify the authorship from the anonymous texts. In this paper, we propose a Topic Drift Model (TDM), which can monitor the dynamicity of authors’ writing styles and learn authors’ interests simultaneously. Unlike previous authorship attribution approaches, our model is sensitive to the temporal information and the ordering of words. Thus it can extract more information from texts. The experimental results show that our model achieves better results than other models in terms of accuracy. We also demonstrate the potential of our model to address the authorship verification problem. [ABSTRACT FROM AUTHOR]
- Published
- 2018
50. The latent topic block model for the co-clustering of textual interaction data.
- Author
-
Bergé, Laurent R., Bouveyron, Charles, Corneli, Marco, and Latouche, Pierre
- Subjects
- *
PARALLEL algorithms , *PARAMETER estimation - Abstract
Textual interaction data involving two disjoint sets of individuals/objects are considered. An example of such data is given by the reviews on web platforms (e.g. Amazon, TripAdvisor, etc.) where buyers comment on products/services they bought. A new generative model, the latent topic block model (LTBM), is developed along with an inference algorithm to simultaneously partition the elements of each set, accounting for the textual information. The estimation of the model parameters is performed via a variational version of the expectation maximization (EM) algorithm. A model selection criterion is formally obtained to estimate the number of partitions. Numerical experiments on simulated data are carried out to highlight the main features of the estimation procedure. Two real-world datasets are finally employed to show the usefulness of the proposed approach. [ABSTRACT FROM AUTHOR]
- Published
- 2019