Descriptor: "Topic model" / Journal: scientometrics - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Topic model"' showing total 59 results

1. Cover papers of top journals are reliable source for emerging topics detection: a machine learning based prediction framework.

Author: Wei, Wenjie, Liu, Hongxu, and Sun, Zhuanlan
Abstract: The detection of emerging trends is of great interest to many stakeholders, such as government and industry. Previous research applied machine learning, network analysis, and time series analysis to bibliometric data and made promising progress. However, these approaches inevitably suffer from time delays. Because the leading papers of "emerging topics" share similar characteristics with "cover papers", this study presents a novel approach that recasts "emerging topic" detection as "cover paper" prediction. Using an AdaBoost model and a topic model, we construct a machine learning framework that imitates a top journal (chief) editor's judgement in selecting cover papers in materials science. The prediction results were validated by consulting field experts. The approach was also suitable for the journals Nature, Science, and Cell. [ABSTRACT FROM AUTHOR]
Published: 2022

2. Semantic-enhanced topic evolution analysis: a combination of the dynamic topic model and word2vec.

Author: Gao, Qiang, Huang, Xiao, Dong, Ke, Liang, Zhentao, and Wu, Jiang
Abstract: The combination of the topic model and the semantic method can help to discover the semantic distributions of topics and the changing characteristics of the semantic distributions, further providing a new perspective for the research of topic evolution. This study proposes a solution for quantifying the semantic distributions and the changing characteristics based on words in topic evolution through the Dynamic topic model (DTM) and the word2vec model. A dataset in the field of Library and information science (LIS) is utilized in the empirical study, and the topic-semantic probability distribution is derived. The evolving dynamics of the topics are constructed. The characteristics of evolving dynamics are used to explain the semantic distributions of topics in topic evolution. Then, the regularities of evolving dynamics are summarized to explain the changing characteristics of semantic distributions in topic evolution. Results show that no topic is distributed in a single semantic concept, and most topics correspond to various semantic concepts in LIS. The three kinds of topics in LIS are the convergent, diffusive, and stable topics. The discovery of different modes of topic evolution can further prove the development of the field. In addition, findings indicate that the popularity of topics and the characteristics of evolving dynamics of topics are irrelevant. [ABSTRACT FROM AUTHOR]
Published: 2022

3. An integrated approach for detecting and quantifying the topic evolutions of patent technology: a case study on graphene field.

Author: Wu, Hong, Yi, Huifang, and Li, Chang
Abstract: Comprehensive, in-depth and accurate analyses of patent technology topic evolutions become increasingly significant since the analytical results can offer related personnel the scientific support to explore or trace back to the origin and the development of the technology. However, existing methods of topic evolutions do not facilitate better understanding of how a technology topic has evolved. This paper introduces an integrated method with the LDA topic identification analysis, the improved topic life cycle analysis, and the improved technology entropy analysis for identifying, measuring and interpreting topics evolutions from patent literatures. Multiple indicators we proposed and improved have been used to measure the degree of topic development and identify the topic types of different states. And, the concept of technology entropy has been redefined and improved to measure the changes of evolution intensity and evolution direction among topics, mainly used the topic word and its probability. The results from different methods are mutually connected and complemented. The process and characteristics of topic evolution are further overviewed. Graphene is selected for the case study. The mechanism of evolution and the effect of improved methods are focused on. The research has clearly shown that more accurate and comprehensive results can be achieved for topic evolution by employing this integrated method. Furthermore, the above integration of methods has potential contributions to hot spot detection and potential technology discovery. [ABSTRACT FROM AUTHOR]
Published: 2021

4. Measuring cognitive proximity using semantic analysis: A case study of China's ICT industry.

Author: Qin, Yawen, Qin, Xiaozhen, Chen, Haohui, Li, Xun, and Lang, Wei
Abstract: Quantification of knowledge technologies has long posed a challenge to the measurement of cognitive proximity. This paper proposes a method to measure cognitive proximity by mining patent description text with the LDA topic model. With the patent-topic distribution got from the LDA topic model, the cognitive proximity is measured between enterprises or within cities, which could make up for the shortage of existing measurement methods limited by the rigid IPC, industry classification system, or non-standard interview data. Our empirical studies on the ICT industry indicate that the 20 topics obtained through the topic model have a good correspondence with the technologies involved in this industry's leading products and services. And we dig out the knowledge and technology information in the patent text to depict the technology landscape, including mining the changes of technology topics over time, the difference of distribution in various cities, and the development trend of the urban innovation network. This method's effectiveness is also proved in the model that compares different measurement methods when revealing the relationship between cognitive proximity and patent productivity. Last, researchers can use this approach to delve deeper into urban innovation issues, and policymakers can use it to figure out further innovation. [ABSTRACT FROM AUTHOR]
Published: 2021

5. Mapping the technology evolution path: a novel model for dynamic topic detection and tracking.

Author: Liu, Huailan, Chen, Zhiwang, Tang, Jie, Zhou, Yuan, and Liu, Sheng
Abstract: Identifying the evolution path of a research field is essential to scientific and technological innovation. There have been many attempts to identify the technology evolution path based on the topic model or social network analysis, but many of them had deficiencies in methodology. First, many studies have only considered a single type of information (text or citation information) in scientific literature, which may lead to incomplete technology path mapping. Second, the number of topics in each period cannot be determined automatically, making dynamic topic tracking difficult. Third, data mining methods fail to be effectively combined with visual analysis, which affects the efficiency and flexibility of mapping. In this study, we developed a method for mapping the technology evolution path using a novel non-parametric topic model, the citation involved Hierarchical Dirichlet Process (CIHDP), to achieve better topic detection and tracking of scientific literature. To better present and analyze the path, D3.js is used to visualize the splitting and fusion of the evolutionary path. We used this novel model to map the artificial intelligence research domain; the successful mapping of the evolution path demonstrates the proposed method's validity and merits. After incorporating the citation information, we found that the CIHDP can map a complete path evolution process and performs better than the Hierarchical Dirichlet Process and LDA. This method can be helpful for understanding and analyzing the development of technical topics. Moreover, it can be used to map the science or technology of the innovation ecosystem. It may also be of interest to technology evolution path researchers and policymakers. [ABSTRACT FROM AUTHOR]
Published: 2020

6. Evaluating wider impacts of books via fine-grained mining on citation literatures.

Author: Zhou, Qingqing and Zhang, Chengzhi
Abstract: Citations are commonly used to measure academic impacts of scientific publications, including books. However, citation frequencies of books are single numerical evaluation metrics. They neglect details about books (e.g. contents), which may reduce the comprehensiveness of evaluation results. Hence, fine-grained mining of books' citation information to integrate frequency metrics and content metrics can obtain more reliable evaluation results. Books' citation literatures (i.e. the literatures that cite the books) present citation frequencies of books, and reflect citation intentions, topics and domains simultaneously. Existing research focused on analysing citation frequencies, authors or citation contexts of citation literatures to conduct citation analysis. Collecting citation contexts may be costly, and such analysis neglects latent information in citation literatures, such as the impact scopes or topics of books that citation literatures reflect. Therefore, in this paper, we conducted fine-grained analysis on books' citation literatures to assess whether citation literatures could be systematically used as indicators of books' wider impacts. Specifically, we first collected books and corresponding information about their citation literatures. Then, we extracted multi-dimensional metrics via multi-granularity mining on citation literatures, and obtained assessment results by integrating content-level and frequency-level metrics. Finally, we compared assessment results based on citation literatures with existing metrics for assessing books' impacts to verify the assessment results. Experimental results indicate that citation literatures are a promising source for book impact assessment, especially for books' academic impacts. [ABSTRACT FROM AUTHOR]
Published: 2020

7. Incorporating citation impact into analysis of research trends.

Author: Lee, Minchul and Song, Min
Abstract: In the past decades, there have been a number of proposals to apply topic modeling to research trend analysis. However, most previous studies have relied primarily on document publication year and have not incorporated the impact of articles into trend analysis. Unlike previous trend analysis using topic modeling, we incorporate citation count, which can be viewed as the impact of articles, into trend analysis to shed new light on the understanding of research trends. To this end, we propose the Generalized Dirichlet multinomial regression (g-DMR) topic model, which improves the DMR topic model by replacing the linear inner product in topic priors, $\exp(x_d \cdot \lambda_t)$, with a more general form based on a topic distribution function (TDF), $\exp(f(x_d)) + \varepsilon$. We use a multidimensional Legendre polynomial as the TDF to capture publication year and the number of citations per publication simultaneously. In the DMR model, since metadata can affect the document-topic distribution only monotonically and continuous values such as publication year and citation count need to be discretized, it is difficult to view the dynamic change of each topic. But the g-DMR model can handle various orthogonal continuous variables with an arbitrary order of polynomial, so it can show more dynamic topic trends. Two major experiments show that the proposed model is better suited than DMR for topic generation with consideration of citation impact, for trend analysis in the field of Library and Information Science in general and Text Mining in particular. [ABSTRACT FROM AUTHOR]
Published: 2020
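
To make the contrast between the two priors in the abstract above concrete, here is a minimal sketch, not the authors' implementation. It assumes a single topic and a single metadata value x rescaled to [-1, 1], whereas the paper uses a multidimensional polynomial over publication year and citation count. The function names, the coefficient values, and the placement of the smoothing term eps are illustrative assumptions.

```python
import math

def legendre(n, x):
    """Evaluate the Legendre polynomial P_n(x) with Bonnet's recurrence."""
    if n == 0:
        return 1.0
    p_prev, p = 1.0, x
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

def dmr_prior(x, lam):
    """DMR-style prior: exp of an inner product, here with features [1, x]."""
    return math.exp(lam[0] + lam[1] * x)

def gdmr_prior(x, coeffs, eps=1e-10):
    """g-DMR-style prior: exp of a Legendre expansion f(x), plus a small eps.

    The eps placement is an assumption based on the formula in the abstract.
    """
    f = sum(c * legendre(n, x) for n, c in enumerate(coeffs))
    return math.exp(f) + eps

xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
# Hypothetical coefficients: a linear prior can only rise or fall with x ...
linear = [dmr_prior(x, lam=[0.0, 1.0]) for x in xs]
# ... while a second-order Legendre term lets the prior peak mid-range.
curved = [gdmr_prior(x, coeffs=[0.0, 0.0, -1.0]) for x in xs]
```

With these made-up coefficients, `linear` increases steadily across `xs`, while `curved` is largest at x = 0 and falls off on both sides, which is the kind of non-monotone trend the abstract says DMR cannot express.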

8. A novel method to identify emerging technologies using a semi-supervised topic clustering model: a case of 3D printing industry.

Author: Zhou, Yuan, Lin, Heng, Liu, Yufei, and Ding, Wei
Abstract: There have been recent attempts to identify emerging technologies by using topic-based analysis, but many of them have methodological deficiencies. First, analyses are unsupervised, and unsupervised methods cannot incorporate supervised knowledge that is needed to better identify technological domains. Second, those methods lack semantic interpretation, as many of them still remain at word-level analyses. We developed a novel technology-identification method that uses a semi-supervised topic clustering model (Labeled Dirichlet Multi Mixture model) to integrate technological domain knowledge. The model also generates a sentence-level semantic technological topic description through the topic description method (Various-aspects Sentence-level Description) on information extraction. We used this novel method to analyze the technology of the 3D printing industry and successfully identified emerging technologies by differentiating new topics from traditional topics; the results effectively demonstrated the semantic technological topic description by showing sentences. This method could be of great interest to technology forecasters and relevant policy-makers. [ABSTRACT FROM AUTHOR]
Published: 2019

9. Topic based research competitiveness evaluation.

Author: Ma, Tingcan, Li, Ruinan, Ou, Guiyan, and Yue, Mingliang
Abstract: Research competitiveness analysis refers to the measurement, comparison and analysis of the research status (i.e., strength and/or weakness) of different scientific research bodies (e.g., institutions, researchers, etc.) in different research fields. Improving research competitiveness analysis method can be conducive to accurately obtaining the research status of research fields and research bodies. This paper presents a method of evaluating the competitiveness of research institutions based on research topic distribution. The method uses the LDA topic model to obtain a paper-topic distribution matrix to objectively assign the academic impact of papers (such as number of citations) to research topics. Then the method calculates the competitiveness of each research institution on each research topic with the help of an institution-paper matrix. Finally, the competitiveness and the research strength and/or weakness of the institutions are defined and characterized. A case study shows that the method can lead to an objective and effective evaluation of the research competitiveness of research institutions in a given research field. [ABSTRACT FROM AUTHOR]
Published: 2018
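
The matrix steps described in the abstract above can be sketched as follows. This is a hedged illustration rather than the authors' method: the toy paper-topic proportions, citation counts, and institution-paper matrix are invented, and the function names are hypothetical. It assumes a paper's citations are split across topics in proportion to its topic distribution.

```python
def topic_impact(paper_topic, citations):
    """Spread each paper's citation count over topics by its topic proportions."""
    return [[share * c for share in row] for row, c in zip(paper_topic, citations)]

def institution_scores(inst_paper, impact):
    """Sum topic-level impact over the papers attributed to each institution."""
    n_papers = len(impact)
    n_topics = len(impact[0])
    return [
        [sum(row[p] * impact[p][t] for p in range(n_papers)) for t in range(n_topics)]
        for row in inst_paper
    ]

# Toy data (assumed): 3 papers, 2 topics, 2 institutions.
paper_topic = [[0.8, 0.2], [0.5, 0.5], [0.1, 0.9]]  # rows sum to 1
citations = [10, 4, 20]
inst_paper = [[1, 1, 0], [0, 1, 1]]  # 1 if the institution authored the paper

scores = institution_scores(inst_paper, topic_impact(paper_topic, citations))
```

In this toy case the first institution scores higher on topic 0 and the second on topic 1, which is the kind of per-topic strength and weakness profile the abstract describes.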

10. Citation contexts as a data source for evaluation of scholarly consumption

Author: Sergey Parinov
Subjects: Topic model, Consumption (economics), Data source, Thematic map, General Social Sciences, Sociology, Representation (arts), Thematic structure, Library and Information Sciences, Citation, General Literature: Reference (e.g., dictionaries, encyclopedias, glossaries), Data science, Computer Science Applications
Abstract: In recent years, large datasets of citation contexts from research publications have become available for scientometric studies. Such citation contexts contain different characteristics of relationships between citing and cited papers, including information about publications that were in some way used by citing authors, about the motivations of this use, etc. Some of these characteristics can be considered as indicators of scholarly consumption of the citing authors. Based on the citation contexts data, the scholarly consumption can be characterized by four indicators: (a) data on cited (consumed) publications and their authors (suppliers); (b) types of scholarly consumption; (c) its thematics; and (d) temporary changes in these data. The indicators can be grouped and merged in various ways based on belonging to common citation contexts and/or on the coincidence of their values. By this way, one can create datasets for various objects and tasks of scientometric evaluation of scholarly consumption. The article proposes a general approach for building the scholarly consumption indicators, and presents the results of the experiments on evaluating a thematic structure of scholarly consumption. For this, thematically significant groups of words (topics) were selected from the citation contexts by using the LDA topic modeling method. Topics are obtained from the citation contexts for three groups of publications: (1) publications of a given author, (2) publications cited by a given author (suppliers), and (3) publications citing a given author (consumers). Thematic structures of scholarly consumption for a given author, as well as for his suppliers and consumers have been built. The features of the thematic structure representation in the forms of a tree of words and a flowchart are considered.
Published: 2021

11. Publication outperformance among global South researchers: An analysis of individual-level and publication-level predictors of positive deviance

Author: Richard Heeks, Basma H. Albanna, and Julia Handl
Subjects: Topic model, Citation analysis, Higher education policy, Applied psychology, Information system, General Social Sciences, Sample (statistics), Library and Information Sciences, Bibliometrics, Positive deviance, Psychology, Citation, Computer Science Applications
Abstract: Research and development are central to economic growth, and a key challenge for countries of the global South is that their research performance lags behind that of the global North. Yet, among Southern researchers, a few significantly outperform their peers and can be styled research “positive deviants” (PDs). In this paper we ask: who are those PDs, what are their characteristics and how are they able to overcome some of the challenges facing researchers in the global South? We examined a sample of 203 information systems researchers in Egypt who were classified into PDs and non-PDs (NPDs) through an analysis of their publication and citation data. Based on six citation metrics, we were able to identify and group 26 PDs. We then analysed their attributes, attitudes, practices, and publications using a mixed-methods approach involving interviews, a survey and analysis of publication-related datasets. Two predictive models were developed using partial least squares regression; the first predicted if a researcher is a PD or not using individual-level predictors and the second predicted if a paper is a paper of a PD or not using publication-level predictors. PDs represented 13% of the researchers but produced about half of all publications, and had almost double the citations of the overall NPD group. At the individual level, there were significant differences between both groups with regard to research collaborations, capacity development, and research directions. At the publication level, there were differences relating to the topics pursued, publication outlets targeted, and paper features such as length of abstract and number of authors.
Published: 2021

12. News media attention in Climate Action: latent topics and open access

Author: Tahereh Dehdarirad and Kalle Karlsson
Subjects: Topic model, Sustainable development, Descriptive statistics, Biblioteks- och informationsvetenskap, 05 social sciences, General Social Sciences, Legislation, Library and Information Sciences, 050905 science studies, Data science, Latent Dirichlet allocation, Information Studies, Readability, Computer Science Applications, Political science, Altmetrics, 0509 other social sciences, 050904 information & library sciences, News media
Abstract: In this study we investigated whether open access could assist the broader dissemination of scientific research in Climate Action (Sustainable Development Goal 13) via news outlets. We did this by comparing (i) the share of open and non-open access documents in different Climate Action topics, and their news counts, and (ii) the mean of news counts for open access and non-open access documents. The data set of this study comprised 70,206 articles and reviews in Sustainable Development Goal 13, published during 2014–2018, retrieved from SciVal. The number of news mentions for each document was obtained from Altmetrics Details Page API using their DOIs, whereas the open access statuses were obtained using Unpaywall.org. The analysis in this paper was done using a combination of (Latent Dirichlet allocation) topic modelling, descriptive statistics, and regression analysis. The covariates included in the regression analysis were features related to authors, country, journal, institution, funding, readability, news source category and topic. Using topic modelling, we identified 10 topics, with topics 4 (meteorology) [21%], 5 (adaption, mitigation, and legislation) [18%] and 8 (ecosystems and biodiversity) [14%] accounting for 53% of the research in Sustainable Development Goal 13. Additionally, the results of regression analysis showed that while keeping all the variables constant in the model, open access papers in Climate Action had a news count advantage (8.8%) in comparison to non-open access papers. Our findings also showed that while a higher share of open access documents in topics such as topic 9 (Human vulnerability to risks) might not assist with its broader dissemination, in some others such as topic 5 (adaption, mitigation, and legislation), even a lower share of open access documents might accelerate its broad communication via news outlets.
Published: 2021

13. Collective topical PageRank: a model to evaluate the topic-dependent academic impact of scientific papers.

Author: Zhang, Yongjun, Ma, Jialin, Wang, Zijian, Chen, Bolun, and Yu, Yongtao
Abstract: With the explosive growth of academic writing, it is difficult for researchers to find significant papers in their area of interest. In this paper, we propose a pipeline model, named collective topical PageRank, to evaluate the topic-dependent impact of scientific papers. First, we fit the model to a correlation topic model based on the textual content of papers to extract scientific topics and correlations. Then, we present a modified PageRank algorithm, which incorporates the venue, the correlations of the scientific topics, and the publication year of each paper into a random walk to evaluate the paper’s topic-dependent academic impact. Our experiments showed that the model can effectively identify significant papers as well as venues for each scientific topic, recommend papers for further reading or citing, explore the evolution of scientific topics, and calculate the venues’ dynamic topic-dependent academic impact. [ABSTRACT FROM AUTHOR]
Published: 2018
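
As a rough illustration of a topic-dependent random walk of the kind the abstract above describes, the sketch below runs a PageRank power iteration whose teleport vector is biased toward papers with a high weight on one topic. It is a deliberate simplification and not the collective topical PageRank model itself: venue, publication year, and topic correlations are omitted, and the citation graph, topic weights, and function name are assumptions.

```python
def topical_pagerank(cites, topic_weight, damping=0.85, iters=100):
    """Power iteration for PageRank with a topic-biased teleport vector.

    cites[i] lists the papers that paper i cites; topic_weight[i] is paper i's
    proportion on the topic of interest.
    """
    n = len(cites)
    total = sum(topic_weight)
    teleport = [w / total for w in topic_weight]
    rank = teleport[:]
    for _ in range(iters):
        new = [(1 - damping) * t for t in teleport]
        for i, refs in enumerate(cites):
            if refs:
                # Split this paper's rank evenly among the papers it cites.
                share = damping * rank[i] / len(refs)
                for j in refs:
                    new[j] += share
            else:
                # A paper with no references hands its mass to the teleport vector.
                for j in range(n):
                    new[j] += damping * rank[i] * teleport[j]
        rank = new
    return rank

# Toy citation graph (assumed): paper 2 is cited by papers 0, 1, and 3.
cites = [[1, 2], [2], [], [2]]
topic_weight = [0.1, 0.2, 0.6, 0.1]
rank = topical_pagerank(cites, topic_weight)
```

Running the same walk once per topic, each time with that topic's weights as `topic_weight`, would give a per-topic ranking of papers under these simplifying assumptions.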

14. Mapping the field of psychology: Trends in research topics 1995–2015

Author: Lukas Erhard, Christian Koß, Saïd Unger, Oliver Wieczorek, Jan Rasmus Riebling, and Raphael H. Heiberger
Subjects: Topic model, Topic structure, Field (Bourdieu), 05 social sciences, General Social Sciences, 050109 social psychology, Library and Information Sciences, Popularity, 0506 political science, Computer Science Applications, Scientific discourse, 050602 political science & public administration, 0501 psychology and cognitive sciences, Research questions, Social science
Abstract: We map the topic structure of psychology utilizing a sample of over 500,000 abstracts of research articles and conference proceedings spanning two decades (1995–2015). To do so, we apply structural topic models to examine three research questions: (i) What are the discipline’s most prevalent research topics? (ii) How did the scientific discourse in psychology change over the last decades, especially since the advent of neurosciences? (iii) And was this change carried by high impact (HI) or less prestigious journals? Our results reveal that topics related to natural sciences are trending, while their ‘counterparts’ leaning to humanities are declining in popularity. Those trends are even more pronounced in the leading outlets of the field. Furthermore, our findings indicate a continued interest in methodological topics accompanied by the ascent of neurosciences and related methods and technologies (e.g. fMRI). At the same time, other established approaches (e.g. psychoanalysis) become less popular and indicate a relative decline of topics related to the social sciences and the humanities.
Funding: Bundesministerium für Bildung und Forschung, Projekt DEAL
Published: 2021

15. Identifying emerging technologies using expert opinions on the future: A topic modeling and fuzzy clustering approach

Author: Hyeonju Seol, Wooseok Jang, and Yongtae Park
Subjects: Topic model, Fuzzy clustering, Emerging technologies, Computer science, General Social Sciences, Library and Information Sciences, Data science, Latent Dirichlet allocation, Computer Science Applications, Empirical research, Work (electrical), Food processing, Centrality
Abstract: As technology rapidly advances with the Fourth Industrial Revolution, many emerging technologies have been developed in several technology sectors. These technologies can (1) provide breakthroughs and fast growth and (2) have a tremendous impact on social and technological development. Many previous studies have attempted to identify emerging technologies by constructing a deterministic methodology with journal and patent data. However, previous research frameworks are not well suited to discover potential future influences due to two limitations: (1) they rely on past and present data and (2) methodologies analyze technologies based on discovered data. In contrast to previous attempts, this study suggests a framework on how to identify whether candidate emerging technologies will intensively grow and affect social and technological fields in the future. To do so, this study collects “expert opinions on the future” which contain future-oriented experts’ opinions from general and focused technology communities. Topic modeling was then conducted using Latent Dirichlet Allocation to discover the underlying topics and technologies that will be of interest in the future. Lastly, to identify the actual emerging technologies, fuzzy clustering was conducted using diversity and centrality index scores for the candidate technologies. To conduct this empirical study, this work selected 12 food processing technologies addressing hazards that threaten microbial contamination in food. The results of the analysis indicate that three food processing technologies (Pulsed electric field, Cold atmospheric plasma, and nanotechnology in food processing) can be classified as emerging technologies.
Published: 2021

16. An integrated approach for detecting and quantifying the topic evolutions of patent technology: a case study on graphene field

Author: Hong Wu, Huifang Yi, and Chang Li
Subjects: Topic model, Mechanism (biology), Computer science, Process (engineering), 05 social sciences, Measure (physics), General Social Sciences, Library and Information Sciences, 050905 science studies, Data science, Field (computer science), Computer Science Applications, Identification (information), Entropy (information theory), 0509 other social sciences, 050904 information & library sciences, TRACE (psycholinguistics)
Abstract: Comprehensive, in-depth and accurate analyses of patent technology topic evolutions become increasingly significant since the analytical results can offer related personnel the scientific support to explore or trace back to the origin and the development of the technology. However, existing methods of topic evolutions do not facilitate better understanding of how a technology topic has evolved. This paper introduces an integrated method with the LDA topic identification analysis, the improved topic life cycle analysis, and the improved technology entropy analysis for identifying, measuring and interpreting topics evolutions from patent literatures. Multiple indicators we proposed and improved have been used to measure the degree of topic development and identify the topic types of different states. And, the concept of technology entropy has been redefined and improved to measure the changes of evolution intensity and evolution direction among topics, mainly used the topic word and its probability. The results from different methods are mutually connected and complemented. The process and characteristics of topic evolution are further overviewed. Graphene is selected for the case study. The mechanism of evolution and the effect of improved methods are focused on. The research has clearly shown that more accurate and comprehensive results can be achieved for topic evolution by employing this integrated method. Furthermore, the above integration of methods has potential contributions to hot spot detection and potential technology discovery.
Published: 2021

17. Measuring cognitive proximity using semantic analysis: A case study of China's ICT industry

Author: Yawen Qin, Wei Lang, Xun Li, Xiaozhen Qin, and Haohui Chen
Subjects: Topic model, Industry classification, Computer science, 05 social sciences, General Social Sciences, Distribution (economics), Cognition, Library and Information Sciences, 050905 science studies, Data science, Computer Science Applications, Empirical research, Information and Communications Technology, Semantic analysis (knowledge representation), 0509 other social sciences, 050904 information & library sciences, Productivity
Abstract: Quantification of knowledge technologies has long posed a challenge to the measurement of cognitive proximity. This paper proposes a method to measure cognitive proximity by mining patent description text with the LDA topic model. With the patent-topic distribution got from the LDA topic model, the cognitive proximity is measured between enterprises or within cities, which could make up for the shortage of existing measurement methods limited by the rigid IPC, industry classification system, or non-standard interview data. Our empirical studies on the ICT industry indicate that the 20 topics obtained through the topic model have a good correspondence with the technologies involved in this industry's leading products and services. And we dig out the knowledge and technology information in the patent text to depict the technology landscape, including mining the changes of technology topics over time, the difference of distribution in various cities, and the development trend of the urban innovation network. This method's effectiveness is also proved in the model that compares different measurement methods when revealing the relationship between cognitive proximity and patent productivity. Last, researchers can use this approach to delve deeper into urban innovation issues, and policymakers can use it to figure out further innovation.
Published: 2021
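
One way to turn patent-topic distributions into a proximity score between firms is sketched below. The abstract above does not state which similarity measure the authors use, so cosine similarity over averaged topic profiles is an assumption here, as are the function names and the toy distributions.

```python
import math

def firm_profile(patent_topics):
    """Average the patent-topic distributions of one firm's patents."""
    n = len(patent_topics)
    return [sum(col) / n for col in zip(*patent_topics)]

def cosine_proximity(p, q):
    """Cosine similarity between two topic profiles (assumed proximity measure)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0

# Toy patent-topic distributions over 3 topics (assumed values).
firm_a = firm_profile([[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]])
firm_b = firm_profile([[0.6, 0.3, 0.1]])
firm_c = firm_profile([[0.1, 0.1, 0.8]])

near = cosine_proximity(firm_a, firm_b)
far = cosine_proximity(firm_a, firm_c)
```

Here `near` exceeds `far` because firms A and B concentrate on the same topic, independent of any IPC or industry code assigned to their patents.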

18. Predicting future technological convergence patterns based on machine learning using link prediction

Author: Joon Hyung Cho, So Young Sohn, and Jungpyo Lee
Subjects: Topic model, Trademark, Association rule learning, Computer science, General Social Sciences, Technological convergence, Interval (mathematics), Library and Information Sciences, Machine learning, Latent Dirichlet allocation, Computer Science Applications, Environmental technology, Random forest, Artificial intelligence
Abstract: Technological convergence among different industries is an important source of innovation and economic growth. In this study, we propose a new framework for predicting patterns of technological convergence in two different industries. We first construct an International Patent Classification (IPC) co-occurrence network based on association rule mining. We then use a machine learning approach with various link prediction indices to predict future technological convergence patterns. Next, we use latent Dirichlet allocation (LDA) topic modeling to identify the keywords associated with technologies that are predicted to converge. We apply our proposed framework to a dataset of patents from the United States Patent and Trademark Office from 2012 to 2014 in the fields of chemical engineering and environmental technology. The empirical analysis results show that the prediction over a 4-year time interval using the random forest model achieves the highest performance. Moreover, the LDA topic modeling results indicate that the keywords “membrane,” “air,” “separation,” “catalyst,” “gas,” “exhaust,” and “particle” are descriptions of technologies that are likely to converge. This study is expected to contribute to technological and economic growth by predicting new technological fields that are likely to emerge in the future, and hence the directions that firms focusing on technological advancement should prepare for.
Published: 2021

19. Evolution and diffusion of information literacy topics

Author: Ye Chen, Qiyu Wang, and Yating Li
Subjects: Topic model, Computer science, Mechanism (biology), business.industry, Information literacy, media_common.quotation_subject, Word embedding, General Social Sciences, Information technology, Subject (documents), Library and Information Sciences, Topic evolution and diffusion, Data science, Article, Literacy, Field (geography), Topic modeling, Computer Science Applications, Multidisciplinary approach, business, media_common
Abstract: Investigation of the topic of information literacy and its changes can be informative for researchers and provide a better understanding of the corresponding domains. This study conducted a topic model dynamic analysis of the articles on information literacy studies in the Web of Science core collection database that were published from 2005 to 2019. The global topics and their popularities, topical similarities and correlations, along with the evolution of temporal local topics and the diffusion of subject local topics were analyzed and presented. Nine global topics differed in terms of their temporal and subject characteristics, and this study focused on ability, technology, field, people, place and application of information literacy. For the temporal local topics, crossing was the main evolutionary mechanism; hence, the core topic words were relatively stable, but few new research directions have been explored in recent years. For the subject local topics, absorbing with division and absorbing were the main mechanisms, which supported the diffusion progress of information literacy studies among subjects. However, it is necessary to promote the development of future research through the innovative development of multidisciplinary integration. Researchers and practitioners should focus on the impact of information technology, increase the breadth and depth of the research field, and develop innovative evaluation methods that are based on data to promote the comprehensive, sustainable and effective improvement in information literacy.
Published: 2021

20. Research topics and trends of the hashtag recommendation domain

Author: Navid Yazdanjue, Ramin Karimianghadim, Babak Amiri, and Liaquat Hossain
Subjects: Topic model, Social network, Computer science, Microblogging, business.industry, Deep learning, 05 social sciences, General Social Sciences, Library and Information Sciences, 050905 science studies, Data science, Field (computer science), Computer Science Applications, Domain (software engineering), Categorization, Similarity (psychology), Social media, Artificial intelligence, 0509 other social sciences, 050904 information & library sciences, business
Abstract: In microblogging platforms, hashtags are used to annotate the microblogs for a more convenient categorization and analysis of the published contents. Due to the fast growth of social networks, the hashtag recommendation field has recently attracted researchers’ attention. In this study, a review of existing works in the hashtag recommendation field is presented. After collecting all the papers in this field, the author keywords are exploited in order to extract popular topics and explore their evolution since their inception. In this regard, statistical analysis of the keywords, keyword-pair co-occurrences, and cluster analysis through the co-word data (co-word analysis) are performed. The obtained results demonstrate that there are four evolved thematic areas in this research field, including “SIMILARITY”, “HASHTAG-RECOMMENDATION”, “MACHINE-LEARNING”, and “POPULARITY-PREDICTION”. Besides, there are some popular themes in each thematic area, such as “DEEP_LEARNING”, which has excellent future development potential. Similarly, “SIMILARITY” and “TOPIC-MODEL” are two motor themes that have gained increased interest from researchers in recent studies. Finally, the analysis results of the related works in the hashtag recommendation domain are utilized to extract the main approaches in this research area, involving “DEEP LEARNING”, “TOPIC MODELING”, “SIMILARITY”, “CLASSIFICATION”, and “TOPICAL TRANSLATION”. The results’ implications and the future research directions indicate that researchers’ interest in the field of hashtag recommendation will increase rapidly.
Published: 2021

21. A topic network analysis of the system turn in the environmental sciences

Author: Jurgita Jurkevičienė, Florian Rabitz, Agnė Budžytė, and Alin Olteanu
Subjects: Structure (mathematical logic), Topic model, media_common.quotation_subject, 05 social sciences, General Social Sciences, Scientific literature, Library and Information Sciences, 050905 science studies, Data science, Computer Science Applications, Earth system science, 0509 other social sciences, 050904 information & library sciences, Discipline, Global environmental analysis, Network analysis, Diversity (politics), media_common
Abstract: The concept of Earth system science denotes a shift in the scientific discourse from disciplinary accounts of isolated components of the global environment towards the holistic and interdisciplinary treatment of their complex, functional interactions. We measure to what extent the environmental scientific literature of the past three decades reflects this system turn. Our initial dataset consists of 133,670 articles published in 95 relevant journals since 1990. We apply a combination of topic modelling and network analysis. Correlated Topic Models identify latent themes (“topics”) in the scientific discourse as well as intertopic correlations. This allows the generation of topic networks and thus the application of network-analytic techniques. We generate and analyze 2 topic networks. The first network focuses on climate linkages in a subset of our corpus consisting only of environmental journals without a climate-specific orientation: as the climate system is constitutional for the notion of Earth system science, a system turn should reflect itself in strong linkages between climate- and non-climate environmental topics. The second network, based on the full dataset, applies community detection to identify broader topical clusters. Here, we expect the system turn to manifest itself in communities with high topical diversity regarding the components of the global environment as well as types of human-nature interactions, rather than reflecting the boundaries between broader research fields. Our results show that climate topics are comparatively weakly connected and less integrated into broader thematic packages than other topics; that linkages frequently reflect conceptual debates rather than functional interactions between substantive environmental components; and that the scientific discourse splits into 4 broader domains, two of those being topically homogeneous and the other two comprising only marginally diverse topics. We conclude that the concept of Earth system science is primarily aspirational in nature rather than reflecting an empirical shift in the structure of the scientific discourse.
Published: 2021

22. Thirty years of research into hate speech: topics of interest and their evolution

Author: Lara Fontanella, Annalina Sarra, Eugenia Nissi, and Alice Tontodimamma
Subjects: Topic model, media_common.quotation_subject, 05 social sciences, Applied psychology, Ethnic group, Scopus, General Social Sciences, 02 engineering and technology, Library and Information Sciences, Latent Dirichlet allocation, Computer Science Applications, Hatred, symbols.namesake, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Sexual orientation, symbols, Relevance (information retrieval), Social media, 0509 other social sciences, 050904 information & library sciences, Psychology, media_common
Abstract: The exponential growth of social media has brought with it an increasing propagation of hate speech and hate-based propaganda. Hate speech is commonly defined as any communication that disparages a person or a group on the basis of some characteristics such as race, colour, ethnicity, gender, sexual orientation, nationality, or religion. Online hate diffusion has now developed into a serious problem and this has led to a number of international initiatives being proposed, aimed at qualifying the problem and developing effective counter-measures. The aim of this paper is to analyse the knowledge structure of hate speech literature and the evolution of related topics. We apply co-word analysis methods to identify different topics treated in the field. The analysed database was downloaded from Scopus, focusing on a number of publications during the last thirty years. Topic and network analyses of the literature showed that the main research topics can be divided into three areas: “general debate hate speech versus freedom of expression”, “hate-speech automatic detection and classification by machine-learning strategies”, and “gendered hate speech and cyberbullying”. Understanding how research fronts interact highlights the relevance of machine learning approaches for correctly assessing hateful forms of online speech.
Published: 2020

23. Identifying the intellectual structure of fields: introduction of the MAK approach

Author: Mehmet Ali Koseoglu
Subjects: Topic model, Structure (mathematical logic), Computer science, General Social Sciences, Library and Information Sciences, Intellectual structure, Data science, Representativeness heuristic, Latent Dirichlet allocation, Field (computer science), Computer Science Applications, symbols.namesake, Workflow, symbols, Strategic management
Abstract: This study introduces the MAK approach to investigating the intellectual structure of fields, which combines text-net analysis (TNA), latent Dirichlet allocation (LDA), and co-citation analysis. Researchers have previously deployed co-citation analysis to reveal the intellectual structure of fields. However, in these applications, the research has two technical limitations in its use of co-citation analysis: small representativeness in the datasets analyzed and a primary consideration for dated documents. These limitations impede the formation of a larger picture of the structure. The present study seeks to eliminate these limitations by utilizing TNA and LDA methods as topic modeling approaches for 38,368 journal articles as references with 125,154 appearances in 2680 articles published between 1980 and 2019 in the Strategic Management Journal (SMJ). We suggest researchers should embrace the MAK approach as a complementary approach to research, with its focus on the intellectual structures of the field. We provide a workflow to show potential research applications and address advantages and limitations associated with the two new methods.
Published: 2020

24. Mapping the technology evolution path: a novel model for dynamic topic detection and tracking

Author: Jie Tang, Yuan Zhou, Zhiwang Chen, Sheng Liu, and Huailan Liu
Subjects: Hierarchical Dirichlet process, Flexibility (engineering), Topic model, Process (engineering), Computer science, 05 social sciences, General Social Sciences, Technological evolution, Scientific literature, Library and Information Sciences, 050905 science studies, computer.software_genre, Field (computer science), Computer Science Applications, 0502 economics and business, Path (graph theory), Data mining, 0509 other social sciences, computer, 050203 business & management
Abstract: Identifying the evolution path of a research field is essential to scientific and technological innovation. There have been many attempts to identify the technology evolution path based on the topic model or social network analysis, but many of them had deficiencies in methodology. First, many studies have only considered a single type of information (text or citation information) in scientific literature, which may lead to incomplete technology path mapping. Second, the number of topics in each period cannot be determined automatically, making dynamic topic tracking difficult. Third, data mining methods fail to be effectively combined with visual analysis, which affects the efficiency and flexibility of mapping. In this study, we developed a method for mapping the technology evolution path using a novel non-parametric topic model, the citation involved Hierarchical Dirichlet Process (CIHDP), to achieve better topic detection and tracking of scientific literature. To better present and analyze the path, D3.js is used to visualize the splitting and fusion of the evolutionary path. We used this novel model to map the artificial intelligence research domain; the successful mapping of the evolution path shows the proposed method’s validity and merits. After incorporating the citation information, we found that the CIHDP can map a complete path evolution process and performs better than the Hierarchical Dirichlet Process and LDA. This method can be helpful for understanding and analyzing the development of technical topics. Moreover, it can be used to map the science or technology of the innovation ecosystem. It may also arouse the interest of technology evolution path researchers or policymakers.
Published: 2020

25. Exploiting word embedding for heterogeneous topic model towards patent recommendation

Author: Jie Tang, Yanping Zhang, Jie Chen, Shu Zhao, and Chen Jialin
Subjects: Topic model, Word embedding, Information retrieval, Computer science, Association (object-oriented programming), 05 social sciences, General Social Sciences, ComputingMilieux_LEGALASPECTSOFCOMPUTING, Library and Information Sciences, 050905 science studies, Semantics, Computer Science Applications, ComputingMilieux_GENERAL, Feature (linguistics), Embedding, 0509 other social sciences, 050904 information & library sciences, Word (computer architecture), Meaning (linguistics)
Abstract: Patent recommendation aims to recommend patent documents that have similar content to a given target patent. With the explosive growth in patent applications, how to recommend relevant patents from the massive number of patents has become an extremely challenging problem. The main obstacle in patent recommendation is how to distinguish the meanings of the same word in different contexts or associate multiple words that express the same meaning. In this paper, we propose a Heterogeneous Topic model exploiting Word embedding to enhance word semantics (HTW). First, we model the relationship among text, inventors, and applicants around the topic to build a heterogeneous topic model and learn the patent feature representation to capture contextual word semantics. Second, a word embedding is constructed to extract the deep semantics for associating multiple words that express the same meaning. Finally, with words as connections, the mapping from patent feature representations to patent embedding is established through a matrix operation, which integrates the information between the word embedding and patent feature representation. HTW considers the heterogeneity of patents and enhances the distinction or association among words simultaneously. The experimental results on real-world datasets show that HTW exceeds typical keyword-based methods, topic models, and embedding models on patent recommendations.
Published: 2020

26. Knowledge structure transition in library and information science: topic modeling and visualization

Author: Azusa Iwase, Emi Ishita, Michimasa Yamamoto, Yosuke Miyata, Keiko Kurata, and Fang Yang
Subjects: Topic model, Computer science, business.industry, Transition (fiction), General Social Sciences, Library science, Library and Information Sciences, Latent Dirichlet allocation, Field (geography), Computer Science Applications, Visualization, symbols.namesake, Premise, symbols, The Internet, business, Knowledge structure
Abstract: The purpose of this research is to identify topics in library and information science (LIS) using latent Dirichlet allocation (LDA) and to visualize the knowledge structure of the field as consisting of specific topics and its transition from 2000–2002 to 2015–2017. The full text of 1648 research articles from five peer-reviewed representative LIS journals in these two periods was analyzed by using LDA. A total of 30 topics in each period were labeled based on the frequency of terms and the contents of the articles. These topics were plotted on a two-dimensional map using LDAvis and categorized based on their location and characteristics in the plots. Although research areas in some forms were consistent with those discovered in previous studies, they were crucial to the transition of the knowledge structure in LIS and had the following three features: (1) The Internet became the premise of research in LIS in 2015–2017. (2) Theoretical approach or empirical work can be considered as a factor in the transition of the knowledge structure in some categories. (3) The topic diversity of the five core LIS journals decreased from 2000–2002 to 2015–2017.
Published: 2020

27. A structural topic model approach to scientific reorientation of economics and chemistry after German reunification

Author: Andreas Rehs
Subjects: Topic model, research field mapping, History, Distribution (economics), Minor (academic), Library and Information Sciences, Wiedervereinigung, topic modelling, 01 natural sciences, German, 010104 statistics & probability, Similarity (psychology), Regional science, Dissertation, 0101 mathematics, Szientometrie, Structure (mathematical logic), dissertations, business.industry, 05 social sciences, Cosine similarity, General Social Sciences, language.human_language, Computer Science Applications, structural topic modelling, language, german reunification, 0509 other social sciences, 050904 information & library sciences, business, Maschinelles Lernen, Period (music)
Abstract: The detection of differences or similarities in large numbers of scientific publications is an open problem in scientometric research. In this paper we therefore develop and apply a machine learning approach based on structural topic modelling in combination with cosine similarity and a linear regression framework in order to identify differences in dissertation titles written at East and West German universities before and after German reunification. German reunification and its surrounding time period is used because it provides a structure with both minor and major differences in research topics that could be detected by our approach. Our dataset is based on dissertation titles in economics and business administration and chemistry from 1980 to 2010. We use university affiliation and year of the dissertation to train a structural topic model and then test the model on a set of unseen dissertation titles. Subsequently, we compare the resulting topic distribution of each title to every other title with cosine similarity. The cosine similarities and the regional and temporal origin of the dissertation titles they come from are then used in a linear regression approach. Our results on research topics in economics and business administration suggest substantial differences between East and West Germany before the reunification and a rapid conformation thereafter. In chemistry we observe minor differences between East and West before the reunification and a slightly increased similarity thereafter.
Published: 2020

28. Dynamics of topic formation and quantitative analysis of hot trends in physical science

Author: Iu. L. Mosenkis, B. G. Kreminskyi, Alexander Yakimenko, and A. V. Chumachenko
Subjects: Metadata, Topic model, Sociology of scientific knowledge, Conceptualization, Computer science, Search engine indexing, Physical science, General Social Sciences, Mutual information, Library and Information Sciences, Citation, Data science, Computer Science Applications
Abstract: Successful research in the face of increasing complexity of modern scientific knowledge together with diversity and depth of the studied problems requires an understanding of the structure and evolution of trends in science. Available digital records open wide possibilities for statistical analysis of scientific publications and related metadata for topic modeling and evolution, knowledge mapping, citation indexing, etc. We investigate dynamical properties of the physical topics using analysis of temporal evolution of proximity measure for word pairs related to the mutual information. We use full-text conceptualization of content of scientific documents provided by the ScienceWISE platform for topic mapping, trend analysis and detection of hot topics together with relevant papers retrieval. We found that time evolution of relative mutual information distance reveals a hidden topic structure and could be used for quantitative analysis of current trends in scientific research.
Published: 2020

29. Two layer-based trajectory analysis of the research trend in automotive fuel industry

Author: Wei Xong, Na Kyeong Lee, Min Song, and Yukyeong Han
Subjects: Topic model, business.product_category, Computer science, business.industry, 05 social sciences, Automotive industry, General Social Sciences, Library and Information Sciences, Bibliometrics, 050905 science studies, Diesel engine, Industrial engineering, Dirichlet distribution, Computer Science Applications, Trend analysis, symbols.namesake, Electric vehicle, Path (graph theory), symbols, 0509 other social sciences, 050904 information & library sciences, business
Abstract: Increasing concern about climate change and unstable oil prices is driving the development of fuel technology in the automobile industry. To investigate such a rapidly changing path, researchers apply bibliometrics and topic modeling to patent data. These commonly used methods, however, have several drawbacks, such as considering only the macro-level trend and focusing on highly probable terms. To avoid these weaknesses, we propose a two-layer trend analysis based on the Time country topic model (TCT) and the Dirichlet compound multinomial model (DCM), which makes it possible to detect both macro-level and micro-level trends and to identify bursty terms in the automotive industry. Experimental results using the TCT model show rising, falling and fluctuating trend topics conditioned on countries. We also find the path of automotive technology based on bursty terms from the analysis of the DCM model. Specifically, electric vehicles, aluminum in lightweight materials and diesel engines are considered rising topics in automobile fuel. Our proposed framework can be applied to trajectory analysis in various other fields.
Published: 2020

30. Evaluating technological emergence using text analytics: two case technologies and three approaches

Author: Arho Suominen, Stephen Carley, Samira Ranaei, and Alan L. Porter
Subjects: Topic model, Computer science, 05 social sciences, General Social Sciences, Text analytics, 02 engineering and technology, Library and Information Sciences, 021001 nanoscience & nanotechnology, Data science, Latent Dirichlet allocation, Topic modeling, Computer Science Applications, Term (time), Identification (information), symbols.namesake, Research community, 0502 economics and business, Technological emergence, Strategic level, symbols, Emergence score (EScore), 0210 nano-technology, Set (psychology), 050203 business & management
Abstract: Scientometric methods have long been used to identify technological trajectories, but we have seldom seen reproducible methods that allow for the identification of a technological emergence in a set of documents. This study evaluates the use of three different reproducible approaches for identifying the emergence of technological novelties in scientific publications. The selected approaches are a term counting technique, the emergence score (EScore) and Latent Dirichlet Allocation (LDA). We found that the methods provide somewhat distinct perspectives on technological emergence. The term count based method identifies detailed emergence patterns. EScore is a complex bibliometric indicator that provides a holistic view of emergence by considering several parameters, namely term frequency, size, and origin of the research community. LDA traces emergence at the thematic level and provides insights on the linkages between emerging research topics. The results suggest that term counting produces results practical for operational purposes, while LDA offers insight at a strategic level.
Published: 2019

31. Application of entity linking to identify research fronts and trends

Author: Mauricio Marrone
Subjects: Topic model, business.industry, Computer science, General Social Sciences, Library and Information Sciences, Data science, Popularity, Information science, Computer Science Applications, Entity linking, Knowledge base, Burstiness, A priori and a posteriori, business, Word (computer architecture)
Abstract: Studying research fronts enables researchers to understand how their academic fields emerged, how they are currently developing and their changes over time. While topic modelling tools help discover themes in documents, they employ a “bag-of-words” approach and require researchers to manually label categories, specify the number of topics a priori, and make assumptions about word distributions in documents. This paper proposes an alternative approach based on entity linking, which links word strings to entities from a knowledge base, to help solve issues associated with “bag-of-words” approaches by automatically identifying topics based on entity mentions. To study topic trends and popularity, we use four indicators—Mann–Kendall’s test, Sen’s slope analysis, z-score values and Kleinberg’s burst detection algorithm. The combination of these indicators helps us understand which topics are particularly active (“hot” topics), which are decreasing (“cold” topics or past “bursty” topics) and which are maturely developed. We apply the approach and indicators to the fields of Information Science and Accounting.
Published: 2019

32. Personal research idea recommendation using research trends and a hierarchical topic model

Author: Yunita Sari, Tzu Ting Hsu, and Hei Chia Wang
Subjects: Topic model, Trend analysis, Tree (data structure), Information retrieval, Correctness, Expression (architecture), Computer science, General Social Sciences, Natural language generation, Library and Information Sciences, Popularity, Computer Science Applications, Task (project management)
Abstract: In the era of rapid technological advance, it is an important task for all researchers to keep up with trends when performing research. How to efficiently find suitable research topics while the number of papers is increasing rapidly is worthwhile to explore. To solve such problems, some researchers have attempted to find research ideas by topic detection and tracking methods. However, these methods do not consider the users’ background knowledge and preferences, and they express a topic with general keywords, which does not effectively help researchers to develop new research ideas. Existing studies support the view that the title expresses the research idea best. This study adapts this concept to propose an automatic title generation method that combines personalized recommendation methods and topic trend analysis methods to achieve this task. First, it uses hierarchical latent tree analysis to find the users’ interests for a topic structure and its representative keywords hidden in the existing research. Second, the interesting topic trends, popularity and user preferences in a hybrid recommendation method are considered. Finally, a natural language generation algorithm that is suitable for the titles of academic papers converts the original recommended keywords into fluent title sentences that are designed for the users. Experiments have found that adding Google Trend indicators and personal factors can improve the performance of topic recommendations. The automatic title generation method using template-based and statistical information methods leads to excellent performance in both grammatical correctness and semantic expression. Moreover, users find the title more inspirational than simple keywords for developing new research ideas.
Published: 2019

33. Mapping of topics in DESIDOC Journal of Library and Information Technology, India: a study

Author: Manika Lamba and Margam Madhusudhan
Subjects: Topic model, History, business.industry, 05 social sciences, Online exhibition, General Social Sciences, Information technology, Library and Information Sciences, Bibliometrics, 050905 science studies, Latent Dirichlet allocation, Field (geography), Computer Science Applications, World Wide Web, symbols.namesake, Information and Communications Technology, symbols, Library automation, 0509 other social sciences, 050904 information & library sciences, business
Abstract: This study analyzed 928 full-text research articles retrieved from DESIDOC Journal of Library and Information Technology for the period of 1981–2018 using Latent Dirichlet Allocation. The study further tagged the articles with the modeled topics. 50 core topics were identified throughout the period of 38 years whereas only 26 topics were unique in nature. Bibliometrics, ICT, information retrieval, and user studies were highly researched areas in India for the epoch. Further, Spain and Taiwan showed research trends and areas in common with India, whereas India has quite distinct research interests from America and China. Therefore, researchers in Library and Information Science in India should pay more attention to the topics which are under-researched. Further, it was found that there were some sub-fields unique to Indian Library and Information Science research, such as open access; online exhibition; virtual libraries; multimedia libraries; open source software; library automation; and library management system. With the passage of time, topics evolve, new topics emerge, and old ones become obsolete. Topic modeling not only helps researchers to determine the trending themes or related fields with respect to their field of interest but also helps them to identify new concepts and fields over time.
Published: 2019

34. A novel method to identify emerging technologies using a semi-supervised topic clustering model: a case of 3D printing industry

Author: Yufei Liu, Yuan Zhou, Heng Lin, and Wei Ding
Subjects: Topic model, Information retrieval, Computer science, Emerging technologies, Semantic interpretation, General Social Sciences, Library and Information Sciences, Mixture model, computer.software_genre, Dirichlet distribution, Computer Science Applications, Information extraction, symbols.namesake, symbols, Domain knowledge, Cluster analysis, computer
Abstract: There have been recent attempts to identify emerging technologies by using topic-based analysis, but many of them have methodological deficiencies. First, analyses are unsupervised, and unsupervised methods cannot incorporate supervised knowledge that is needed to better identify technological domains. Second, those methods lack semantic interpretation, as many of them still remain at word-level analyses. We developed a novel technology-identification method that uses a semi-supervised topic clustering model (Labeled Dirichlet Multi Mixture model) to integrate technological domain knowledge. The model also generates a sentence-level semantic technological topic description through the topic description method (Various-aspects Sentence-level Description) on information extraction. We used this novel method to analyze the technology of the 3D printing industry, and successfully identified emerging technologies by differentiating new topics from the traditional topics. The results effectively demonstrated the semantic technological topic description by showing sentences. This method could be of great interest to technology forecasters and relevant policy-makers.
Published: 2019

35. Investigating technology opportunities: the use of SAOx analysis

Author: Sungjoo Lee, Kyeong-Min Park, and Kyuwoong Kim
Subjects: Topic model, Gerund, Computer science, 05 social sciences, General Social Sciences, Library and Information Sciences, Information accuracy, 050905 science studies, Object (computer science), Research findings, Data science, Computer Science Applications, Patent analysis, TRIZ, 0509 other social sciences, 050904 information & library sciences
Abstract: A patent is regarded as one of the most reliable data sources to investigate technology opportunities and has been analyzed in numerous ways. The recent trend of patent analysis has focused on the unstructured part of patent information to extract detailed technological information. In particular, information regarding the purpose or effect of technology, which can be pulled from the unstructured part of patent information, is expected to offer useful insights into expanding its application to other areas. Few previous attempts have been made to systematically use this information to identify new technology opportunities, partly due to difficulties in analyzing the unstructured text data in patent documents. To overcome the limitations of previous studies, this study aims to develop a new method, namely Subject–Action–Object–others (SAOx), which enables an in-depth examination of the purpose and effect of the technology in an efficient manner by analyzing “for” and “to” phrases as well as gerund forms for an object element. We also introduce 39 engineering parameters of TRIZ and technology-designative terms of patent documents to define SAO sets and improve information accuracy. The proposed method is applied to human–machine interaction technologies to understand technology trends and explore technology opportunities based on topic modeling. Methodologically, the research findings contribute to patent engineering by extending the range of information extracted from patent information. Practically, the proposed approach will support corporate decision making in R&D investment by providing comprehensive information regarding the purpose or effect of technology in a structured form, fully extracted from patent documents.
Published: 2018

36. A qualitative and quantitative analysis of open citations to retracted articles: the Wakefield 1998 et al.'s case

Author: Silvio Peroni and Ivan Heibi
Subjects: Topic model, General Social Sciences, Library and Information Sciences, Article, Computer Science Applications, Epistemology, Retraction, Topic modeling, Science of Science, Qualitative analysis, Quantitative analysis (finance), Citation analysis, Psychology
Abstract: In this article, we show the results of a quantitative and qualitative analysis of open citations on a popular and highly cited retracted paper: “Ileal-lymphoid-nodular hyperplasia, non-specific colitis and pervasive developmental disorder in children” by Wakefield et al., published in 1998. The main purpose of our study is to understand the behavior of the publications citing one retracted article and the characteristics of the citations the retracted article accumulated over time. Our analysis is based on a methodology which illustrates how we gathered the data, extracted the topics of the citing articles and visualized the results. The data and services used are all open and free to foster the reproducibility of the analysis. The outcomes concerned the analysis of the entities citing Wakefield et al.’s article and their related in-text citations. We observed a constantly increasing number of citations in the last 20 years, accompanied by a constant increase in the percentage of those acknowledging its retraction. Citing articles started either discussing or dealing with the retraction of Wakefield et al.’s article even before its full retraction happened in 2010. Articles in the social sciences domain citing Wakefield et al.’s article were among those that discussed its retraction the most. In addition, when observing the in-text citations, we noticed that a large number of the citations received by Wakefield et al.’s article have focused on general discussions without recalling strictly medical details, especially after the full retraction. Medical studies did not hesitate in acknowledging the retraction of Wakefield et al.’s article and often provided strong negative statements on it.
Published: 2020

37. Empirical study of constructing a knowledge organization system of patent documents using topic modeling.

Author: Hu, Zhengyin, Fang, Shu, and Liang, Tian
Abstract: A knowledge organization system (KOS) can help easily indicate the deep knowledge structure of a patent document set. Compared to classification code systems, a personalized KOS made up of topics can represent the technology information in a more agile, detailed manner. This paper presents an approach to automatically construct a KOS of patent documents based on term clumping, the Latent Dirichlet Allocation (LDA) model, K-Means clustering and Principal Components Analysis (PCA). Term clumping is adopted to generate a better bag-of-words for topic modeling and the LDA model is applied to generate raw topics. Then by iteratively using K-Means clustering and PCA on the document set and topics matrix, we generated new upper topics and computed the relationships between topics to construct a KOS. Finally, documents are mapped to the KOS. The nodes of the KOS are topics which are represented by terms and their weights and the leaves are patent documents. We evaluated the approach with a set of Large Aperture Optical Elements (LAOE) patent documents as an empirical study and constructed the LAOE KOS. The method used discovered the deep semantic relationships between the topics and helped better describe the technology themes of LAOE. Based on the KOS, two types of applications were implemented: the automatic classification of patent documents and the categorical refinements above search results. [ABSTRACT FROM AUTHOR]
Published: 2014
Full Text: View/download PDF

38. A topic model approach to measuring interdisciplinarity at the National Science Foundation.

Author: Nichols, Leah
Abstract: As the National Science Foundation (NSF) implements new cross-cutting initiatives and programs, interest in assessing the success of these experiments in fostering interdisciplinarity grows. A primary challenge in measuring interdisciplinarity is identifying and bounding the discrete disciplines that comprise interdisciplinary work. Using statistical text-mining techniques to extract topic bins, the NSF recently developed a topic map of all of their awards issued between 2000 and 2011. These new data provide a novel means for measuring interdisciplinarity by assessing the language or content of award proposals. Using the Directorate for Social, Behavioral, and Economic Sciences as a case study and drawing on the new topic model of the NSF's awards, this paper explores new methods for quantifying interdisciplinarity in the NSF portfolio. [ABSTRACT FROM AUTHOR]
Published: 2014
Full Text: View/download PDF

39. Topic based research competitiveness evaluation

Author: Ruinan Li, Tingcan Ma, Mingliang Yue, and Guiyan Ou
Subjects: Topic model, Knowledge management, Computer science, 05 social sciences, General Social Sciences, Distribution (economics), Library and Information Sciences, 050905 science studies, Field (computer science), Computer Science Applications, Distribution matrix, Evaluation methods, Institution, 0509 other social sciences, 050904 information & library sciences, Analysis method
Abstract: Research competitiveness analysis refers to the measurement, comparison and analysis of the research status (i.e., strength and/or weakness) of different scientific research bodies (e.g., institutions, researchers, etc.) in different research fields. Improving research competitiveness analysis methods can help to accurately capture the research status of research fields and research bodies. This paper presents a method of evaluating the competitiveness of research institutions based on research topic distribution. The method uses the LDA topic model to obtain a paper-topic distribution matrix to objectively assign the academic impact of papers (such as number of citations) to research topics. Then the method calculates the competitiveness of each research institution on each research topic with the help of an institution-paper matrix. Finally, the competitiveness and the research strength and/or weakness of the institutions are defined and characterized. A case study shows that the method can lead to an objective and effective evaluation of the research competitiveness of research institutions in a given research field.
Published: 2018
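
Illustrative note (not part of the catalog record): the abstract above describes assigning paper impact to topics through a paper-topic distribution matrix and then aggregating by institution through an institution-paper matrix. A minimal sketch of that idea follows. The abstract does not give the exact formula, so the proportional weighting, the plain matrix product, and the function name topic_competitiveness are all assumptions made for illustration only.

```python
import numpy as np


def topic_competitiveness(paper_topic, citations, inst_paper):
    """Hypothetical helper: institution-by-topic impact scores.

    paper_topic : (papers x topics) topic proportions, e.g. from an LDA model
    citations   : (papers,) citation count per paper
    inst_paper  : (institutions x papers) 1 if the institution authored the paper
    """
    paper_topic = np.asarray(paper_topic, dtype=float)
    citations = np.asarray(citations, dtype=float)
    inst_paper = np.asarray(inst_paper, dtype=float)

    # Assumption: spread each paper's citations over topics in proportion
    # to the paper's topic weights.
    paper_topic_impact = paper_topic * citations[:, None]

    # Sum the per-paper topic impact over each institution's papers.
    return inst_paper @ paper_topic_impact


# Toy data: 3 papers, 2 topics, 2 institutions.
paper_topic = [[0.5, 0.5], [1.0, 0.0], [0.25, 0.75]]
citations = [10, 4, 8]
inst_paper = [[1, 1, 0], [0, 0, 1]]

scores = topic_competitiveness(paper_topic, citations, inst_paper)
print(scores)
```

In this toy case the first institution scores 9 on topic 0 and 5 on topic 1, and the second scores 2 and 6. How the published method normalizes these scores or defines strength and weakness is not stated in the abstract and is not modeled here.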

40. Collective topical PageRank: a model to evaluate the topic-dependent academic impact of scientific papers

Author: Yongjun Zhang, Jialin Ma, Yongtao Yu, Bolun Chen, and Zijian Wang
Subjects: Topic model, Computer science, General Social Sciences, 02 engineering and technology, Area of interest, Library and Information Sciences, Data science, Computer Science Applications, PageRank, 020204 information systems, Reading (process), Academic writing, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Pagerank algorithm
Abstract: With the explosive growth of academic writing, it is difficult for researchers to find significant papers in their area of interest. In this paper, we propose a pipeline model, named collective topical PageRank, to evaluate the topic-dependent impact of scientific papers. First, we fit the model to a correlation topic model based on the textual content of papers to extract scientific topics and correlations. Then, we present a modified PageRank algorithm, which incorporates the venue, the correlations of the scientific topics, and the publication year of each paper into a random walk to evaluate the paper’s topic-dependent academic impact. Our experiments showed that the model can effectively identify significant papers as well as venues for each scientific topic, recommend papers for further reading or citing, explore the evolution of scientific topics, and calculate the venues’ dynamic topic-dependent academic impact.
Published: 2017

41. Does academic collaboration equally benefit impact of research across topics? The case of agricultural, resource, environmental and ecological economics

Author: Maksym Polyakov, Serhiy Polyakov, and Sayed Iftekhar
Subjects: Topic model, Ecological economics, Latent semantic analysis, 05 social sciences, Control (management), General Social Sciences, Library and Information Sciences, Environmental economics, Computer Science Applications, Resource (project management), Agriculture, Political science, 0502 economics and business, Institution, Regional science, 050202 agricultural economics & policy, 0509 other social sciences, 050904 information & library sciences, Citation
Abstract: In this paper, we analyse the effects of different types of formal collaboration and research topics on the research impact of academic articles in the area of agricultural, resource, environmental, and ecological economics. The research impact is measured by the number of times an article has been cited each year since publication. The topics within the area of research are modelled using latent semantic analysis. We distinguish between the effects of institutional, national, and international collaboration. We use statistical models for count data and control for the impacts of journals, publication year, and years since publication. We find that, holding other factors constant, collaboration in the form of co-authorship increases research impact. The effect of inter-institutional collaboration within the same country is similar to the effect of collaboration within the same institution. However, international collaboration results in an additional increase in impact. We find that the topic of a paper substantially influences the number of citations, and we identify which topics are associated with greater impact. The effects of different types of collaboration on citations also vary across topics.
Published: 2017

42. Mapping the research on aquaculture. A bibliometric analysis of aquaculture literature.

Author: Natale, Fabrizio, Fiore, Gianluca, and Hofherr, Johann
Abstract: Research on aquaculture is expanding along with the exceptional growth of the sector and has an important role in supporting even further the future developments of this relatively young food production industry. In this paper we examined the aquaculture literature using bibliometrics and computational semantics methods (latent semantic analysis, topic model and co-citation analysis) to identify the main themes and trends in research. We analysed bibliographic information and abstracts of 14,308 scientific articles on aquaculture recorded in Scopus. Both the latent semantic analysis and the topic model indicate that the broad themes of research on aquaculture are related to genetics and reproduction, growth and physiology, farming systems and environment, nutrition, water quality, and health. The topic model gives an estimate of the relevance of these research themes by single articles, authors, research institutions, species and time. With the co-citation analysis it was possible to identify more specific research fronts, which are attracting a high number of co-citations by the scientific community. The largest research fronts are related to probiotics, benthic sediments, genomics, integrated aquaculture and water treatment. In terms of temporal evolution, some research fronts such as probiotics, genomics, sea-lice, and environmental impacts from cage aquaculture, are still expanding while others, such as mangroves and shrimp farming, benthic sediments, are gradually losing weight. While bibliometric methods do not necessarily provide a measure of output or impact of research activities, they proved useful for mapping a research area, identifying the relevance of themes in the scientific literature and understanding how research fronts evolve and interact. By using different methodological approaches the study is taking advantage of the strengths of each method in mapping the research on aquaculture and showing in the meantime possible limitations and some directions for further improvements. [ABSTRACT FROM AUTHOR]
Published: 2012
Full Text: View/download PDF

43. Exploring dynamic research interest and academic influence for scientific collaborator recommendation

Author: Teshome Megersa Bekele, Xiangjie Kong, Zhenzhen Xu, Wei Wang, Meng Wang, and Huizhen Jiang
Subjects: Topic model, Computer science, Social impact, General Social Sciences, 02 engineering and technology, Variation (game tree), Library and Information Sciences, Data science, Computer Science Applications, Time function, Distribution matrix, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, F1 score, Cluster analysis, Baseline (configuration management)
Abstract: In many cases, it is time-consuming for researchers to find proper collaborators who can provide researching guidance besides simply collaborating. The Most Beneficial Collaborators (MBCs), who have high academic level and relevant research topics, can genuinely help researchers to enrich their research. However, how can we find the MBCs? In this paper, we propose a most Beneficial Collaborator Recommendation model called BCR. BCR learns on researchers’ publications and associates three academic features: topic distribution of research interest, interest variation with time and researchers’ impact in collaborators network. We run a topic model on researchers’ publications in each year for topic clustering. The generated topic distribution matrix is fixed by a time function to fit the interest dynamic transformation. The academic social impact is also considered to mine the most prolific researchers. Finally, a TopN MBCs recommendation list is generated according to the computed score. Extensive experiments on a dataset with citation network demonstrate that, in comparison to relevant baseline approaches, our BCR performs better in terms of precision, recall, F1 score and the recommendation quality.
Published: 2017

44. Mapping the evolution of library and information science (1978–2014) using topic modeling on LISA

Author: Maria Pinto, Carlos G. Figuerola, and Francisco Javier García Marco
Subjects: Topic model, Information retrieval, Computer science, 05 social sciences, Perspective (graphical), General Social Sciences, Information technology, Library science, Context (language use), Library and Information Sciences, 050905 science studies, Data science, Latent Dirichlet allocation, Computer Science Applications, Domain (software engineering), Multidisciplinary approach, 0509 other social sciences, 050904 information & library sciences
Abstract: This paper offers an overview of the bibliometric study of the domain of library and information science (LIS), with the aim of giving a multidisciplinary perspective of the topical boundaries and the main areas and research tendencies. Based on a retrospective and selective search, we have obtained the bibliographical references (title and abstract) of academic production on LIS in the database LISA in the period 1978–2014, which runs to 92,705 documents. In the context of the statistical technique of topic modeling, we apply latent Dirichlet allocation, in order to identify the main topics and categories in the corpus of documents analyzed. The quantitative results reveal the existence of 19 important topics, which can be grouped together into four main areas: processes, information technology, library and specific areas of information application.
Published: 2017

45. Mapping a Twitter scholarly communication network: a case of the association of internet researchers’ conference

Author: Han Woo Park, Ho Young Yoon, Mikyung Lee, Hyejin Park, and Marc A. Smith
Subjects: Topic model, Data collection, Computer science, 05 social sciences, Social network analysis (criminology), General Social Sciences, Library and Information Sciences, 050905 science studies, Telecommunications network, Scholarly communication, Computer Science Applications, World Wide Web, Content analysis, Information source, The Internet, 0509 other social sciences, 050904 information & library sciences
Abstract: This paper investigates how scholars in the digital humanities use Twitter for informal scholarly communication. In particular, the paper observes the hosting of an annual conference over a number of years by one association in order to see whether there was a change in the network configuration structure, the influential scholars in the network, the information sources, and the tweet contents. Annual conferences held by the Association of Internet Researchers over 3 years are used for data collection. According to our results, while the Twitter communication network developed into a bigger network, the basic form of the network configuration remained stable as a Tight Crowd structure and the core influential people changed little. Analyses of information source and content found topic changes in each year but consistency in the kind of information source and content.
Published: 2017

46. Subject–method topic network analysis in communication studies

Author: Hyojung Jung, Min Song, and Keeheon Lee
Subjects: Topic model, Relation (database), Computer science, 05 social sciences, Citation index, Communication studies, Social change, General Social Sciences, 050801 communication & media studies, Library and Information Sciences, 050905 science studies, Data science, Computer Science Applications, 0508 media and communications, Information and Communications Technology, 0509 other social sciences, Citation, Network analysis
Abstract: Communication studies depend on information and communication technology (ICT) and the behavior of people using the technology. ICT enables individuals to transfer information quickly via various media. Social changes are occurring rapidly and their studies are growing in number. Thus, a tool to extract knowledge to comprehend the quickly changing dynamics of communication studies is required. We propose a subject–method topic network analysis method that integrates topic modeling analysis and network analysis to understand the state of communication studies. Our analysis focuses on the relationships between topics classified as subjects and methods. From the relationships, we examine the societal and perspective changes relative to emerging media technologies. We apply our method to all papers listed in the Journal Citation Reports Social Science Citation Index as communication studies between 1990 and 2014. The study results allow us to identify popular subjects, methods, and subject–method pairs in proportion and relation.
Published: 2016

47. Knowledge in motion: the evolution of HIV/AIDS research

Author: jimi adams and Ryan Light
Subjects: Topic model, 050402 sociology, Management science, Cost effectiveness, HIV/AIDS research, 05 social sciences, General Social Sciences, Library and Information Sciences, 050905 science studies, Motion (physics), Computer Science Applications, 0504 sociology, Acquired immunodeficiency syndrome (AIDS), Multidisciplinary approach, Transdisciplinarity, Engineering ethics, Computational sociology, Sociology, 0509 other social sciences
Abstract: Many contemporary social and public health problems do not fit neatly into the research fields typically found in universities. With this in mind, researchers and funding agencies have devoted increasing attention to projects that span multiple disciplines. However, comparatively little attention has been paid to how these projects evolve over time. This relative neglect is in part attributable to a lack of theory on the dynamic nature of such projects. In this paper, we describe how research programs can move through various states of integration including disciplinarity, multidisciplinarity, interdisciplinarity and transdisciplinarity. We link this insight to computational techniques (topic models) to explore one of the most vibrant and pressing contemporary research areas: research on HIV/AIDS. Topic models of over 9000 abstracts from two prominent journals illustrate how research on HIV/AIDS has evolved from a high to a lower level of integration. The topic models motivate a more detailed historical analysis of HIV/AIDS research and, together, they highlight the dynamic nature of knowledge production. We conclude by discussing the role of computational social science in dynamic models of interdisciplinarity.
Published: 2016

48. Topic-based heterogeneous rank

Author: Jian Xu, Ali Daud, Vincent Malic, Ying Ding, and Tehmina Amjad
Subjects: Topic model, Information retrieval, Computer science, Rank (computer programming), General Social Sciences, Library and Information Sciences, Data science, Field (computer science), Computer Science Applications, Ranking (information retrieval), Domain (software engineering), Test (assessment), Metric (mathematics), Heterogeneous network
Abstract: Topic-based ranking of authors, papers and journals can serve as a vital tool for identifying authorities of a given topic within a particular domain. Existing methods that measure topic-based scholarly output are limited to homogeneous networks. This study proposes a new informative metric called Topic-based Heterogeneous Rank (TH Rank) which measures the impact of a scholarly entity with respect to a given topic in a heterogeneous scholarly network containing authors, papers and journals. TH Rank calculates topic-dependent ranks for authors by considering the combined impact of the multiple factors which contribute to an author's level of prestige. Information retrieval serves as the test field and articles about information retrieval published between 1956 and 2014 were extracted from Web of Science. Initial results show that TH Rank can effectively identify the most prestigious authors, papers and journals related to a specific topic.
Published: 2015

49. A decade of research in statistics: a topic model approach

Author: Alfio Ferrara, Francesca De Battisti, and Silvia Salini
Subjects: Metadata, Topic model, Information retrieval, Computer science, Selection (linguistics), General Social Sciences, Context (language use), Library and Information Sciences, Scientometrics, Cluster analysis, Data science, Field (computer science), Computer Science Applications
Abstract: Topic models are a well-known clustering approach for textual data, which provides promising applications in the bibliometric context for the purpose of discovering scientific topics and trends in a corpus of scientific publications. However, topic models per se provide poorly descriptive metadata featuring the discovered clusters of publications and they are not related to the other important metadata usually available with publications, such as author affiliation, publication venue, and publication year. In this paper, we propose a methodological approach to topic modeling and post-processing of topic model results to the end of describing in depth a field of research over time. In particular, we work on a selection of publications from the international statistical literature, we propose an approach that allows us to identify sophisticated topic descriptors, and we analyze the links between topics and their temporal evolution.
Published: 2015

50. Analyzing knowledge flows of scientific literature through semantic links: a case study in the field of energy

Author: Saeed-Ul Hassan and Peter Haddawy
Subjects: Topic model, Computer science, Scopus, General Social Sciences, Scientific literature, Library and Information Sciences, Bibliometrics, Data science, Latent Dirichlet allocation, Field (computer science), Computer Science Applications, Identification (information), Semantic analysis (knowledge representation)
Abstract: In this paper we propose a new technique to semantically analyze knowledge flows across countries by using publication and citation data. We start with the identification of research topics produced by a given source country. Then, we collect papers, published by authors outside the source country, citing the identified research topics. Finally, we group each set of citing papers separately to determine the scholarly impact of the actual identified research topics in the cited topics. The research topics are identified using our proposed topic model with distance matrix, an extension of the classic Latent Dirichlet Allocation model. We also present a case study to illustrate the use of our proposed techniques in the subject area Energy during 2004–2009 using the Scopus database. We compare the Japanese and Chinese papers that cite the scientific literature produced by researchers from the United States in order to show the difference in the use of the same knowledge. The results indicate that Japanese researchers focus on research areas such as efficient use of Photovoltaic, Energy Conversion and Superconductors (to produce low-cost renewable energy). In contrast with the Japanese researchers, Chinese researchers focus on the areas of Power Systems, Power Grids and Solar Cells production. Such analyses are useful for understanding the dynamics of the relevant knowledge flows across nations.
Published: 2015
