131 results for "Topic model"
Search Results
2. Financial Statement Adequacy and Firms’ MD&A Disclosures
- Author
-
Jenny Wu Tucker, Lisa A. Hinson, and Stephen V. Brown
- Subjects
Topic model, History, Polymers and Plastics, Earnings, Text segmentation, Accounting, Industrial and Manufacturing Engineering, Proxy (climate), Value (economics), Relevance (information retrieval), Business, Business and International Management, Book value, Financial statement - Abstract
Firms are required to provide financial information via the financial statements and the MD&A—a narrative explanation of the financial statements. Our study examines how firms use the MD&A channel when their financial statement channel is inadequate. We proxy for the adequacy of the financial statement channel by the value relevance of book value and earnings. We use several approaches to extract MD&A disclosure attributes: (1) keyword searches to identify non-GAAP disclosure, (2) supervised deep learning models to identify forward-looking statements, and (3) unsupervised topic models as well as text segmentation techniques to identify topics and topic locations. We find that firms with lower value relevance of financial statements (1) are more likely to provide non-GAAP disclosure in the MD&A, (2) include more forward-looking statements in the MD&A, and (3) use larger proportions of the MD&A to discuss intangibles and discuss them more prominently. Our findings suggest that managers use the MD&A, a relatively more flexible channel, to a greater extent to provide information when their financial statement channel is less adequate.
- Published
- 2021
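The first extraction approach in the abstract above, a keyword search for non-GAAP disclosure, can be sketched roughly as follows. The keyword list and function name here are illustrative assumptions, not the authors' actual dictionary:

```python
import re

# Illustrative keyword patterns -- NOT the authors' actual dictionary.
NON_GAAP_PATTERNS = [
    r"non-?gaap",
    r"adjusted (?:ebitda|earnings|net income)",
    r"pro forma",
]

def flags_non_gaap(mdna_text: str) -> bool:
    """Return True if any non-GAAP keyword appears in the MD&A text."""
    text = mdna_text.lower()
    return any(re.search(p, text) for p in NON_GAAP_PATTERNS)

print(flags_non_gaap("We also report Adjusted EBITDA, a non-GAAP measure."))  # True
print(flags_non_gaap("Revenues increased 4% year over year."))                # False
```

Keyword search is the simplest of the paper's three extraction layers; the supervised and unsupervised models mentioned next require labeled data and topic modeling, respectively.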
3. Why Do Managers Disclose Risks Accurately? Textual Analysis, Disclosures, and Risk Exposures
- Author
-
Alejandro Lopez-Lira
- Subjects
Topic model, Actuarial science, Computer science, Economic model, Space (commercial competition), Set (psychology), Measure (mathematics) - Abstract
I provide an economic model that justifies using bag-of-words, topic modeling, and machine learning techniques to measure firms' risk exposures using the percentage they allocate to each risk in their financial statements. The model provides a theoretical set of sufficient conditions under minimal assumptions that make managers optimally disclose risk accurately and give more space to the most critical risks. I document that the SEC Regulation S-K satisfies this set of sufficient theoretical conditions and induces rational managers to disclose risks truthfully.
- Published
- 2021
4. Information-Seeking Argument Mining: A Step Towards Identifying Reasons in Textual Analysis to Improve Services
- Author
-
Marcus Dombois, Iryna Gurevych, Bernd Skiera, Shunyao Yan, and Johannes Daxenberger
- Subjects
Service (business), Topic model, Argumentative, Empirical research, Information seeking, Computer science, Sentiment analysis, Unstructured data, Service provider, Data science - Abstract
Service providers increasingly apply textual analysis, such as sentiment mining or topic models, to unstructured data. Still, these techniques fall short of providing linguistic relations, such as the reasons behind changes in sentiment or topics. Information-seeking argument mining (IS AM) is a text-mining technique that automatically extracts and identifies argumentative structures (e.g., reasoning) from natural-language text. So far, however, service researchers and managers hardly use IS AM. This article outlines how to use IS AM to improve services. The empirical study applies IS AM to news articles about scooter-sharing systems, i.e., services enabling short-term rentals of electric motorized scooters. The results show that (i) arguments differ strongly across time, providers of scooter-sharing systems, and media; (ii) knowledge of arguments enables providers to improve services and communication with customers; and (iii) results from sentiment analysis support the validity of IS AM. The article closes with an outlook for further research.
- Published
- 2021
5. Disperse and Preserve the Perverse: Computing How Hip-Hop Censorship Changed Popular Music Production in China
- Author
-
Ke Nie
- Subjects
Topic model, History, Field (Bourdieu), Media studies, Censorship, Lyrics, Popular music, Music information retrieval, China, Sound (geography) - Abstract
How do states censor an artistic genre that challenges social norms? The answers to this question in the existing literature are mostly based on investigations of particular artists or artworks of the genre, while the impact on the entire genre field remains unexamined. This paper explores how censorship of an artistic genre changes the artistic forms of that genre while also triggering strategic reactions from artists in closely related genres. Using an original dataset of 53,364 songs released on a Chinese online music platform, I study the impact of the 2018 Hip-Hop censorship in China on Hip-Hop as well as Pop, Rock, and Folk songs, in terms of how they sound and what topics their lyrics address. I propose a novel approach to measuring sound similarities and topic prevalence in song lyrics using computational tools including Music Information Retrieval (MIR), neural networks, and topic models. I find that Hip-Hop songs produced after the censorship sound significantly different from those before, with a bigger impact on songs that were played frequently on the platform than on those that were not. Moreover, genres of similarly restricted production, such as Rock, sound more like Hip-Hop after the censorship, while genres of large-scale production, such as Pop, sound less "Hip-Hopy". The censorship also made Hip-Hop musicians engage less with topics related to violence, smoking, or drinking but more with sexual content in a veiled form. The findings thus suggest a model of dispersion in explaining the mechanism of censorship in cultural production.
- Published
- 2021
6. Infodemiology: Computational Methodologies for Quantifying and Visualizing Key Characteristics of the COVID-19 Infodemic
- Author
-
Ligot D, Brennan-Rieder D, Tayco Fc, Nazareno C, and Toledo M
- Subjects
Topic model, History, Polymers and Plastics, Computer science, Climate change denial, Data science, Industrial and Manufacturing Engineering, Infodemiology, Identification (information), Similarity (psychology), Disinformation, Social media, Misinformation, Business and International Management - Abstract
Objectives. Infodemics of false information on social media are a growing societal problem, aggravated by the COVID-19 pandemic. The development of infodemics bears characteristic resemblances to epidemics of infectious diseases. This paper presents several methodologies which aim to measure the extent and development of infodemics through the lens of epidemiology. Methods. The time-varying reproduction number (Rt) was used as a measure of the infectiousness of the infodemic, topic modeling was used to create topic clouds and topic similarity heat maps, and network analysis was used to create directed and undirected graphs to identify super-spreader and multiple-carrier communities on social media. Results. Forty-two (42) latent topics were discovered. Reproductive trends for a specific topic were observed to have significantly higher peaks (Rt 4-5) than general misinformation (Rt 1-3). From a sample of social media misinformation posts, a total of 385 groups and 804 connections were found within the network, with the largest group having 1,643 shares and 1,063,579 interactions over a 12-month period. Conclusions. These approaches enable measurement of the infectiousness of an infodemic, comparative analysis of infodemic topics, and identification of likely super-spreaders and multiple carriers on social media. The results of these analyses can form the basis for taking action to stem an ongoing spread of misinformation on social media and to mitigate future infodemics. The methods are not confined to health misinformation and may be applied to other infodemics, such as conspiracy theories, political disinformation, and climate change denial.
- Published
- 2021
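The epidemiological analogy in the abstract above can be made concrete. Below is a minimal sketch of a time-varying reproduction number for daily counts of misinformation posts, dividing each day's "new infected posts" by a serial-interval-weighted sum of recent posts. The window and weights are illustrative assumptions, not the paper's exact specification:

```python
# Minimal sketch of a time-varying reproduction number (Rt) for daily
# counts of misinformation posts. Weights/window are illustrative.
def time_varying_r(daily_counts, serial_weights=(0.5, 0.3, 0.2)):
    """Rt[t] = I[t] / sum_s w[s] * I[t-1-s]; None where undefined."""
    rts = []
    for t in range(len(daily_counts)):
        denom = sum(w * daily_counts[t - 1 - s]
                    for s, w in enumerate(serial_weights)
                    if t - 1 - s >= 0)
        rts.append(daily_counts[t] / denom if denom > 0 else None)
    return rts

counts = [10, 20, 30, 45, 30]
print(time_varying_r(counts))
```

An Rt above 1 signals a topic still gaining spread; below 1, a topic dying out, mirroring the Rt 4-5 versus Rt 1-3 contrast the paper reports.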
7. Social Determinants of Health: Insights from Location Big Data
- Author
-
Natasha Zhang Foutz, Beibei Li, and Meghanath Macha
- Subjects
Gerontology, Topic model, History, Polymers and Plastics, Population, Big data, Metropolitan area, Industrial and Manufacturing Engineering, Identification (information), Health care, Social determinants of health, Business and International Management, Association (psychology), Psychology - Abstract
Rocketing hospitalization rates and costs call for a deeper understanding of the connection between health outcomes and individuals' social determinants of health, such as lifestyle and socio-economics. Such knowledge holds important implications for health marketing, policymaking, and policy communication. Building on the literature, which has focused either on lifestyle identification or on the association between health outcomes and limited behavioral features derived from small samples, we propose a novel framework leveraging population-scale location data that capture granular individual behavior 24/7. This framework integrates unsupervised topic models and sequential deep learning models to characterize individual lifestyles and quantify their association with future hospitalizations while accounting for other social determinants of health. Applied to 45 million location records from a major metropolitan region in the U.S., the framework successfully uncovers heterogeneous lifestyles. Several key findings emerge. An individual's lifestyle choice turns out to be a more critical predictor of future hospitalization than his or her socio-economic factors or accessibility to healthcare facilities. Lower-income populations can present healthy lifestyles, while high-income populations can present unhealthy ones. Populations with lower accessibility to healthcare facilities can present healthy lifestyles, while populations with higher accessibility can present unhealthy ones. Individuals with busy, varying work routines and limited gym visits are 2.01 times as likely to be hospitalized within a year as the population average. Importantly, regularity in healthy or unhealthy activities, rather than total time spent on them, predicts future hospitalization. Overall, an individual's lifestyle choice is more critical than socio-economic, accessibility, and community factors, consistent with a recent review on social determinants in EHR.
- Published
- 2021
8. A Thousand Words Tell More Than Just Numbers: Financial Crises and Historical Headlines
- Author
-
Tomi Roukka, Henri Nyberg, and Kim Ristolainen
- Subjects
Finance, Topic model, History, Polymers and Plastics, Empirical modelling, Industrial and Manufacturing Engineering, Newspaper, Economic indicator, Financial crisis, Narrative, Business and International Management, Empirical evidence - Abstract
We show that financial crises are preceded by changes in specific types of narrative information contained in newspaper article titles. Our novel international dataset and the resulting empirical evidence are gathered by integrating information from a large panel of economic news articles in global newspapers between the years 1870 and 2016 with conventional macroeconomic and financial indicators. We find that the predictive information of newspaper article titles that signals coming crisis episodes is substantial over and above the macroeconomic and financial indicators. The new indicators capture common features that have often been discussed as potential causes of specific crises but which have not been incorporated into empirical models.
- Published
- 2021
9. Wisdom from Words: Using Natural Language Processing to Study Culture (and Psychology More Broadly)
- Author
-
Grant Packard and Jonah Berger
- Subjects
Topic model, Psychological science, Dual role, Cultural dynamics, Artificial intelligence, Psychology, Natural language processing, Natural language, Digitization, Range (computer programming) - Abstract
Why do some cultural items (e.g., songs, stereotypes, and ideas) catch on while others fail? While researchers have long been interested in cultural dynamics and cultural success, quantification has been challenging. It’s difficult to get data over time or to extract psychological features of cultural items that might contribute to their success. We suggest that natural language processing can help. The digitization of information has made more textual data available, and the emergence of new tools provides novel ways to extract behavioral insight from text. We discuss the dual role that language serves, review some useful approaches that psychologists may be less familiar with (e.g., topic modeling and embeddings), and describe how these approaches can help unlock a range of novel questions. By focusing on culture, we highlight a specific place where natural language tools can be applied, while also illuminating how they can address a broader range of psychological science topics.
- Published
- 2021
10. Pattern Making and Pattern Breaking: Measuring Novelty in Brazilian Economics
- Author
-
Bernardo Mueller and Marcos Paulo R. Correia
- Subjects
Topic model, Incentive, Kullback–Leibler divergence, Selection (linguistics), Novelty, Contrast (statistics), Metric (unit), Sociology, Sociocultural evolution, Data science - Abstract
How do new ideas emerge in academic contexts, and what forces determine which ideas get selected and which are forgotten? We analyze more than 1,600 papers presented at the ANPEC Brazilian Economics Meetings from 2013 to 2019 using topic modeling and relative entropy measures. In contrast to simply counting citations or reference combinations, these methods explore the information in the actual texts to detect the rise of new patterns and whether these patterns persist once they have been established. We find that novelty is highly correlated with transience, so that most new ideas are quickly forgotten. However, among the ideas that persist, those that are more novel have a higher impact. We show that our text-based measure of impact is correlated with subsequent citations. Our results provide a metric to compare the nature of research at the level of Brazilian Economics departments as well as for individual researchers. Finally, we analyze how the selection procedures for the ANPEC meetings affect the incentives for economists to pursue more novel or conventional research.
- Published
- 2021
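The novelty/transience measures in the abstract above are typically built from Kullback–Leibler divergences between a paper's topic mixture and those of its neighbors in time. A minimal sketch, with a toy corpus and a window size that are illustrative assumptions rather than the authors' settings:

```python
from math import log

def kl(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q) between topic mixtures."""
    return sum(pi * log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def novelty_transience(doc_topics, i, window=2):
    """Novelty: mean KL from doc i to preceding docs (how surprising it is).
    Transience: mean KL to following docs (how quickly it is forgotten)."""
    past = doc_topics[max(0, i - window):i]
    future = doc_topics[i + 1:i + 1 + window]
    novelty = sum(kl(doc_topics[i], p) for p in past) / len(past) if past else None
    transience = sum(kl(doc_topics[i], f) for f in future) / len(future) if future else None
    return novelty, transience

# Toy topic mixtures: the third document breaks the pattern, the fourth follows it.
docs = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.2, 0.1, 0.7]]
print(novelty_transience(docs, 2))
```

A high-novelty, low-transience document is a "pattern breaker" whose pattern persists, which is the profile the paper associates with high impact.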
11. An email content-based insider threat detection model using anomaly detection algorithms
- Author
-
Chai Dakun Mang, NaanKang Garba, Sandip Rakshit, and Narasimha Rao Vajjhala
- Subjects
Topic model, Focus (computing), Computer science, Anomaly (natural sciences), Baseline model, Insider threat, Anomaly detection, Algorithm, Natural language, Insider - Abstract
Insider threats significantly impact businesses as well as governments and military organizations. The focus of threats has shifted from external attacks to within organizations, where authorized users have become potential insider threats. Existing insider-threat detection methods, such as the rule-based approach, rely on expert knowledge, which limits their robustness. To overcome this limitation, an insider threat detection method is proposed based on email user behaviour and anomaly detection algorithms. An email corpus based on the IT administrator role is constructed from the CERT r6.2 dataset using natural-language pre-processing modules. Topic modelling is performed on the dataset to generate a vector space, which serves as input to anomaly detection algorithms to detect malicious email contents. The experimental results demonstrate that the proposed model achieves an 89% detection rate, outperforming the baseline model. A combination of K-means and PCA anomaly detection algorithms yielded a detection rate of 89% across anomaly-score cut-off values of 1%, 5%, 10%, 15%, 20%, 25%, and 30%.
- Published
- 2021
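The abstract above feeds topic vectors into PCA- and K-means-based anomaly detectors. As a minimal stand-in for the PCA side, the sketch below scores each document by its reconstruction error after projection onto the top principal components; the toy data, dimensions, and component count are assumptions for illustration, not the paper's configuration:

```python
import numpy as np

def pca_anomaly_scores(X, n_components=2):
    """Anomaly score = reconstruction error after projecting onto the top
    principal components; higher means more anomalous."""
    Xc = X - X.mean(axis=0)
    # SVD-based PCA: rows of Vt are the principal axes.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:n_components]
    recon = Xc @ W.T @ W
    return np.linalg.norm(Xc - recon, axis=1)

rng = np.random.default_rng(0)
# "Normal" documents live near a 2-D plane in 5-D topic space.
coeffs = rng.normal(size=(50, 2))
basis = np.array([[1.0, 1.0, 0.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 1.0, 0.0]])
X = coeffs @ basis + 0.01 * rng.normal(size=(50, 5))
X[0, 4] += 5.0  # inject one off-plane outlier (a "malicious" email)

scores = pca_anomaly_scores(X)
print(np.argmax(scores))
```

Documents whose topic mixture cannot be explained by the dominant patterns get large scores, which is the intuition behind using PCA residuals for insider-threat detection.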
12. Performance Analysis of Abstract based Classification of Medical Journals using Ensemble Methods
- Author
-
Radha N and Deepika A
- Subjects
Topic model, Boosting (machine learning), Information retrieval, Computer science, Decision tree learning, Decision tree, Scientific literature, Gradient boosting, Ensemble learning, Random forest - Abstract
Fetching suitable web-based literature is one of the leading issues researchers face nowadays, due to the growing number of articles and the lack of appropriate keyword-searching procedures. In the field of biomedical research specifically, scientific literature repositories are growing exponentially, so retrieving appropriate and relevant documents from the vast body of biomedical literature is a key issue. To tackle this issue, this research work proposes a model that applies a topic modelling algorithm as a prior step to automatic labeling, followed by ensemble algorithms for classification. The National Center for Biotechnology Information's (NCBI) PubMed is the biggest source of peer-reviewed biomedical literature for researchers and health practitioners. The dataset is collected from the NCBI PubMed website. In this work, we implemented and evaluated the performance of existing multiple-classifier ensemble algorithms such as Bagging and Boosting. In this analysis, we found that the Decision Tree algorithm performs better than the other models.
- Published
- 2021
13. After the Revolution – New Opportunities for Service Research in a Digital World
- Author
-
Gianfranco Walsh and Werner H. Kunz
- Subjects
Topic model, Service (business), Emerging technologies, Customer relationship management, Data science, Latent Dirichlet allocation, Digital media, Originality, Social media, Sociology - Abstract
Purpose – Digital media has revolutionized societies and changed forever how we do business. This paper aims to determine the current scope of service research in the area of digital media, identify research gaps, and introduce new research contributions that complement our knowledge of digital media. Approach – Based on all service articles from the SERVSIG literature alert system from 2016 to 2019, a subset of digital media articles was identified, and Latent Dirichlet Allocation (LDA) text-mining methods were applied to the abstracts and titles of the articles for topic modeling of the field. Dominant research topics were identified and depicted in a two-dimensional space. Findings – The study identifies eight distinct topic areas of digital media in service research and shows their relationships to each other in a two-dimensional space. A clear tendency has emerged in service research to take primarily a customer rather than a business perspective on digital media. Further, for some journals, a trend toward specialization in particular topics was detected. Implications – This article advocates for more digital media research with a stronger business perspective. Further, although particular new technologies are exciting to discuss, the importance of customer relationship topics in digital media is not reflected in current digital media research as needed. Originality/value – The article uses a quantitative-explorative approach to determine the current state of research on digital media in services. We introduce 11 new studies that aim to close the knowledge gap in critical areas of digital media.
- Published
- 2021
14. Informative Effects of Expert Sentiment on the Return Predictability of Cryptocurrency
- Author
-
Yang Li and Simon Trimborn
- Subjects
Topic model, Cryptocurrency, Investment decisions, Actuarial science, Relation (database), Value (economics), Sentiment analysis, Predictability, Psychology, Test (assessment) - Abstract
Experts’ opinions are widely considered in investment decisions. We collect textual information from cryptocurrency experts and study the dynamics of their discussion topics and their sentiment in relation to market movements. Based on this analysis, we test several hypotheses: whether the experts' tweets have informative value (accepted), whether their discussion topics predict the market in the short run (mostly accepted) and the long run (rejected), and whether their sentiment carries short-run (rejected) and long-run (rejected) informative value. Lastly, we test whether their individual and consensus sentiment (by profession) are indistinguishable from each other (both accepted). This study suggests that there is no informative value in expert sentiment, though for discussion topics it depends on the market situation and prediction horizon. Surprisingly, there is no relation to the profession of the expert.
- Published
- 2021
15. The Effects of Verbal and Visual Marketing Content in Social Media Settings: A Deep Learning Approach
- Author
-
Shaohui Wu, Lei Liu, Zhen Fang, and Yingfei Wang
- Subjects
Topic model, Expectancy theory, History, Polymers and Plastics, Visual marketing, Deep learning, Industrial and Manufacturing Engineering, Consumer engagement, Semantic relationship, Social media, Artificial intelligence, Business and International Management, Psychology, Content (Freudian dream analysis), Cognitive psychology - Abstract
Due to the relentless development of social media marketing, firms increasingly rely on a combination of verbal and visual elements to communicate with consumers and attract their attention. The present research investigates how the semantic relationship between text and image information affects consumer engagement (forwards and comments). Leveraging a large-scale dataset of firm-generated messages, we develop a novel end-to-end scalable deep learning model to quantify each text-image message with a well-established, two-dimensional text-image incongruency (relevancy and expectancy). We find that the interaction of relevancy and expectancy, two distinct dimensions of text-image incongruency at the cognitive level, plays a predominant role in affecting consumer engagement on social media. High-relevancy-high-expectancy (HRHE) content and low-relevancy-low-expectancy (LRLE) content are the most effective strategies, whereas high-relevancy-low-expectancy (HRLE) and low-relevancy-high-expectancy (LRHE) contents do not work so well. Furthermore, this paper also shows different antecedents of different types of consumer engagement in social media contexts, including forwards and comments. In particular, HRHE offers an exclusive benefit of boosting forwards while the two strategies are equally effective in eliciting comments. This research contributes to the literature on consumer engagement and social media marketing by addressing the importance of multi-dimensional text-image incongruency and generates important managerial implications.
- Published
- 2021
16. A Novel LDA-based Framework to Forecast COVID-19 Trends
- Author
-
Rahul Katarya and Arushi Gupta
- Subjects
Topic model, Coronavirus disease 2019 (COVID-19), Computer science, Machine learning, Latent Dirichlet allocation, World health, Data prediction, Artificial intelligence - Abstract
According to the World Health Organization (WHO), the outbreak of Coronavirus disease (COVID-19), first discovered in Wuhan, China, spread globally and caused a pandemic. In the current scenario, many countries regularly record high numbers of new cases and deaths. This leads to the circulation of a massive amount of online data: news articles, tweets, blogs, and posts. Ingesting such an enormous amount of online data can be burdensome. This paper proposes a Latent Dirichlet Allocation (LDA) based framework to extract features by incorporating news articles and time-series data. The obtained features can be input to any machine learning (ML) algorithm to improve COVID-19 time-series prediction. The proposed approach can also be used as a topic modeling tool.
- Published
- 2021
17. Computational Linguistic Analysis of Submitted SEC Information (CLASSI)
- Author
-
Andrea Belz and Fnu Shweta
- Subjects
Topic model, Measure (data warehouse), Text processing, Process (engineering), Computer science, Scale (chemistry), Latent Dirichlet allocation, Data science, Word (computer architecture) - Abstract
Text analysis is an important element of research and practice in finance, management, and operations. Word associations offer deep insight into the dynamics, strategies, and tactics of industries at scale, and thus automated text processing is of great interest. We report the development of a new platform that uses a Latent Dirichlet Allocation (LDA) topic modeling process to analyze reports that publicly traded companies submit to the Securities and Exchange Commission (SEC). We describe evaluations of the system's intrinsic performance and of an important external measure: the ability to sort documents into Standard Industrial Classifications (SICs), a widely used taxonomy of industry categories. We discuss potential applications in operations, finance, and management.
- Published
- 2021
18. Mapping Twenty Years of Antimicrobial Resistance Research Trends
- Author
-
Bhanu Sinha, Annemarie Braakman-Jansen, Lisette van Gemert-Pijnen, Alfred Stein, Johann Magnus van Niekerk, Julia Keizer, Christian F. Luz, Corinna Glasner, and Nienke Beerlage-de Jong
- Subjects
Text corpus, Topic model, Political science, Health care, Global health, Declaration, European Commission, Stewardship, User interface, Data science - Abstract
Background: Antimicrobial resistance (AMR) is a global threat to health and healthcare. In response to the growing AMR burden, research funding also increased. However, a comprehensive overview of the research output, including conceptual, temporal, and geographical trends, is missing. Therefore, this study uses topic modelling, a machine learning approach, to reveal the scientific evolution of AMR research and its trends, and provides an interactive user interface for further analyses. Methods: Structural topic modelling (STM) was applied to a text corpus resulting from a PubMed query comprising AMR articles (1999-2018). A topic network was established and topic trends were analysed by frequency, proportion, and importance over time and space. Findings: In total, 88 topics were identified in 158,616 articles from 166 countries. AMR publications increased by 450% between 1999 and 2018, emphasizing the vibrancy of the field. Prominent topics in 2018 were Strategies for emerging resistances and diseases, Nanoparticles, and Stewardship. Emerging topics included Water and environment, and Sequencing. Geographical trends showed prominence of Multidrug-resistant tuberculosis (MDR-TB) in the WHO African Region, corresponding with the MDR-TB burden. China and India were growing contributors in recent years, following the United States of America as the overall lead contributor. Interpretation: This study provides a comprehensive overview of the AMR research output, thereby revealing the AMR research response to the increased AMR burden. Both the results and the publicly available interactive database serve as a base to inform and optimise future research. Funding: INTERREG-VA EurHealth-1Health (202085); European Commission Horizon 2020 Framework. Declaration of Interests: We declare no conflicts of interest.
- Published
- 2021
19. An Empirical Study of Bioinformatics Topics in Online Forum Discussions
- Author
-
Gias Uddin, Sheikh Hasib Ahmed, M. Saifur Rahman, and Dibyendu Brinto Bose
- Subjects
Topic model, History, Pattern detection, Documentation, Empirical research, Biodata, Polymers and Plastics, Stack Overflow, Online forum, Business and International Management, Bioinformatics, Popularity, Industrial and Manufacturing Engineering - Abstract
In this paper, we aim to understand the topics discussed by bioinformatics practitioners on Stack Exchange sites. We downloaded all bioinformatics posts (questions and accepted answers) from four Stack Exchange Q&A sites (Stack Overflow, Biology, Cross Validated, and Bioinformatics), then applied topic modeling to each site's data. We labeled the topics and grouped them into high-level categories. We analyzed the topics further by determining their popularity, difficulty, and evolution, and made a comparative analysis of the topics across the different studied sites. We found 14 topics in Stack Overflow, grouped into six categories. The number of new bioinformatics questions in Stack Overflow is steadily increasing over time for each topic category. Topics related to sequence analysis and pattern detection are the most popular, as well as those for which it is most difficult to get an accepted answer. Most of the discussion posts are ‘how’-type questions, i.e., the practitioners were looking for solutions. While topics like biodata processing are found on multiple Stack Exchange sites, other topics (e.g., gene evolution analysis) are found on specialized sites (e.g., Biology). These findings show the need to consult multiple related Stack Exchange sites to learn an interdisciplinary field like bioinformatics. The tradeoff between popularity and difficulty of the bioinformatics topics highlights that bioinformatics practitioners need documentation and better tool support. Bioinformatics researchers, organizations, and practitioners can look to our results to prioritize the specific areas that need more focus for improvement.
- Published
- 2021
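Topic popularity and difficulty metrics of the kind used in the study above are typically simple ratios over grouped posts. A sketch with hypothetical post records (the field names, data, and exact metric definitions are assumptions, not the paper's):

```python
# Sketch of topic popularity/difficulty metrics over Q&A posts.
# Field names and data are hypothetical.
posts = [
    {"topic": "sequence analysis", "views": 900, "accepted": False},
    {"topic": "sequence analysis", "views": 700, "accepted": True},
    {"topic": "biodata processing", "views": 200, "accepted": True},
    {"topic": "biodata processing", "views": 100, "accepted": True},
]

def topic_metrics(posts):
    """Popularity: average views per question in the topic.
    Difficulty: share of questions with no accepted answer."""
    metrics = {}
    for topic in {p["topic"] for p in posts}:
        group = [p for p in posts if p["topic"] == topic]
        metrics[topic] = {
            "popularity": sum(p["views"] for p in group) / len(group),
            "difficulty": sum(not p["accepted"] for p in group) / len(group),
        }
    return metrics

print(topic_metrics(posts))
```

In this toy data, the "sequence analysis" topic is both the most viewed and the one most often left without an accepted answer, mirroring the popularity/difficulty tradeoff the paper reports.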
20. Measuring a Contract's Breadth: A Text Analysis
- Author
-
Bryan C. McCannon, Yang Zhou, and Joshua C. Hall
- Subjects
Topic model, Measure (data warehouse), Actuarial science, Computer science, Proof of concept, Expansive, Latent Dirichlet allocation, Test (assessment), Diversity (business) - Abstract
We use a computational linguistic algorithm to measure the topics covered in the text of school teacher contracts in Ohio. We use the topic modeling metrics in a calculation of the concentration of topics covered. This allows us to assess how expansive each contract is. As a proof of concept, we evaluate the relationship between our topic diversity measurement and the prevalence of support staff. This test is done on a subsample of the contracts in the state. If more specialized services are provided, then contracts must presumably be broader as they cover more employment relationships. We confirm a strong, statistically significant relationship between our measurement and the prevalence of these support staff. Thus, we have a valid measurement of contract breadth.
- Published
- 2021
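The "concentration of topics covered" in a contract, as described above, is naturally computed as a Herfindahl-style index over the document's topic shares from the LDA model. A minimal sketch; whether the authors use exactly this index is not stated in the abstract:

```python
def topic_concentration(topic_shares):
    """Herfindahl-style concentration of a document's topic distribution.
    Ranges from 1/K (all K topics equal) up to 1.0 (a single topic);
    lower concentration = broader contract."""
    total = sum(topic_shares)
    return sum((s / total) ** 2 for s in topic_shares)

broad = [0.25, 0.25, 0.25, 0.25]    # covers many topics evenly
narrow = [0.85, 0.05, 0.05, 0.05]   # dominated by one topic
print(topic_concentration(broad))    # 0.25
print(topic_concentration(narrow))
```

Under this reading, a district with more specialized support staff would show a lower concentration (broader contract), which is the relationship the paper tests.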
21. How Much Can Machines Learn Finance From Chinese Text Data?
- Author
-
Lirong Xue, Yang Zhou, and Jianqing Fan
- Subjects
Finance, Topic model, Computer science, Investment strategy, Event study, Equity (finance), Predictive power, Stock market, Stock (geology), Focus (linguistics) - Abstract
Most studies on equity markets using text data focus on English-based specified sentiment dictionaries or topic modeling. However, can we predict the impact of news directly from the text data? How much can we learn from such a direct approach? We present here a new framework for learning from text data based on a factor model and sparsity regularization, called FarmPredict, to let machines learn financial returns automatically. Unlike other dictionary-based or topic models that have stringent pre-screening processes, our framework allows the model to extract information more fully from the whole article. We demonstrate our study on the Chinese stock market, as Chinese text has no natural spaces between words and phrases and the Chinese market has a very large proportion of retail investors. These two specific features of our study differ significantly from the previous literature, which focuses on English text and the U.S. market. We validate our method against several existing approaches from the literature on the Chinese stock market. We show that positive sentiments scored by our FarmPredict approach generate on average 83 bps of daily excess stock returns, while negative news has an adverse impact of 26 bps, on the days of news announcements; both effects can last for a few days. This asymmetric effect aligns well with the short-sale constraints in the Chinese equity market. As a result, we show that the machine-learned sentiments do provide sizeable predictive power, with an annualized return of 116% under a simple investment strategy, and the portfolios based on our model significantly outperform other models. This lends further support to the claim that FarmPredict can learn the sentiments embedded in financial news. Our study also demonstrates the far-reaching potential of using machines to learn from text data.
- Published
- 2021
22. Narrative Asset Pricing: Interpretable Systematic Risk Factors from News Text
- Author
-
Yinan Su, Bryan T. Kelly, and Leland Bybee
- Subjects
Topic model ,History ,Polymers and Plastics ,Computer science ,Univariate ,Feature selection ,Industrial and Manufacturing Engineering ,Stochastic discount factor ,Systematic risk ,Econometrics ,Capital asset pricing model ,Business and International Management ,Set (psychology) ,Interpretability - Abstract
We seek fundamental risks from news text. Conceptually, news is closely related to the idea of systematic risk, in particular the "state variables" in the ICAPM. News captures investors' concerns about future investment opportunities, and hence drives the current pricing kernel. This paper demonstrates a way to extract a parsimonious set of risk factors and eventually a univariate pricing kernel from news text. The state variables are reduced and selected from the variations in attention allocated to different news narratives. As a result, the risk factors attain clear text-based interpretability as well as top-of-the-line asset pricing performance. The empirical method integrates topic modeling (LDA), latent factor analysis (IPCA), and variable selection (group lasso).
- Published
- 2021
23. COVID-19 Health Response from January to April 2020 in the Philippines: A Topic Modeling Analysis using Latent Dirichlet Allocation Algorithm
- Author
-
Ginbert Cuaton, Las Johansen Caluza, and Joshua Francisco Neo
- Subjects
Topic model ,2019-20 coronavirus outbreak ,Actuarial science ,Coronavirus disease 2019 (COVID-19) ,Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) ,media_common.quotation_subject ,Latent Dirichlet allocation ,symbols.namesake ,Geography ,Pandemic ,symbols ,Thematic analysis ,Welfare ,media_common - Abstract
The COVID-19 disease has greatly challenged, and resulted in serious and irreversible impacts on, the social, economic, political, and health welfare of billions of people.
- Published
- 2020
24. A Narrowing of AI Research?
- Author
-
Konstantinos Stathoulopoulos, Joel Klinger, and Juan Mateos-Garcia
- Subjects
FOS: Computer and information sciences ,Topic model ,Technological change ,General purpose technology ,Private sector ,GeneralLiterature_MISCELLANEOUS ,Computer Science - Computers and Society ,ComputingMethodologies_PATTERNRECOGNITION ,Incentive ,Dominance (economics) ,Scale (social sciences) ,Political science ,Computers and Society (cs.CY) ,Marketing ,Diversity (business) - Abstract
The arrival of deep learning techniques able to infer patterns from large datasets has dramatically improved the performance of Artificial Intelligence (AI) systems. Deep learning's rapid development and adoption, led in great part by large technology companies, has however created concerns about a premature narrowing of the technological trajectory of AI research, despite deep learning's weaknesses, which include lack of robustness, high environmental costs, and potentially unfair outcomes. We seek to improve the evidence base with a semantic analysis of AI research in arXiv, a popular pre-prints database. We study the evolution of the thematic diversity of AI research, compare the thematic diversity of AI research in academia and the private sector, and measure the influence of private companies on AI research through the citations they receive and their collaborations with other institutions. Our results suggest that diversity in AI research has stagnated in recent years, and that AI research involving the private sector tends to be less diverse and more influential than research in academia. We also find that private sector AI researchers tend to specialise in data-hungry and computationally intensive deep learning methods at the expense of research involving other AI methods, research that considers the societal and ethical implications of AI, and applications in sectors like health. Our results provide a rationale for policy action to prevent a premature narrowing of AI research that could constrain its societal benefits, but we note the informational, incentive, and scale hurdles standing in the way of such interventions.
- Comment: Fourth version: includes substantial changes in response to reviewer comments, such as an alternative strategy to identify AI papers, a new robustness section, a new analysis of private research influence, a substantially modified literature review, and the creation of a technical annex
- Published
- 2020
25. Investor Attention and Topic Appearance Probabilities: Evidence from Treasury Bond Market
- Author
-
Cathy Yi-Hsuan Chen, Ying Chen, and Hao Lei
- Subjects
Topic model ,Index (economics) ,Yield (finance) ,Financial news ,Economics ,Predictive power ,Econometrics ,Bond market ,Volatility (finance) ,Treasury - Abstract
Motivated by category-learning behavior, we propose Topic Appearance Probability (TAP) in financial news as an alternative measure of investor attention. We then investigate the relationship between investor attention, measured by the widely used Google Search Volume Index and by our proposed TAP, and the short-term 3-month and long-term 10-year Treasury yields, using daily and weekly data. Our empirical findings are: (1) there exists a contemporaneous relationship between investor attention and the return of the Treasury yields for daily data, but not weekly data; (2) investor attention has more pronounced predictive power, in terms of adjusted R² and the number of significant terms, for the return of the 3-month Treasury yield than for that of the 10-year; (3) investor attention has some predictive power over volatility.
- Published
- 2020
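The abstract of entry 25 above leaves the TAP measure informal. One plausible formalization, as a hedged sketch (the threshold, data layout, and function name here are our assumptions for illustration, not the paper's exact definition), is the daily fraction of news articles in which a topic's estimated proportion exceeds a cutoff:

```python
from collections import defaultdict

def topic_appearance_prob(doc_topics, doc_dates, topic, threshold=0.1):
    """Fraction of articles per day whose estimated proportion of `topic`
    exceeds `threshold` -- one plausible reading of TAP."""
    appeared = defaultdict(int)
    total = defaultdict(int)
    for theta, day in zip(doc_topics, doc_dates):
        total[day] += 1
        if theta[topic] > threshold:
            appeared[day] += 1
    return {day: appeared[day] / total[day] for day in total}

# toy example: three articles over two days, two topics each
doc_topics = [[0.7, 0.3], [0.05, 0.95], [0.5, 0.5]]
doc_dates = ["2020-01-02", "2020-01-02", "2020-01-03"]
tap = topic_appearance_prob(doc_topics, doc_dates, topic=0)
# tap["2020-01-02"] == 0.5, tap["2020-01-03"] == 1.0
```

The resulting daily series can then be regressed against Treasury yield returns alongside the Google Search Volume Index.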
26. A Machine-Learning History of English Caselaw and Legal Ideas Prior to the Industrial Revolution I: Generating and Interpreting the Estimates
- Author
-
Peter Murrell and Peter Grajzl
- Subjects
Topic model ,History ,Existential quantification ,Qualitative evidence ,05 social sciences ,Timeline ,History of England ,0506 political science ,Epistemology ,History of English ,Embodied cognition ,0502 economics and business ,050602 political science & public administration ,050207 economics ,Industrial Revolution ,General Economics, Econometrics and Finance - Abstract
The history of England's institutions has long informed research on comparative economic development. Yet to date, there exists no quantitative evidence on a core aspect of England's institutional evolution, that embodied in the accumulated decisions of English courts. Focusing on the two centuries before the Industrial Revolution, we generate and analyze the first quantitative estimates of the development of English caselaw and its associated legal ideas. We achieve this in two companion papers. In this, the first of the pair, we build a comprehensive corpus of 52,949 reports of cases heard in England's high courts before 1765. Estimating a 100-topic structural topic model, we name and interpret all topics, each of which reflects a distinctive aspect of English legal thought. We produce time series of the estimated topic prevalences. To interpret the topic timelines, we develop a tractable model of the evolution of legal-cultural ideas and their prominence in case reports. In the companion paper, we will illustrate with multiple applications the usefulness of the large amount of new information generated by our approach.
- Published
- 2020
27. Document Representations to Improve Topic Modelling
- Author
-
P. Venkata Poojitha and Remya R. K. Menon
- Subjects
Topic model ,Information retrieval ,business.industry ,Computer science ,Approximation algorithm ,Sparse approximation ,Matching pursuit ,Latent Dirichlet allocation ,symbols.namesake ,symbols ,Web application ,Cluster analysis ,business ,tf–idf ,Sparse matrix - Abstract
Every day, huge amounts of information are collected from web applications, making it difficult to understand what the information is about. Detecting, understanding, and summarising that information requires specific tools and techniques such as topic modelling, which helps to analyze and identify the essence of the data. This paper implements a sparsity-based document representation to improve topic modelling: it organizes the data into a meaningful structure using machine learning algorithms such as LDA (Latent Dirichlet Allocation) and OMP (Orthogonal Matching Pursuit). It identifies which topic a document belongs to, as well as the similarity between documents in an existing dictionary. The OMP algorithm is well suited to sparse approximation with good accuracy: it can identify the topics to which an input document [Y] is most closely related across a large collection of text documents in a dictionary.
- Published
- 2020
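The LDA step described in the abstract of entry 27 above is commonly estimated with collapsed Gibbs sampling. A minimal pure-Python sampler, as an illustration only (the toy corpus, hyperparameters, and function name are our assumptions, not the paper's implementation):

```python
import random
from collections import defaultdict

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampler for LDA -- a minimal illustration."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    ndk = [[0] * n_topics for _ in docs]               # doc-topic counts
    nkw = [defaultdict(int) for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                                # tokens per topic
    z = []                                             # assignment per token
    for di, doc in enumerate(docs):
        zd = []
        for w in doc:
            t = rng.randrange(n_topics)
            zd.append(t)
            ndk[di][t] += 1; nkw[t][w] += 1; nk[t] += 1
        z.append(zd)
    for _ in range(n_iter):
        for di, doc in enumerate(docs):
            for wi, w in enumerate(doc):
                t = z[di][wi]
                ndk[di][t] -= 1; nkw[t][w] -= 1; nk[t] -= 1
                # full conditional p(z = k | all other assignments)
                probs = [(ndk[di][k] + alpha) * (nkw[k][w] + beta)
                         / (nk[k] + vocab_size * beta) for k in range(n_topics)]
                r = rng.random() * sum(probs)
                acc = 0.0
                for k, p in enumerate(probs):
                    acc += p
                    if r <= acc:
                        t = k
                        break
                z[di][wi] = t
                ndk[di][t] += 1; nkw[t][w] += 1; nk[t] += 1
    # posterior-mean document-topic mixtures from the final state
    return [[(ndk[di][k] + alpha) / (len(doc) + n_topics * alpha)
             for k in range(n_topics)] for di, doc in enumerate(docs)]

docs = [["stock", "return", "market"], ["stock", "market", "price"],
        ["gene", "cell", "protein"], ["cell", "protein", "dna"]]
theta = lda_gibbs(docs, n_topics=2)  # one topic mixture per document
```

Each row of `theta` is a probability vector over topics, which is the document representation a sparse-coding step such as OMP would then operate on.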
28. Mergers Under the Microscope: Analysing Conference Call Transcripts
- Author
-
Fangyuan Ma, Daisy Wang, Sudipto Dasgupta, Haojun Xie, and Jarrad Harford
- Subjects
Topic model ,Actuarial science ,Shareholder ,Order (exchange) ,Voting ,media_common.quotation_subject ,Corporate governance ,Conference call ,Business ,Endogeneity ,Payment ,media_common - Abstract
About half of all merger deals between public US acquirers and targets involve a conference call within two days of the deal announcement, in order to communicate information to both acquirer and target shareholders to garner voting support and mitigate legal liability. Calls are associated with positive market reactions and a higher likelihood of deal completion. However, for public targets, only the latter result holds after correcting for endogeneity. Using a probabilistic topic modelling approach, we identify 20 highly interpretable topics as prevalent in the presentations and discussions recorded in the transcripts. The relative importance of several of these in a deal transcript is associated with target characteristics (e.g., whether the target is a private or a public firm), the method of payment, and acquirer characteristics (e.g., governance). The importance of several topics is associated with significant abnormal returns on deal announcement, and with deal completion likelihood.
- Published
- 2020
29. Talk about the Coronavirus Pandemic: Initial Evidence from Corporate Disclosures
- Author
-
Betty Xing and Victor X. Wang
- Subjects
Topic model ,Presentation ,Earnings ,business.industry ,media_common.quotation_subject ,Pandemic ,Sentiment analysis ,Timeline ,Accounting ,Business ,Quarter (United States coin) ,Session (web analytics) ,media_common - Abstract
The novel Coronavirus disease (COVID-19) has become the world’s center of attention in 2020. In this paper, we examine firm disclosures of COVID-19 during the first quarter of 2020, a time when firms face tremendous uncertainty and have little guidance on what and how to disclose. We compare Coronavirus-related disclosures in SEC filings and earnings conference calls with the timeline of the spread of the disease and with information gathering and disseminating activities in Google searches and news articles. We find that initial corporate disclosures are driven by information demand, and firm managers are proactive in providing information to investors. Our topic modelling analysis shows that although firms recognize the massive impact of the pandemic on their operations, their disclosures in SEC filings are general and lack specifics. Finally, we find that analysts are proactive in raising questions regarding the impact of COVID-19 during the Q&A session of the conference calls, and that firm managers are quick to provide necessary disclosures in the presentation session as the pandemic develops.
- Published
- 2020
30. Is Mandatory Risk Reporting Informative? Evidence from US REITS Using Machine Learning for Text Analysis
- Author
-
Koelbl M, Bertram I. Steininger, Schuierer R, and Schäfers W
- Subjects
Topic model ,History ,Vocabulary ,Actuarial science ,Polymers and Plastics ,media_common.quotation_subject ,Financial market ,Sample (statistics) ,Industrial and Manufacturing Engineering ,Terminology ,Real estate investment trust ,Unsupervised learning ,Business ,Business and International Management ,media_common ,Financial market participants - Abstract
Text in corporate disclosures conveys important information to financial market participants. If incorporated in quantitative models, the intended meaning of a text is often hidden by the use of idiosyncratic terminology within an industry-specific vocabulary. This study uses an unsupervised machine learning algorithm, the Structural Topic Model, to overcome this issue. It illustrates the connection between machine-extracted risk factors discussed in corporate disclosures (10-Ks) and the corresponding pricing behavior of investors for a not yet investigated US REIT sample from 2005 to 2019. When disclosed, most risk factors counterintuitively decrease stock return volatility and are therefore beneficial for the pricing process on financial markets.
- Published
- 2020
31. Measuring the Direction of Innovation: Frontier Tools in Unassisted Machine Learning
- Author
-
Jino Lu, Jeffrey L. Furman, and Florenta Teodoridis
- Subjects
Text corpus ,Hierarchical Dirichlet process ,Topic model ,business.industry ,Computer science ,Technological change ,Diversification (marketing strategy) ,Space (commercial competition) ,Machine learning ,computer.software_genre ,Taxonomy (general) ,Strategic management ,Artificial intelligence ,business ,computer - Abstract
Understanding the factors affecting the direction of innovation is a central aim of research in the economics and strategic management of innovation. Progress on this topic has been inhibited by difficulties in measuring the location and movement of innovation in ideas space. To date, most efforts at measuring the direction of innovation rely on curated taxonomies, such as technology classes and keyword approaches, which either adapt slowly or are subject to gaming, and early generations of text analysis, which provide information on the similarity of sets of words, but not on the number of paths or direction of change. Relative to these, recent advances in machine learning offer promising paths forward. In this paper, we introduce and explore a particular approach based on an unassisted machine learning technique, Hierarchical Dirichlet Process (HDP), that flexibly generates categories from a corpus of text and enables calculations of the distance across knowledge categories and movement in ideas space. We apply our algorithm to the corpus of USPTO patent abstracts from the period 2000-2018 and demonstrate that, relative to the USPTO taxonomy of patent classes, our algorithm provides a leading indicator of shift in innovation topics and enables a more precise analysis of movement in ideas space. Working with such measures is important because it enables more accurate estimates of the direction of innovation and, hence, of economic actors’ responses to competitive environments and public policies. We share our algorithm, which can be applied to other innovation text corpora, as well as the patent data and measures we develop, with the aim of facilitating additional inquiries regarding the direction of innovation.
- Published
- 2020
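Entry 31 above turns on computing distances between knowledge categories in ideas space. One common metric for comparing topic mixtures is the Jensen-Shannon distance; the sketch below uses it on toy patent mixtures (the metric choice and all names here are our assumptions for illustration, not necessarily the authors' measure):

```python
from math import log, sqrt

def js_distance(p, q):
    """Jensen-Shannon distance between two topic mixtures:
    the square root of the averaged KL divergences to the midpoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        return sum(ai * log(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return sqrt((kl(p, m) + kl(q, m)) / 2)

# toy topic mixtures, e.g. from an HDP-style model over patent abstracts
patent_2000 = [0.6, 0.3, 0.1]
patent_2018 = [0.1, 0.3, 0.6]
d = js_distance(patent_2000, patent_2018)  # larger = further apart in ideas space
```

Averaging such distances by year gives a simple way to track movement in ideas space over time.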
32. Interpreting Topic Models Using Prototypical Text: From ‘Telling’ To ‘Showing
- Author
-
Phanish Puranam and Arianna Marchetti
- Subjects
Topic model ,restrict ,Computer science ,Interpretation (philosophy) ,media_common.quotation_subject ,Face (sociological concept) ,Organizational culture ,Simplicity ,Data science ,Qualitative research ,media_common ,Interpretability - Abstract
While topic models are increasingly used for natural language processing in management and strategy research, their interpretability remains challenging. If researchers restrict the number of topics for simplicity, the induced topic structure may not be statistically accurate. If they do not, they face difficulties in interpreting topics in a transparent and reproducible way. We propose the “prototypical-text-based interpretation” (PTBI) of topic models, a methodology that gives a rule-based approach for selecting text from the corpus to interpret topic structures. PTBI enables transparent and replicable topic interpretation, a move from “telling” to “showing” that is pivotal for qualitative research. We illustrate PTBI by studying the organizational culture of Netflix, based on text reviews employees post on Glassdoor.com. We compare our findings to the company’s own public account of its culture and show how PTBI improves the state of the art in topic model interpretation by documenting how our approach differs from and improves on prior practice.
- Published
- 2020
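Entry 32 above describes a rule-based selection of prototypical text. One simple rule of this kind, as a hedged sketch (the selection criterion and names are our illustrative assumptions, not necessarily the paper's exact rule), is to surface, for each topic, the documents in which that topic's estimated proportion is highest:

```python
def prototypes(doc_topic, docs, n_proto=1):
    """For each topic, return the n_proto documents with the highest
    estimated proportion of that topic -- one simple selection rule."""
    n_topics = len(doc_topic[0])
    out = []
    for k in range(n_topics):
        ranked = sorted(range(len(docs)),
                        key=lambda d: doc_topic[d][k], reverse=True)
        out.append([docs[d] for d in ranked[:n_proto]])
    return out

# toy Glassdoor-style reviews and their estimated topic mixtures
reviews = ["great freedom and responsibility",
           "pay is high but pressure is constant",
           "culture of candor and feedback"]
theta = [[0.8, 0.1, 0.1], [0.1, 0.7, 0.2], [0.2, 0.1, 0.7]]
protos = prototypes(theta, reviews)
# protos[0] == ["great freedom and responsibility"]
```

Showing such prototype documents alongside each topic's top words is what turns "telling" into "showing".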
33. Bayesian Estimation for Identifiable Topic Models with Latent Dirichlet Allocation
- Author
-
Toshikuni Sato
- Subjects
Topic model ,Bayes estimator ,business.industry ,Computer science ,Estimation theory ,Stability (learning theory) ,Markov chain Monte Carlo ,Statistical model ,Machine learning ,computer.software_genre ,Latent Dirichlet allocation ,symbols.namesake ,ComputingMethodologies_PATTERNRECOGNITION ,symbols ,Identifiability ,Artificial intelligence ,business ,computer - Abstract
This paper discusses two effective Bayesian estimation algorithms for identifiable topic models with latent Dirichlet allocation (LDA). Identifying a statistical model and corresponding parameters is important for obtaining a reasonable interpretation from the model in the social sciences. Due to the instability of topic models, several studies have addressed this problem. A reliable solution for the identifiable topic model is to specify anchor words. We employ this anchor word condition and present a simple application in Markov chain Monte Carlo (MCMC) algorithms for identifiable LDA topic models. Simulation studies are performed to investigate the identifiability of standard and correlated topic models, and the results demonstrate that the proposed methods improve the stability of parameter estimation in topic models with LDA.
- Published
- 2020
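Entry 33 above rests on the fact that a topic model is identifiable only up to a relabeling of topics, with anchor words pinning the labels down. A minimal sketch of the idea as a post-hoc alignment step (the function and variable names are illustrative assumptions; the paper builds the anchor-word constraint into the MCMC sampler itself rather than applying it afterwards):

```python
def align_by_anchors(topic_word, anchors, vocab):
    """Permute topic rows so that topic i is the one assigning
    anchors[i] its highest probability (resolves label switching)."""
    col = {w: j for j, w in enumerate(vocab)}
    perm = []
    for w in anchors:
        j = col[w]
        k = max(range(len(topic_word)), key=lambda t: topic_word[t][j])
        perm.append(k)
    assert len(set(perm)) == len(perm), "anchors must select distinct topics"
    return [topic_word[k] for k in perm]

vocab = ["loan", "rate", "gene", "cell"]
# rows are topics: here row 0 looks biological, row 1 financial
topic_word = [[0.05, 0.05, 0.5, 0.4],
              [0.5, 0.4, 0.05, 0.05]]
aligned = align_by_anchors(topic_word, anchors=["loan", "gene"], vocab=vocab)
# aligned[0] is now the finance topic, aligned[1] the biology topic
```

Fixing the labels this way is what makes topic-specific parameters comparable across MCMC runs.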
34. Multi Aspects Topic Model for Twitter Healthcare Recommendation
- Author
-
Moulana Mohammed and R. M. Noorullah
- Subjects
Topic model ,Social network ,business.industry ,Computer science ,Disease cluster ,Data science ,Latent Dirichlet allocation ,Health data ,symbols.namesake ,Architecture tradeoff analysis method ,Health care ,symbols ,business ,Cluster analysis - Abstract
Social networks are an excellent source for users to share or exchange information on topics, and Twitter is the social network most used for health-related topics. Latent Dirichlet Allocation (LDA) is a widely used technique for clustering tweet documents by deriving hidden or conceptual topics. The Ailment Topic Aspect Model (ATAM) is an extended LDA model implemented especially for clustering health tweets by ailment type; it is a restricted model that determines three aspects of ailments: diseases, symptoms, and treatments. In healthcare applications, it has been used to recommend health solutions based on side-effect analysis. Our proposed work focuses on the development of a Multi Aspects Topic Model (MATM) for effective healthcare recommendations, which also considers the side-effects aspect in cluster results. The proposed work is demonstrated in an experimental study using benchmark health data for recommending healthcare solutions with multiple aspects.
- Published
- 2020
35. Social Media Data Reveal Patterns of Policy Engagement in State Legislatures
- Author
-
Julia Payson, Jonathan Nagler, Joshua A. Tucker, Richard Bonneau, and Andreu Casas
- Subjects
Topic model ,History ,Polymers and Plastics ,business.industry ,media_common.quotation_subject ,ComputingMilieux_LEGALASPECTSOFCOMPUTING ,Legislature ,Public relations ,Industrial and Manufacturing Engineering ,Focus (linguistics) ,State (polity) ,Political science ,State politics ,Social media ,Business and International Management ,business ,Legislator ,Data limitations ,media_common - Abstract
State governments are the focus of important policy decisions in the United States. How do state legislators use their public communications---particularly social media---to engage with policy debates? Due to previous data limitations, we lack systematic information about whether and how state legislators publicly discuss policy and how this behavior varies across contexts. Using Twitter data and state-of-the-art topic modeling techniques, we introduce a method to study state legislators' policy priorities and apply the method to fifteen U.S. states in 2018. We show that we are able to capture the policy issues discussed by state legislators with substantially more accuracy than existing methods. We then present initial findings that validate the method and speak to debates in the literature. For example, state legislators in competitive districts are more likely to discuss policy than those in less competitive districts, and legislators from more professional legislatures discuss policy at similar rates to those in less professional legislatures. We conclude by discussing promising avenues for future state politics research using this new approach.
- Published
- 2020
36. Charting the Path to Purchase Using Topic Models
- Author
-
Hongshuang Li and Liye Ma
- Subjects
Marketing ,Topic model ,Economics and Econometrics ,History ,Information retrieval ,Polymers and Plastics ,Computer science ,Heuristic ,05 social sciences ,Inference ,Advertising ,Latent Dirichlet allocation ,Industrial and Manufacturing Engineering ,symbols.namesake ,Online search ,0502 economics and business ,Path (graph theory) ,symbols ,Position (finance) ,Revenue ,050211 marketing ,050207 economics ,Business and International Management ,Hidden Markov model - Abstract
In gathering information for an intended purchase decision, consumers submit search phrases to online search engines. These search phrases directly express the consumers’ needs in their own words and thus provide valuable information to marketing managers. Interpreting consumers’ search phrases renders a better understanding of their purchase intentions, which is critical for marketing success. In this article, the authors develop an integrated model to connect the latent topics embedded in consumers’ search phrases to their website visits and purchase decisions. Using a unique data set containing more than 8,000 search phrases submitted by consumers, the model identifies latent topics underlying the searches that led consumers to the firm’s website. Compared with a model lacking any textual information from consumers’ search phrases, a model using textual data in a heuristic approach, and a model based on the latent Dirichlet allocation, the proposed model provides a better evaluation of a consumer’s position on the path to purchase and achieves much better predictive accuracy, which could in turn substantially increase the firm’s revenue. The authors also extend the discussion to aggregators, affiliated websites, and segments of consumers who are exposed to the firm’s outbound ads. Marketing managers can use this method to extract structured information from consumers’ search phrases to facilitate their inference of consumers’ latent purchase states and thereby improve marketing efficiency.
- Published
- 2020
37. Topmodpy: A Simple Python Script for Topic Modeling
- Author
-
Kamakshaiah Musunuru
- Subjects
Topic model ,business.industry ,Latent semantic analysis ,Computer science ,Unstructured data ,Python (programming language) ,computer.software_genre ,Latent Dirichlet allocation ,symbols.namesake ,Action (philosophy) ,Subjective data ,Simple (abstract algebra) ,symbols ,Artificial intelligence ,business ,computer ,Natural language processing ,computer.programming_language - Abstract
Subjective assessment is rampant in literature verification and title evaluation. While subjective assessment is a valid practice, it creates a void in terms of the validity of results. Intuition is unique and tends to depend on several other aspects that are not methodological; as a result, an unreasonable and unfair amount of personal opinion may come into play. Natural Language Processing (NLP) offers robust mechanisms and techniques for evaluating unstructured data. Latent Dirichlet Allocation (LDA) is one such technique, adding logic to the processing of unstructured, subjective data. This article explains the suitability of topmodpy for performing Latent Semantic Analysis (LSA) using LDA. topmodpy is a Python script comprising 12 different functions, each with a unique aim. The article shows how to use the topmodpy module on data collected using a valid search criterion; the module was found to recover latent constructs related to the search criterion, demonstrating its efficacy.
- Published
- 2020
38. When Do Experts Listen to Other Experts? The Role of Negative Information in Expert Evaluations For Novel Projects
- Author
-
Hardeep Ranu, Gary S. Gray, Misha Teplitskiy, Jacqueline N. Lane, Eva C. Guinan, Karim R. Lakhani, and Michael Menietti
- Subjects
Program evaluation ,Topic model ,Negative information ,Information sharing ,Negativity bias ,Applied psychology ,Valence (psychology) ,Psychology ,Grant funding - Abstract
The evaluation of novel projects lies at the heart of scientific and technological innovation, and yet the literature suggests that this process is subject to inconsistency and potential biases. This paper investigates the role of information sharing among experts as a driver of evaluation decisions. We designed and executed two field experiments in two separate grant funding opportunities at a leading research university to explore evaluators' receptivity to assessments from other evaluators. Collectively, our experiments mobilized 369 evaluators from seven universities to evaluate 97 projects, resulting in 761 proposal-evaluation pairs and over $300,000 in awards. We exogenously varied the relative valence (positive and negative) of others' scores to determine how exposure to higher and lower scores affects the focal evaluator's propensity to change the initial score. We found causal evidence of negativity bias, where evaluators are more likely to lower their scores after seeing critical scores than to raise them after seeing better scores. Qualitative coding and topic modelling of the evaluators' justifications for score changes reveal that exposure to lower scores prompted greater attention to uncovering weaknesses, whereas exposure to neutral or higher scores was associated with strengths, along with greater emphasis on non-evaluation criteria, such as confidence in one's judgment. Overall, information sharing among expert evaluators can lead to more conservative allocation decisions that favor protecting against failure over maximizing success.
- Published
- 2020
39. To Kill a Black Swan: The Credibility Revolution at CEDE, 2000-2018
- Author
-
Juan Pablo Castilla
- Subjects
Comprehension ,Topic model ,Order (exchange) ,Phenomenon ,Causal inference ,Credibility ,Sociology ,Black swan theory ,Research center ,Epistemology - Abstract
The growing displacement of theory and other forms of wide-ranging knowledge of social phenomena by empirical research methods in economics is widely noted by economists and historians of economic knowledge. Less attention has been devoted, however, to understanding the materialization of such changes in the scientific practices of a specific research center. This article studies recent transformations in the epistemological practices at CEDE, a research center that is highly influential in the production of economic knowledge in Colombia yet does not belong to the core economics research centers that lead the debates regarding the recent changes in the discipline. I use a machine learning technique called topic modeling, interviews with CEDE researchers, and exegesis of papers to identify a shift in the production of knowledge in microeconometrics at CEDE between 2000 and 2018. I explain this shift by characterizing two sets of epistemological practices. The first, usually present between 2000 and 2006, establishes a complementary relationship between wide-ranging knowledge (theory included) and data estimation techniques in order to achieve a broad comprehension of the phenomenon under study. The second, usually present between 2009 and 2018, prioritizes data estimation techniques over theoretical models and contextual knowledge in order to achieve a causal comprehension of one variable in the phenomenon under study. Because epistemological practices make truth claims, each establishes its own criteria for what constitutes valid research through a distinct way of replicating a scientific experiment. The shift I identify implies a recent tendency to disdain research that cannot make a strong causal inference.
- Published
- 2020
40. Motivations and Experiences in Kumbh Mela Pilgrimage - Insights from Twitter Analytic
- Author
-
Ram Mohan Dhara, Sandip Mukhopadhyay, and Vinodhini Ranganathan
- Subjects
Topic model ,Wisdom of the crowd ,Sentiment analysis ,Collective intelligence ,Social media ,Sociology ,Pilgrimage ,Data science ,Experiential learning ,Social media analytics - Abstract
The paper takes an innovative approach, using data from the social media platform Twitter to understand pilgrims' motivations, expectations, and experiences at the Kumbh Mela 2019 in Prayagraj. The ‘Kumbh’ Mela in Prayagraj, which can be considered the largest gathering of mankind in the world, happens once every six years. The analysis was based on 49,513 tweets collected during the Kumbh Mela 2019, and three distinct techniques (descriptive analytics, content analytics, and network analytics) were used to mine the ‘collective intelligence’, or ‘wisdom of the crowd’, instead of depending on data generated by a few experts. Multiple content-analytic techniques, including word clouds, hashtag analysis, topic modeling, and sentiment analysis, were applied to achieve the research objective. The research also identifies different experiential components shared on Twitter, from spiritual achievement to unity based on shared understanding, as well as sentiments towards the event. The study provides important insights for researchers as well as guidelines for policymakers. We conclude the paper by noting its limitations and the future research opportunities derived from the study.
- Published
- 2020
41. How People Reflect On The Usage Of Cosmetic Virtual Goods: A Structural Topic Modeling Analysis Of R/Dota2 Discussions
- Author
-
Denis Bulygin and Ilya Musabirov
- Subjects
Topic model ,Virtual goods ,Work (electrical) ,Association (object-oriented programming) ,Field (Bourdieu) ,Revenue ,Business ,Marketing ,Game Developer - Abstract
With virtual purchases a leading source of revenue for game developers, it is still unclear how players evaluate non-functional goods and the experiences those goods grant. Using structural topic models, this work identifies the dimensions of players' experience extracted from discussions, in association with price changes. The work contributes to the field by decomposing the value of virtual goods into experience dimensions, by relating those extracted dimensions to an item's price, and by providing a detailed description of expectation mismatch.
- Published
- 2020
42. Taking Cultural Heterogeneity Seriously: The Distinct Forms of Cultural Distinctiveness in Organizations
- Author
-
Arianna Marchetti
- Subjects
Topic model ,Resource-based view ,Organizational culture ,Optimal distinctiveness theory ,Cognition ,Sociology ,Competitor analysis ,Content (Freudian dream analysis) ,Competitive advantage ,Cognitive psychology - Abstract
Although theories exist that link a distinctive organizational culture to superior performance, our understanding of the different ways in which cultural distinctiveness arises remains limited. I argue that because a culture results from aggregating individuals’ cognition and behavior in the organization, it can be distinctive to a firm either in central tendency (content), distributional properties (configuration), or both, on either idiosyncratic or shared dimensions across firms. By applying topic modeling to 42,982 Glassdoor text reviews posted by corporate employees, I illustrate these forms of cultural distinctiveness by comparing Netflix's culture, viewed as distinctive in popular discourse, to that of its peers. I find that both the content and configuration of Netflix's culture are distinctive on idiosyncratic dimensions compared to its competitors, and discuss implications for competitive advantage.
- Published
- 2020
43. Structure and Content in the United States Code
- Author
-
Keith Carlson, Michael A. Livermore, Faraz Dadgostari, and Daniel N. Rockmore
- Subjects
Topic model ,Structure (mathematical logic) ,Hierarchy ,Descriptive statistics ,Computer science ,Statutory law ,Assortativity ,Hierarchical organization ,Mutual information ,Data science - Abstract
This paper explores the relationship between statutory structure—as realized through hierarchical organization and a cross-reference citation network—and semantic content. We report several novel descriptive statistics concerning the United States Code (USC), including the results of the first application to this corpus of the machine learning technique of topic modeling. We find that the topic model performs quite well in discovering relevant legal categories, as expressed in the subject-matter organization of the USC. We estimate relationships between formal structure and topic content and find that the assortativity of “titles” (the highest-level hierarchy within the USC) and the assortativity of related topics are highly similar. We also examine the degree of mutual information between statutory structure and content and find that alternative machine learning techniques are able to recover information about structure from content, indicating some level of mutual information. Our analysis can be used to develop superior measures of legal complexity with the potential to improve studies that seek to understand the importance of legal complexity for social outcomes (such as compliance costs or economic growth). Our work also has potential applications for the study of law search and in developing tools to facilitate public access to the law.
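The mutual-information measure between statutory structure and topic content that this abstract refers to can be illustrated with a small standard-library sketch; the title and topic labels below are hypothetical stand-ins, not the paper's actual estimates:

```python
from collections import Counter
from math import log2

def mutual_information(pairs):
    """I(X;Y) in bits, estimated from a list of (x, y) label pairs."""
    n = len(pairs)
    joint = Counter(pairs)                 # joint counts of (x, y)
    px = Counter(x for x, _ in pairs)      # marginal counts of x
    py = Counter(y for _, y in pairs)      # marginal counts of y
    return sum(
        (c / n) * log2((c * n) / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )

# Toy data: each statutory section labelled with its title and dominant topic.
sections = [("Title 26", "tax"), ("Title 26", "tax"),
            ("Title 10", "defense"), ("Title 10", "defense")]
print(round(mutual_information(sections), 3))
# 1.0 — here structure fully predicts topic
```

When titles and topics vary independently, the estimate drops to zero; high values indicate that content recovers information about structure, as the paper finds.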
- Published
- 2020
44. Taking Stock of Customization Research: A Computational Review and Interdisciplinary Research Agenda
- Author
-
David Antons, Frank T. Piller, Robin Kleer, and Stephan Hankammer
- Subjects
Topic model ,symbols.namesake ,Goods and services ,Computer science ,Mass customization ,Customer needs ,symbols ,Information system ,Data science ,Latent Dirichlet allocation ,Stock (geology) ,Personalization - Abstract
A considerable body of literature has studied the idea of producing and marketing customized goods and services according to individual customer needs, addressing the growing importance of strategies like mass customization or personalization in business practice. However, customization research is characterized by scattered discussions along several unconnected streams of research in marketing, operations management, and information systems. With this study, we seek to outline the developments and hidden structures within 1,273 scholarly articles on customization to develop recommendations for future research. Applying recent advances in text mining, namely the Latent Dirichlet Allocation algorithm for topic modeling, allowed us to extract 60 topics and their corresponding terms and to map the temporal development trajectory of each topic. Furthermore, we evaluated the roles of research disciplines in shaping these topics. Based on this understanding, we empirically derive recommendations that will help scholars to address the interdisciplinary potentials and requirements of customization research.
- Published
- 2020
45. What are the Topics Discussed by the Singaporean Public About COVID-19? An Exploratory Analysis of Telegram Chats
- Author
-
Xingyu Ken Chen and Loo Seng Neo
- Subjects
Topic model ,2019-20 coronavirus outbreak ,Coronavirus disease 2019 (COVID-19) ,business.industry ,Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) ,Sociology ,Exploratory analysis ,Public relations ,business - Abstract
This study examines messages in Telegram groups to understand what the public is concerned with during the COVID-19 crisis, using Anchored Correlation Explanation topic modeling.
- Published
- 2020
46. Understanding Large-Scale Dynamic Purchase Behavior
- Author
-
Dennis Fok, Bruno Jacobs, and Bas Donkers
- Subjects
Product (business) ,Topic model ,Interdependence ,Customer base ,Computer science ,Multiple time dimensions ,media_common.quotation_subject ,Scalability ,Inference ,Context (language use) ,Data science ,media_common - Abstract
In modern retail contexts, retailers sell products from vast product assortments to a large and heterogeneous customer base. Understanding purchase behavior in such a context is very important. Standard models cannot be used due to the high dimensionality of the data. We propose a new model that creates an efficient dimension reduction through the idea of purchase motivations. We only require customer-level purchase history data, which is ubiquitous in modern retailing. The model handles large-scale data and even works in settings with shopping trips consisting of few purchases. As scalability of the model is essential for practical applicability, we develop a fast, custom-made inference algorithm based on variational inference. Essential features of our model are that it accounts for the product, customer and time dimensions present in purchase history data; relates the relevance of motivations to customer- and shopping-trip characteristics; captures interdependencies between motivations; and achieves superior predictive performance. Estimation results from this comprehensive model provide deep insights into purchase behavior. Such insights can be used by managers to create more intuitive, better informed, and more effective marketing actions. We illustrate the model using purchase history data from a Fortune 500 retailer involving more than 4,000 unique products.
- Published
- 2020
47. sDTM: A Supervised Bayesian Deep Topic Model for Text Analytics
- Author
-
Kunpeng Zhang and Yi Yang
- Subjects
Topic model ,Perplexity ,business.industry ,Computer science ,Deep learning ,Supervised learning ,Big data ,Machine learning ,computer.software_genre ,Latent Dirichlet allocation ,symbols.namesake ,symbols ,Leverage (statistics) ,Relevance (information retrieval) ,Artificial intelligence ,business ,computer - Abstract
Topic modeling methods such as Latent Dirichlet Allocation (LDA) are powerful tools for analyzing massive amounts of textual data. They have been extensively used in information systems and management research to identify latent topics for data exploration and as a feature-engineering mechanism to derive new variables for additional analyses. However, existing topic modeling approaches are mostly unsupervised and only leverage textual data, while ignoring additional useful information often associated with text, such as star ratings in customer reviews or categories of comments in online discussion forums. As a result, the topics extracted and new variables derived based on the learned topic vectors may not be accurate, which could lead to biased or incorrect estimation for subsequent econometric analysis and inferior performance for predictive tasks. In response, we propose a novel supervised topic modeling approach called sDTM that is designed in a Bayesian deep learning manner while incorporating additional useful data. sDTM offers three key advantages over traditional topic modeling approaches. First, it learns high-quality topics as measured quantitatively and qualitatively, which can help alleviate concerns over potential measurement errors in econometric analysis. Second, this supervised learning model achieves significantly superior predictive performance over cutting-edge baselines. Finally, sDTM is able to highlight those words that have stronger impact on the outcome, thereby facilitating transparent model investigation. Experimental results on three datasets show that sDTM not only improves supervised learning tasks, including classification and regression, but also exhibits a better model fit (e.g., lower perplexity) for document understanding. sDTM makes methodological contributions to the IS and management literature and has direct relevance for research using big data analytics.
- Published
- 2020
48. Estimating a Culture: Bacon, Coke, and Seventeenth-Century England
- Author
-
Peter Grajzl and Peter Murrell
- Subjects
Topic model ,Structure (mathematical logic) ,Politics ,Similarity (psychology) ,Context (language use) ,Coke ,Sociology ,Epistemology - Abstract
A characterization of the ideas of Francis Bacon and Edward Coke, two paramount English lawyer-scholars, provides insights into the nature of the legal-intellectual culture of early seventeenth-century England. To develop the insights we employ a methodology not previously used in this context, applying structural topic modeling to a large corpus comprising the works of both Bacon and Coke. Estimated topics span legal, political, scientific, and methodological themes. Legal topics evidence an advanced structure of common-law thought, straddling ostensibly disparate areas of the law. Interconnections between topics reveal a distinctive approach to the pursuit of knowledge, embodying Bacon's epistemology and Coke's legal methodology. A key similarity between Bacon and Coke overshadows their differences: both sought to build reliable knowledge based on generalizing from particulars. The resulting methodological paradigm can be understood as reflecting a legacy of common-law thought and constituting one key contribution of these authors to the era's emerging legal-intellectual culture.
- Published
- 2019
49. Opinion Spam Detection and Analysis by Identifying Domain Features in Product Reviews
- Author
-
K S Kavitha and Lakshmi Holla
- Subjects
Opinion spam ,Feature engineering ,Topic model ,symbols.namesake ,Product reviews ,Computer science ,Specific-information ,Sentiment analysis ,symbols ,Latent Dirichlet allocation ,Data science ,Domain (software engineering) - Abstract
With the rapid advent of technology, an exponentially growing number of users purchase products online and express their opinions in the form of reviews. Recommending deserving products to people often depends on the sentiment expressed in the reviews. Classifying reviews as genuine or fake is one of the hardest problems in the current world. Feature engineering is an active area of research in text analytics and opinion mining and plays a significant role in extracting features from reviews. Genuine reviews often contain a higher percentage of concrete or domain-specific information because they are written from experience, whereas spam reviews often lack this information. This paper focuses on using the Latent Dirichlet Allocation topic model to extract domain features from product reviews and on using them as one of the main features to identify fake reviews.
- Published
- 2019
50. Moral Contestation and the Regulatory Politics of Rule-Making for Derivatives
- Author
-
J. Nicholas Ziegler, Thomas Nath, and Konrad Posch
- Subjects
Typology ,Investment banking ,Topic model ,Politics ,Market economy ,Financial regulation ,Cleavage (politics) ,business.industry ,Commission ,business ,Futures contract - Abstract
A central question facing market societies is how governments can best regulate economic competition in emerging areas. This paper illuminates this general question by examining the public comments submitted to the Commodity Futures Trading Commission (CFTC), 2010-2014, in response to proposed rules for implementing a particularly technical part of the Dodd-Frank Act: the creation of a new regulatory regime for derivatives trading. The literature on regulatory implementation emphasizes the preponderance of concentrated industry actors compared to other groups. The CFTC’s effort to create a new regulatory regime for the derivatives business allows us to compare the commenting activity of the large investment banks to a broad range of firms and other organizations. We develop a more fine-grained typology of industry segments and non-industry groups than other studies of implementation. By combining this typology with a topic-modeling approach, we can systematically map commonalities and divergences among different commenters. We find that the complex cross-cutting patterns within the industry sector are far less conspicuous than a clear cleavage in the willingness of different types of commenters to address the moral dimensions of market behavior.
- Published
- 2019