Aggregating Twitter Text through Generalized Linear Regression Models for Tweet Popularity Prediction and Automatic Topic Classification
- Source :
- European Journal of Investigation in Health, Psychology and Education, Vol 11, Iss 4, Pp 1537-1554 (2021)
- Publication Year :
- 2021
- Publisher :
- Asociación Universitaria de Educación, 2021.
Abstract
- Social media platforms have become accessible resources for health data analysis. However, the advanced computational techniques involved in big data text mining and analysis are challenging for public health data analysts to apply. This study proposes and explores the feasibility of a novel yet straightforward method by regressing the outcome of interest on the aggregated influence scores for association and/or classification analyses based on generalized linear models. The method reduces the document term matrix by transforming text data into a continuous summary score, thereby reducing the data dimension substantially and easing the data sparsity issue of the term matrix. To illustrate the proposed method in detailed steps, we used three Twitter datasets on various topics: autism spectrum disorder, influenza, and violence against women. We found that our results were generally consistent with the critical factors associated with the specific public health topic in the existing literature. The proposed method could also classify tweets into different topic groups appropriately with consistent performance compared with existing text mining methods for automatic classification based on tweet contents.
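
The pipeline summarized in the abstract (collapse the document term matrix into one continuous score per tweet, then regress the outcome on that score) can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not taken from the article: the toy tweets and retweet counts are invented, the per-term "influence score" is defined here as the mean outcome of tweets containing the term minus the overall mean, the per-tweet aggregate is a simple average, and the final model is an ordinary least-squares line (a Gaussian generalized linear model with identity link). The article's actual scoring and model choices may differ.

```python
from collections import defaultdict

# Toy data, invented for illustration only.
tweets = [
    "flu shot clinic open today",
    "flu season is here",
    "lovely weather today",
    "weather is lovely",
]
retweets = [5.0, 4.0, 1.0, 0.0]  # hypothetical outcome of interest


def tokenize(text):
    """Lowercase whitespace tokenizer; a stand-in for real preprocessing."""
    return text.lower().split()


def term_scores(docs, y):
    """Hypothetical per-term influence score: mean outcome of the documents
    containing the term, minus the overall mean outcome."""
    overall = sum(y) / len(y)
    totals, counts = defaultdict(float), defaultdict(int)
    for doc, outcome in zip(docs, y):
        for term in set(tokenize(doc)):
            totals[term] += outcome
            counts[term] += 1
    return {t: totals[t] / counts[t] - overall for t in totals}


def aggregate_score(doc, scores):
    """Collapse one document's terms into a single continuous summary score
    (here: the average score of its known terms)."""
    known = [scores[t] for t in set(tokenize(doc)) if t in scores]
    return sum(known) / len(known) if known else 0.0


def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


scores = term_scores(tweets, retweets)
x = [aggregate_score(t, scores) for t in tweets]  # one number per tweet
intercept, slope = fit_line(x, retweets)
```

Each tweet is now represented by a single predictor instead of one sparse column per term, which is the dimension reduction the abstract describes. Because this sketch computes the term scores from the same outcomes it later regresses on, the fit is optimistic; a real analysis would estimate the scores on a separate split of the data.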
- Subjects :
- Generalized linear model
Computer science
text data
Machine learning
Article
Developmental and Educational Psychology
odds ratio
Psychology
hurdle model
Applied Psychology
regression
social network
document term matrix
relative risk
Popularity
BF1-990
Clinical Psychology
Artificial intelligence
Public aspects of medicine
RA1-1270
Details
- Language :
- English
- ISSN :
- 2174-8144 and 2254-9625
- Volume :
- 11
- Issue :
- 4
- Database :
- OpenAIRE
- Journal :
- European Journal of Investigation in Health, Psychology and Education
- Accession number :
- edsair.doi.dedup.....85fb76f91f5ff0ab84d34622fc222b21