Machine learning of language use on Twitter reveals weak and non-specific predictions

Authors :: Robert Whelan
Sean Kelley
Mhaonaigh Cn
Burke L
Claire M. Gillan
Publication Year :: 2021
Publisher :: Center for Open Science, 2021.
Abstract: Background: Depressed individuals use language differently than healthy controls, and it has been proposed that social media posts could therefore be used to identify depression. But much of the evidence behind this claim relies on indirect measures of mental health that are sometimes circular, such as statements of self-diagnosis (“Got an OCD diagnosis today”) on social media or membership in disorder-specific online forums. Relatedly, few studies have tested whether these language features are specific to depression versus other aspects of mental health.
Methods: We analyzed the Tweets of 1,006 participants who completed questionnaires assessing symptoms of depression and 8 other mental health conditions. Daily Tweets were subjected to textual analysis and the resulting linguistic features were used to train an Elastic Net model on depression severity, using nested cross-validation. We then tested performance in a held-out test set (30%), comparing predictions of depression versus 8 other aspects of mental health.
Results: The depression-trained model had only modest predictive performance when tested out of sample, explaining just 2.5% of variance in depression symptoms (R² = 0.025, r = 0.16). The performance of this model was as good or superior when used to identify other aspects of mental health: schizotypy (R² = 0.035, r = 0.19), social anxiety (R² = 0.025, r = 0.16), eating disorders (R² = 0.011, r = 0.12), and generalized anxiety (R² = 0.041, r = 0.21). It was above chance for obsessive-compulsive disorder (R² = 0.011, r = 0.12) and apathy (R² = 0.008, r = 0.11), but not significant for alcohol abuse (R² = -0.012, r = 0.04) or impulsivity (R² = -0.001, r = 0.08).
Conclusions: Machine learning analysis of social media data, when trained on well-validated clinical instruments, could not make meaningful individualized out-of-sample predictions regarding the mental health status of users. For the small effects observed, language use associated with depression was non-specific, having similar performance in predicting other mental health problems.

Subjects :: World Wide Web
Text mining
Non specific
Computer science
The Internet

Details

Database :: OpenAIRE
Accession number :: edsair.doi...........81e6fe86974d89efa9c366fd3ad3e981
