Machine learning of language use on Twitter reveals weak and non-specific predictions

Authors :
Robert Whelan
Sean Kelley
Mhaonaigh Cn
Burke L
Claire M. Gillan
Publication Year :
2021
Publisher :
Center for Open Science, 2021.

Abstract

Background: Depressed individuals use language differently than healthy controls, and it has been proposed that social media posts could therefore be used to identify depression. But much of the evidence behind this claim relies on indirect measures of mental health that are sometimes circular, such as statements of self-diagnosis (“Got an OCD diagnosis today”) on social media or membership in disorder-specific online forums. Relatedly, few studies have tested whether these language features are specific to depression rather than shared with other aspects of mental health.

Methods: We analyzed the Tweets of 1,006 participants who completed questionnaires assessing symptoms of depression and 8 other mental health conditions. Daily Tweets were subjected to textual analysis, and the resulting linguistic features were used to train an Elastic Net model on depression severity, using nested cross-validation. We then tested performance in a held-out test set (30%), comparing predictions of depression with predictions of the 8 other aspects of mental health.

Results: The depression-trained model had only modest predictive performance when tested out of sample, explaining just 2.5% of variance in depression symptoms (R² = 0.025, r = 0.16). The performance of this model was as good or superior when used to identify other aspects of mental health: schizotypy (R² = 0.035, r = 0.19), social anxiety (R² = 0.025, r = 0.16), eating disorders (R² = 0.011, r = 0.12), and generalized anxiety (R² = 0.041, r = 0.21). It was above chance for obsessive-compulsive disorder (R² = 0.011, r = 0.12) and apathy (R² = 0.008, r = 0.11), but not significant for alcohol abuse (R² = -0.012, r = 0.04) or impulsivity (R² = -0.001, r = 0.08).

Conclusions: Machine learning analysis of social media data, when trained on well-validated clinical instruments, could not make meaningful individualized out-of-sample predictions about the mental health status of users. For the small effects observed, language use associated with depression was non-specific, having similar performance in predicting other mental health problems.
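
The results for alcohol abuse and impulsivity pair a negative R² with a positive r, which can look contradictory. A minimal sketch of how this can happen, assuming R² is computed out of sample as one minus the ratio of residual to total sum of squares (the abstract does not state the exact formula used): predictions that track the true scores in rank but are offset in level yield a positive correlation and still do worse than predicting the mean. The data and function names below are invented for illustration and are not taken from the study.

```python
from statistics import mean


def pearson_r(y_true, y_pred):
    """Pearson correlation between observed and predicted scores."""
    my, mp = mean(y_true), mean(y_pred)
    cov = sum((a - my) * (b - mp) for a, b in zip(y_true, y_pred))
    var_y = sum((a - my) ** 2 for a in y_true)
    var_p = sum((b - mp) ** 2 for b in y_pred)
    return cov / (var_y * var_p) ** 0.5


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Negative whenever the predictions have a larger squared error
    than simply predicting the mean of y_true for everyone.
    """
    my = mean(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - my) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data (assumption, not study data): the predictions
# rise with the true scores but are systematically too high.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [4.0, 4.5, 4.2, 5.0, 5.5]

r = pearson_r(y_true, y_pred)
r2 = r_squared(y_true, y_pred)
print(f"r = {r:.3f}, R^2 = {r2:.3f}")
```

In this toy case the correlation is clearly positive while R² is below zero, because r ignores errors in level and scale whereas R² penalizes them. This is one reason the two metrics are usefully reported side by side for held-out data.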

Details

Database :
OpenAIRE
Accession number :
edsair.doi...........81e6fe86974d89efa9c366fd3ad3e981