Lorenzo-Luaces, Lorenzo; Howard, Jacqueline; Edinger, Andy; Yaojun Yan, Harry; Rutter, Lauren A.; Valdez, Danny; and Bollen, Johan
Background: Internalizing, externalizing, and somatoform disorders are the most common and disabling forms of psychopathology. Our understanding of these clinical problems is limited by a reliance on self-report and on research using small samples. Social media has emerged as an exciting channel for collecting large samples of longitudinal data from individuals to study psychopathology. Objective: This paper reports the results of 2 large ongoing studies, the Studies of Online Cohorts for Internalizing Symptoms and Language (SOCIAL) I and II, in which we collected Twitter data and self-reported clinical screening scales. Methods: The participants were a sample of Twitter-using adults (SOCIAL I: N=1123) targeted to be nationally representative in terms of age, sex assigned at birth, race, and ethnicity, as well as a sample of college students in the Midwest (SOCIAL II: N=1988), of whom 61.78% (1228/1988) were Twitter users. We asked all participants who were Twitter users for their Twitter handle and analyzed the corresponding account using Botometer, a tool that rates the likelihood that an account belongs to a bot. We divided participants into 4 groups: Twitter users who did not give us their handle or gave us an invalid handle (invalid), those who denied being Twitter users (no Twitter, only available for SOCIAL II), Twitter users who gave their handles but whose accounts had high bot scores (bot-like), and Twitter users who provided their handles and had low bot scores (valid). We explored whether these groups differed significantly in their sociodemographic features, clinical symptoms, and aspects of social media use (ie, platforms used and time spent). Results: In SOCIAL I, most individuals were classified as valid (580/1123, 51.65%), and a smaller share were deemed bot-like (190/1123, 16.91%). The remaining participants (353/1123, 31.43%) gave no handle or gave an invalid handle (eg, entered "N/A").
In SOCIAL II, many individuals were not Twitter users (760/1988, 38.23%). Of the Twitter users in SOCIAL II (1228/1988, 61.78%), most were classified as either invalid (515/1228, 41.94%) or valid (484/1228, 39.41%), with a smaller fraction deemed bot-like (229/1228, 18.65%). Participants reported high rates of mental health diagnoses as well as high levels of symptoms, especially in SOCIAL II. In general, the differences between individuals who did and did not provide their social media handles were small and not statistically significant. Conclusions: Triangulating passively acquired social media data and self-reported questionnaires offers new possibilities for large-scale assessment and evaluation of vulnerability to mental disorders. The propensity of participants to share social media handles is unlikely to be a source of sample bias in subsequent social media analytics. [ABSTRACT FROM AUTHOR]
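The four-group assignment described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the inputs, and in particular the bot-score cutoff of 0.5 are assumptions made for illustration, since the abstract does not report the threshold used, and the bot score is taken as a precomputed number rather than obtained from Botometer.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

# Hypothetical cutoff: the abstract does not report the threshold that
# separated "high" from "low" bot scores.
BOT_SCORE_CUTOFF = 0.5


def classify_participant(
    uses_twitter: bool,
    handle_is_valid: bool,
    bot_score: Optional[float],
    cutoff: float = BOT_SCORE_CUTOFF,
) -> str:
    """Assign a participant to one of the 4 groups named in the abstract."""
    if not uses_twitter:
        # Participant denied being a Twitter user (SOCIAL II only).
        return "no Twitter"
    if not handle_is_valid or bot_score is None:
        # No handle given, or the handle could not be scored.
        return "invalid"
    if bot_score >= cutoff:
        return "bot-like"
    return "valid"


def group_shares(labels: Iterable[str]) -> Dict[str, float]:
    """Return the fraction of participants falling in each group."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}
```

Applying `classify_participant` to each participant and passing the resulting labels to `group_shares` yields group proportions of the kind reported in the Results.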