Discovering Sociolinguistic Associations with Structured Sparsity

Authors :
Eisenstein, Jacob
Smith, Noah A.
Xing, Eric P.
Publication Year :
2018
Publisher :
Figshare, 2018.

Abstract

We present a method to discover robust and interpretable sociolinguistic associations from raw geotagged text data. Using aggregate demographic statistics about the authors' geographic communities, we solve a multi-output regression problem between demographics and lexical frequencies. By imposing a composite ℓ1,∞ regularizer, we obtain structured sparsity, driving entire rows of coefficients to zero. We perform two regression studies. First, we use term frequencies to predict demographic attributes; our method identifies a compact set of words that are strongly associated with author demographics. Next, we conjoin demographic attributes into features, which we use to predict term frequencies. The composite regularizer identifies a small number of features, which correspond to communities of authors united by shared demographic and linguistic properties.
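
Illustrative note: the ℓ1,∞ penalty named in the abstract can be read as the sum, over rows of the coefficient matrix, of each row's largest absolute entry. The sketch below is not the authors' implementation; it only evaluates that penalty on a small made-up matrix and lists the rows that remain nonzero. The layout (rows as input features, columns as outputs), the function names, the tolerance, and the example values are all assumptions made for illustration.

```python
import numpy as np

def l1_inf_norm(B):
    """Sum over rows of the largest absolute coefficient in each row."""
    return np.abs(B).max(axis=1).sum()

def active_rows(B, tol=1e-12):
    """Indices of rows that are not entirely (numerically) zero."""
    return np.flatnonzero(np.abs(B).max(axis=1) > tol)

# Hypothetical coefficient matrix: 3 features (rows) x 3 outputs (columns).
# The second row is all zeros, i.e. that feature is dropped for every output.
B = np.array([
    [0.5, -2.0, 1.0],
    [0.0,  0.0, 0.0],
    [-0.3, 0.1, 0.0],
])

penalty = l1_inf_norm(B)  # row maxima are 2.0, 0.0, and 0.3
rows = active_rows(B)     # features that survive the structured sparsity
```

Because the penalty charges each row only for its largest entry, a row costs nothing exactly when all of its coefficients are zero, which is why this kind of regularizer tends to remove whole features rather than individual coefficients.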

Details

Database :
OpenAIRE
Accession number :
edsair.doi.dedup.....e3d879c1db1d8cf791dab1480f818d68
Full Text :
https://doi.org/10.1184/r1/6475556.v1