Discovering Sociolinguistic Associations with Structured Sparsity
- Publication Year :
- 2018
- Publisher :
- Figshare, 2018.
Abstract
- We present a method to discover robust and interpretable sociolinguistic associations from raw geotagged text data. Using aggregate demographic statistics about the authors' geographic communities, we solve a multi-output regression problem between demographics and lexical frequencies. By imposing a composite ℓ1,∞ regularizer, we obtain structured sparsity, driving entire rows of coefficients to zero. We perform two regression studies. First, we use term frequencies to predict demographic attributes; our method identifies a compact set of words that are strongly associated with author demographics. Next, we conjoin demographic attributes into features, which we use to predict term frequencies. The composite regularizer identifies a small number of features, which correspond to communities of authors united by shared demographic and linguistic properties.
- Subjects :
- FOS: Psychology
- 170203 Knowledge Representation and Machine Learning
Details
- Database :
- OpenAIRE
- Accession number :
- edsair.doi.dedup.....e3d879c1db1d8cf791dab1480f818d68
- Full Text :
- https://doi.org/10.1184/r1/6475556.v1
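Illustration
The abstract states that an ℓ1,∞ regularizer drives entire rows of the coefficient matrix to zero. As a minimal sketch of what that penalty measures, the snippet below computes the ℓ1,∞ norm (the sum, over rows, of each row's largest absolute coefficient) and lists the rows that remain nonzero. The function names `l1_inf_norm` and `active_rows`, the tolerance `tol`, and the matrix `B` are hypothetical and chosen only for illustration; they are not taken from the record or from the authors' code, and the snippet does not fit the regression itself.

```python
def l1_inf_norm(B):
    """Return the l1,inf norm of B: sum over rows of the max absolute entry.

    B is a list of rows; each row holds one input feature's coefficients
    across all outputs (an assumed layout for this illustration).
    """
    return sum(max(abs(b) for b in row) for row in B)


def active_rows(B, tol=1e-12):
    """Return indices of rows with at least one coefficient above tol."""
    return [i for i, row in enumerate(B) if max(abs(b) for b in row) > tol]


# Hypothetical 3x3 coefficient matrix; the middle row is entirely zero,
# which is the kind of row-level sparsity the abstract describes.
B = [
    [0.5, -2.0, 1.0],
    [0.0, 0.0, 0.0],
    [-0.25, 0.1, 0.0],
]

print(l1_inf_norm(B))
print(active_rows(B))
```

Because the penalty charges each row only for its largest entry, a row costs nothing once all of its coefficients are zero, which is why minimizing it tends to remove whole rows rather than scattered individual entries.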