
Evaluating Transformers and Linguistic Features integration for Author Profiling tasks in Spanish.

Authors :
García-Díaz, José Antonio
Beydoun, Ghassan
Valencia-García, Rafael
Source :
Data & Knowledge Engineering. May2024, Vol. 151, pN.PAG-N.PAG. 1p.
Publication Year :
2024

Abstract

Author profiling consists of extracting demographic and psychographic information about authors by examining their writings. This information can then be used to improve the reader experience and to detect bots or propagators of hoaxes and/or hate speech. Therefore, author profiling can be applied to build more robust and efficient Knowledge-Based Systems for tasks such as content moderation, user profiling, and information retrieval. Author profiling is typically performed automatically as a document classification task. Recently, language models based on transformers have also proven to be quite effective in this task. However, the size and heterogeneity of novel language models make it necessary to evaluate them in context. The contributions we make in this paper are fourfold. First, we evaluate which language models are best suited to perform author profiling in Spanish; these experiments include basic, distilled, and multilingual models. Second, we evaluate how feature integration can improve performance for this task, comparing two distinct strategies: knowledge integration and ensemble learning. Third, we evaluate the ability of linguistic features to improve the interpretability of the results. Fourth, we evaluate the performance of each language model in terms of memory, training, and inference times. Our results indicate that lightweight models can indeed achieve performance similar to that of heavy models and that multilingual models are actually less effective than models trained on a single language. Finally, we confirm that the best models and strategies for integrating features ultimately depend on the context of the task.

• Study of large language models for conducting Author Profiling in Spanish.
• Feature integration improves the performance of large language models.
• Interpretability of profiles using linguistic features.
• Hyperlinks and hashtags are strong features when discerning between bots and humans.
• Stylometry is the most relevant linguistic category in Author Profiling.
[ABSTRACT FROM AUTHOR]
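The two feature-integration strategies named in the abstract can be contrasted with a minimal sketch. It assumes that "knowledge integration" means concatenating linguistic features with a transformer's document embedding before a single classifier (early fusion), and that "ensemble learning" means averaging the class probabilities of separately trained models (late fusion). The abstract does not specify either design, so this is not the authors' implementation. The function names, the bot/human labels, the fixed weights, and the feature values are hypothetical placeholders; in a real system the weights would be learned.

```python
import math
from typing import List, Optional, Sequence

LABELS = ["bot", "human"]  # hypothetical two-class profiling task


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw class scores into probabilities (shifted by the max for stability)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear_scores(weights: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """One dot product per class; every weight row must match len(x)."""
    for row in weights:
        if len(row) != len(x):
            raise ValueError("weight row length does not match feature length")
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def knowledge_integration(embedding: Sequence[float],
                          linguistic: Sequence[float],
                          weights: Sequence[Sequence[float]]) -> List[float]:
    """Early fusion (assumed): concatenate both feature sets, then one classifier."""
    fused = list(embedding) + list(linguistic)
    return softmax(linear_scores(weights, fused))


def ensemble(prob_lists: Sequence[Sequence[float]],
             model_weights: Optional[Sequence[float]] = None) -> List[float]:
    """Late fusion (assumed): weighted average of each model's class probabilities."""
    n = len(prob_lists)
    if model_weights is None:
        model_weights = [1.0 / n] * n  # equal vote per model
    n_classes = len(prob_lists[0])
    return [
        sum(w * probs[c] for w, probs in zip(model_weights, prob_lists))
        for c in range(n_classes)
    ]


def predict(probs: Sequence[float]) -> str:
    """Return the label with the highest probability."""
    return LABELS[max(range(len(probs)), key=lambda i: probs[i])]


if __name__ == "__main__":
    embedding = [0.2, -0.1, 0.4]  # stand-in for a transformer document embedding
    linguistic = [3.0, 1.0]       # stand-in counts, e.g. hyperlinks and hashtags
    fused_weights = [[0.0, 0.0, 0.0, 1.0, 1.0],     # hypothetical "bot" row
                     [0.0, 0.0, 0.0, -1.0, -1.0]]   # hypothetical "human" row
    p_fused = knowledge_integration(embedding, linguistic, fused_weights)
    p_ens = ensemble([[0.9, 0.1], [0.5, 0.5]])
    print(predict(p_fused), p_fused)
    print(predict(p_ens), p_ens)
```

With equal weights, ensembling a model that outputs [0.9, 0.1] with one that outputs [0.5, 0.5] yields [0.7, 0.3]. Early fusion lets a single classifier learn interactions between the two feature sets, whereas late fusion keeps the models independent and only combines their outputs.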

Details

Language :
English
ISSN :
0169023X
Volume :
151
Database :
Academic Search Index
Journal :
Data & Knowledge Engineering
Publication Type :
Academic Journal
Accession number :
177313295
Full Text :
https://doi.org/10.1016/j.datak.2024.102307