DT-grams: Structured Dependency Grammar Stylometry for Cross-Language Authorship Attribution

Authors :
Murauer, Benjamin
Specht, Günther
Publication Year :
2021

Abstract

Cross-language authorship attribution problems rely on either translation to enable the use of single-language features, or language-independent feature extraction methods. Until recently, the lack of datasets for this problem hindered the development of the latter, and single-language solutions were applied to machine-translated corpora. In this paper, we present a novel language-independent feature for authorship analysis based on dependency graphs and universal part-of-speech tags, called DT-grams (dependency tree grams), which are constructed by selecting specific sub-parts of the dependency graph of sentences. We evaluate DT-grams by performing cross-language authorship attribution on untranslated datasets of bilingual authors, showing that, on average, they achieve a macro-averaged F1 score that is 0.081 higher than that of previous methods across five different language pairs. Additionally, by providing results for a diverse set of features for comparison, we provide a baseline for the previously undocumented task of untranslated cross-language authorship attribution.

Comment: To be published in: "32. GI-Workshop Grundlagen von Datenbanken"
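
The abstract describes DT-grams only as sub-parts of a sentence's dependency graph labelled with universal part-of-speech tags. As a rough illustration of that general idea, the sketch below counts chains of UPOS tags obtained by following head links upward from each token. The chain shape, the function name `ancestor_chains`, and the hand-annotated example sentence are all assumptions made for illustration; the sub-tree shapes actually used in the paper are not given in this record and may differ.

```python
from collections import Counter

# Hypothetical hand-annotated sentence: "The cat chased a mouse".
# Each token is (UPOS tag, index of its head token), with -1 marking the root.
SENTENCE = [
    ("DET", 1),    # The    -> cat
    ("NOUN", 2),   # cat    -> chased
    ("VERB", -1),  # chased (root)
    ("DET", 4),    # a      -> mouse
    ("NOUN", 2),   # mouse  -> chased
]


def ancestor_chains(sentence, n):
    """Count UPOS chains of length n built by walking head links upward.

    This is one assumed sub-tree shape, not the definition from the paper.
    Chains that reach the root before collecting n tags are discarded.
    """
    counts = Counter()
    for start in range(len(sentence)):
        chain = []
        idx = start
        while idx != -1 and len(chain) < n:
            upos, head = sentence[idx]
            chain.append(upos)
            idx = head
        if len(chain) == n:
            counts[tuple(chain)] += 1
    return counts


if __name__ == "__main__":
    for gram, freq in sorted(ancestor_chains(SENTENCE, 2).items()):
        print(gram, freq)
```

Because the chains contain only UPOS tags and tree structure, and no word forms, the same counting procedure can be applied to parsed text in any language with a universal-dependencies style annotation, which is the property the abstract relies on for untranslated cross-language comparison.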

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2106.05677
Document Type :
Working Paper