
Evolving a Weighted Combination of Text Similarities for Authorship Attribution

Authors :
Youssef Keyrouz
Dany Mezher
Cyril Fonlupt
Rafic Faddoul
Denis Robilliard
Source :
Lecture Notes in Computer Science ISBN: 9783030457143
Publication Year :
2020
Publisher :
Springer International Publishing, 2020.

Abstract

Authorship Attribution (AA), also known as Authorship Identification, is the problem of identifying the author of an anonymous text based on its characteristics or features. Notable feature extraction methods used to this end include bag-of-words (BOW) methods and semantic and syntactic methods (SSM). BOW methods treat the text as a sequence of tokens and disregard the semantics of the language, whereas SSM rely on advanced natural language processing (NLP) techniques. The features extracted from an anonymous text are compared, using several similarity measures, with features extracted from a corpus of texts written by known authors. In this paper, we combine the results of multiple conventional methods chosen from the literature and use a genetic algorithm (GA) to find the optimal weighting of these results. The fitness of the GA is the resulting accuracy of the authorship attribution task. The optimal combination found by the GA is then applied, and the anonymous text is attributed to the known author with the highest combined similarity. A numerical application on a corpus of 3036 books written by 142 authors shows that the proposed method achieves higher accuracy than the conventional methods, with satisfactory overall performance.
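The abstract describes the approach only at a high level. The Python sketch below illustrates the general idea under stated assumptions: each conventional method yields a similarity score between a disputed text and each candidate author, a weight vector combines these scores, the author with the highest combined score is selected, and a GA searches for the weights whose fitness is attribution accuracy on texts of known authorship. The GA operators (tournament selection, blend crossover, Gaussian mutation, elitism), the population size, the normalization of weights to sum to 1, the function names (`combined_scores`, `attribute`, `accuracy`, `evolve_weights`) and the toy similarity values are all hypothetical and are not taken from the paper.

```python
import random

# Hypothetical toy data: TOY_SIMS[method][doc][author] is the similarity that a
# given conventional method assigns between a disputed text and a known author.
TOY_SIMS = [
    [[0.5, 0.6, 0.1], [0.2, 0.9, 0.1], [0.1, 0.2, 0.8]],  # method 0
    [[0.9, 0.2, 0.1], [0.6, 0.5, 0.1], [0.2, 0.1, 0.7]],  # method 1
]
TOY_LABELS = [0, 1, 2]  # true author of each disputed text


def combined_scores(weights, sims, doc):
    """Weighted sum of the per-method similarities between `doc` and each author."""
    n_authors = len(sims[0][doc])
    return [sum(w * sims[m][doc][a] for m, w in enumerate(weights))
            for a in range(n_authors)]


def attribute(weights, sims, doc):
    """Attribute `doc` to the author with the highest combined similarity."""
    scores = combined_scores(weights, sims, doc)
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(weights, sims, labels):
    """GA fitness: fraction of texts attributed to their true author."""
    hits = sum(attribute(weights, sims, d) == y for d, y in enumerate(labels))
    return hits / len(labels)


def normalize(weights):
    """Rescale non-negative weights to sum to 1 (uniform if all are zero)."""
    total = sum(weights)
    if total <= 0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def evolve_weights(sims, labels, pop_size=30, generations=40,
                   mutation_sd=0.1, seed=0):
    """Hypothetical GA searching for the weight vector with maximal accuracy."""
    rng = random.Random(seed)
    n_methods = len(sims)

    def fitness(w):
        return accuracy(w, sims, labels)

    pop = [normalize([rng.random() for _ in range(n_methods)])
           for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        next_pop = ranked[:2]  # elitism: the two best survive unchanged
        while len(next_pop) < pop_size:
            # Tournament selection of size 3 for each parent.
            p1 = max(rng.sample(ranked, 3), key=fitness)
            p2 = max(rng.sample(ranked, 3), key=fitness)
            # Blend crossover: random convex combination of the parents.
            alpha = rng.random()
            child = [alpha * a + (1 - alpha) * b for a, b in zip(p1, p2)]
            # Gaussian mutation, clipped so weights stay non-negative.
            child = [max(0.0, w + rng.gauss(0.0, mutation_sd)) for w in child]
            next_pop.append(normalize(child))
        pop = next_pop
    return max(pop, key=fitness)


if __name__ == "__main__":
    for single in ([1.0, 0.0], [0.0, 1.0]):
        print("single method", single, accuracy(single, TOY_SIMS, TOY_LABELS))
    best = evolve_weights(TOY_SIMS, TOY_LABELS)
    print("evolved weights", best, accuracy(best, TOY_SIMS, TOY_LABELS))
```

On this toy data each method used alone attributes two of the three texts correctly, while a balanced combination such as (0.5, 0.5) attributes all three. Weights are normalized only for readability: scaling every weight by the same positive constant leaves the arg-max, and therefore the attribution, unchanged. In a real setting the fitness would presumably be measured on held-out texts of known authorship rather than on the texts being attributed.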

Details

ISBN :
978-3-030-45714-3
Database :
OpenAIRE
Accession number :
edsair.doi...........5c7da62c417021af09ddcd38f2e98765