Evolving a Weighted Combination of Text Similarities for Authorship Attribution
- Source :
- Lecture Notes in Computer Science ISBN: 9783030457143
- Publication Year :
- 2020
- Publisher :
- Springer International Publishing, 2020.
Abstract
- Authorship Attribution (AA), also known as Authorship Identification, is the problem of identifying the author of an anonymous text based on its characteristics or features. Notable feature extraction methods used to this end include bag-of-words (BOW) methods and semantic and syntactic methods (SSM). BOW methods treat the text as a sequence of tokens and disregard the semantics of the language, whereas SSM rely on advanced natural language processing (NLP) techniques. The features extracted from an anonymous text are compared, using several similarity measures, to features extracted from a corpus of texts written by known authors. In this paper, we combine the results generated by several conventional methods chosen from the literature and use a genetic algorithm (GA) to find the optimal weight distribution. The optimal combination obtained by the GA is then applied, and the anonymous text is attributed to the author, among a set of known authors, with the highest combined similarity. The fitness of our GA is the resulting accuracy of the authorship attribution task. A numerical application on a corpus of 3036 books written by 142 authors shows that the proposed method achieves higher accuracy than the conventional methods and delivers satisfactory performance.
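The weighting idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the similarity measures, the chromosome encoding, or the GA operators, so everything here is an assumption made for illustration: the function names (`combine`, `attribute`, `accuracy`, `evolve`), the truncation selection, the arithmetic crossover, the Gaussian mutation, and the toy data are all hypothetical. Only the overall scheme comes from the abstract: a weighted sum of per-method similarities, attribution to the author with the highest combined score, and attribution accuracy used as the fitness.

```python
import random

def combine(weights, sims):
    """Weighted sum of similarities; sims[m][a] is the score of author a under method m."""
    n_authors = len(sims[0])
    return [sum(w * s[a] for w, s in zip(weights, sims)) for a in range(n_authors)]

def attribute(weights, sims):
    """Return the index of the author with the highest combined similarity."""
    combined = combine(weights, sims)
    return max(range(len(combined)), key=combined.__getitem__)

def accuracy(weights, dataset):
    """Fitness: fraction of (sims, true_author) pairs attributed correctly."""
    hits = sum(attribute(weights, sims) == author for sims, author in dataset)
    return hits / len(dataset)

def normalize(w):
    """Rescale non-negative weights to sum to 1 (uniform if all are zero)."""
    total = sum(w)
    return [x / total for x in w] if total > 0 else [1.0 / len(w)] * len(w)

def evolve(dataset, n_methods, pop_size=20, generations=30, mutation_rate=0.2, seed=0):
    """Assumed GA: keep the fitter half, refill with averaged and mutated children."""
    rng = random.Random(seed)
    pop = [normalize([rng.random() for _ in range(n_methods)]) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda w: accuracy(w, dataset), reverse=True)
        elite = pop[: pop_size // 2]              # survivors are kept unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            child = [(a + b) / 2 for a, b in zip(p1, p2)]   # arithmetic crossover
            if rng.random() < mutation_rate:
                i = rng.randrange(n_methods)
                child[i] = max(0.0, child[i] + rng.gauss(0, 0.1))  # clamp at zero
            children.append(normalize(child))
        pop = elite + children
    return max(pop, key=lambda w: accuracy(w, dataset))

# Made-up toy data: 2 methods, 2 authors. Method 0 points to the right author,
# method 1 points (weakly) to the wrong one.
toy_data = [
    ([[0.9, 0.1], [0.4, 0.6]], 0),
    ([[0.2, 0.8], [0.55, 0.45]], 1),
]
best = evolve(toy_data, n_methods=2)
print(best, accuracy(best, toy_data))
```

In this sketch the fitness is evaluated on the same texts used to evolve the weights; a realistic experiment would presumably score the evolved weights on held-out texts.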
Details
- ISBN :
- 978-3-030-45714-3
- ISBNs :
- 9783030457143
- Database :
- OpenAIRE
- Journal :
- Lecture Notes in Computer Science ISBN: 9783030457143
- Accession number :
- edsair.doi...........5c7da62c417021af09ddcd38f2e98765