BookWorm: A Dataset for Character Description and Analysis

Authors :
Papoudakis, Argyrios
Lapata, Mirella
Keller, Frank
Publication Year :
2024

Abstract

Characters are at the heart of every story, driving the plot and engaging readers. In this study, we explore the understanding of characters in full-length books, which contain complex narratives and numerous interacting characters. We define two tasks: character description, which generates a brief factual profile, and character analysis, which offers an in-depth interpretation, including character development, personality, and social context. We introduce the BookWorm dataset, pairing books from the Gutenberg Project with human-written descriptions and analyses. Using this dataset, we evaluate state-of-the-art long-context models in zero-shot and fine-tuning settings, utilizing both retrieval-based and hierarchical processing for book-length inputs. Our findings show that retrieval-based approaches outperform hierarchical ones in both tasks. Additionally, fine-tuned models using coreference-based retrieval produce the most factual descriptions, as measured by fact- and entailment-based metrics. We hope our dataset, experiments, and analysis will inspire further research in character-based narrative understanding.

Comment: 30 pages, 2 figures, EMNLP 2024 Findings

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2410.10372
Document Type :
Working Paper