BookWorm: A Dataset for Character Description and Analysis
- Publication Year :
- 2024
Abstract
- Characters are at the heart of every story, driving the plot and engaging readers. In this study, we explore the understanding of characters in full-length books, which contain complex narratives and numerous interacting characters. We define two tasks: character description, which generates a brief factual profile, and character analysis, which offers an in-depth interpretation, including character development, personality, and social context. We introduce the BookWorm dataset, pairing books from Project Gutenberg with human-written descriptions and analyses. Using this dataset, we evaluate state-of-the-art long-context models in zero-shot and fine-tuning settings, utilizing both retrieval-based and hierarchical processing for book-length inputs. Our findings show that retrieval-based approaches outperform hierarchical ones in both tasks. Additionally, fine-tuned models using coreference-based retrieval produce the most factual descriptions, as measured by fact- and entailment-based metrics. We hope our dataset, experiments, and analysis will inspire further research in character-based narrative understanding.
- Comment: 30 pages, 2 figures, EMNLP 2024 Findings
Details
- Database :
- arXiv
- Publication Type :
- Report
- Accession number :
- edsarx.2410.10372
- Document Type :
- Working Paper