1. ALBERTI, a Multilingual Domain Specific Language Model for Poetry Analysis
- Authors
Javier de la Rosa, Álvaro Pérez Pozo, Salvador Ros, and Elena González-Blanco
- Subjects
Computer Science - Computation and Language
- Abstract
The computational analysis of poetry is limited by the scarcity of tools to automatically analyze and scan poems. In multilingual settings, the problem is exacerbated, as scansion and rhyme systems only exist for individual languages, making comparative studies very challenging and time-consuming. In this work, we present Alberti, the first multilingual pre-trained large language model for poetry. Through domain-specific pre-training (DSP), we further trained multilingual BERT on a corpus of over 12 million verses from 12 languages. We evaluated its performance on two structural poetry tasks: Spanish stanza type classification and metrical pattern prediction for Spanish, English, and German. In both cases, Alberti outperforms multilingual BERT and other transformer-based models of similar size, and even achieves state-of-the-art results for German when compared to rule-based systems, demonstrating the feasibility and effectiveness of DSP in the poetry domain.
- Comment
Accepted for publication at SEPLN 2023: 39th International Conference of the Spanish Society for Natural Language Processing
- Published
2023
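
The domain-specific pre-training (DSP) described in the abstract amounts to continuing masked-language-model training of multilingual BERT on a verse corpus before fine-tuning on downstream tasks. Below is a minimal sketch of such a setup using the Hugging Face transformers library; the corpus file name, hyperparameters, and output path are illustrative assumptions, not the authors' actual configuration.

```python
# Minimal sketch: continue masked-language-model (MLM) pre-training of
# multilingual BERT on a plain-text corpus of verses, one verse per line.
# File names, hyperparameters, and paths are assumptions for illustration,
# not the ALBERTI authors' actual training setup.
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "bert-base-multilingual-cased"  # mBERT checkpoint used as the starting point

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForMaskedLM.from_pretrained(MODEL_NAME)

# Load a hypothetical multilingual verse corpus (one verse per line).
dataset = load_dataset("text", data_files={"train": "verses.txt"})["train"]

def tokenize(batch):
    # Verses are short, so a small max_length keeps sequences compact.
    return tokenizer(batch["text"], truncation=True, max_length=64)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Standard BERT-style MLM objective: mask 15% of tokens at random.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

args = TrainingArguments(
    output_dir="alberti-dsp",        # illustrative output path
    per_device_train_batch_size=32,  # illustrative hyperparameters
    num_train_epochs=3,
    learning_rate=5e-5,
    save_steps=10_000,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
trainer.save_model("alberti-dsp")
```

After this DSP step, the resulting encoder can be fine-tuned on the downstream tasks mentioned in the abstract (stanza type classification, metrical pattern prediction) in the same way one would fine-tune plain multilingual BERT.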