Adapting Monolingual Models: Data can be Scarce when Language Similarity is High
- Source: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
- Publication Year: 2021
Abstract
- For many (minority) languages, the resources needed to train large models are not available. We investigate the performance of zero-shot transfer learning with as little data as possible, and the influence of language similarity in this process. We retrain the lexical layers of four BERT-based models using data from two low-resource target language varieties, while the Transformer layers are independently fine-tuned on a POS-tagging task in the model's source language. By combining the new lexical layers and fine-tuned Transformer layers, we achieve high task performance for both target languages. With high language similarity, 10MB of data appears sufficient to achieve substantial monolingual transfer performance. Monolingual BERT-based models generally achieve higher downstream task performance after retraining the lexical layer than multilingual BERT, even when the target language is included in the multilingual model.
- Comment: Findings of ACL 2021 Camera Ready
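- The abstract describes combining a lexical layer retrained on target-language data with Transformer layers fine-tuned for POS tagging in the source language. A minimal sketch of how such a combination could look, assuming a HuggingFace Transformers BERT setup; the checkpoint paths are hypothetical placeholders, and the attribute names assume a standard BertForTokenClassification architecture rather than the authors' exact implementation:

```python
# Illustrative sketch, not the authors' code: swap the retrained lexical (word-embedding)
# layer of a target-language model into a copy whose Transformer layers were fine-tuned
# on source-language POS tagging. Checkpoint paths below are hypothetical.
from transformers import AutoModel, AutoModelForTokenClassification, AutoTokenizer

# Transformer layers fine-tuned on the source-language POS-tagging task (assumed checkpoint)
pos_model = AutoModelForTokenClassification.from_pretrained("path/to/source-pos-finetuned")

# Model whose lexical layer was retrained on a small amount of target-language text,
# together with its target-language tokenizer (assumed checkpoint)
lexical_model = AutoModel.from_pretrained("path/to/target-lexical-retrained")
target_tokenizer = AutoTokenizer.from_pretrained("path/to/target-lexical-retrained")

# Replace the word-piece embeddings with the retrained target-language ones;
# the fine-tuned Transformer layers are left untouched. This assumes the retrained
# vocabulary has the same size as the original one.
pos_model.bert.embeddings.word_embeddings = lexical_model.embeddings.word_embeddings

# Tag a target-language sentence with the combined model
inputs = target_tokenizer("An example sentence in the target variety.", return_tensors="pt")
predicted_tags = pos_model(**inputs).logits.argmax(dim=-1)
```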
- Subjects: Computer Science - Computation and Language
Details
- Database: arXiv
- Journal: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
- Publication Type: Report
- Accession number: edsarx.2105.02855
- Document Type: Working Paper
- Full Text: https://doi.org/10.18653/v1/2021.findings-acl.433