AnchiBERT: A Pre-Trained Model for Ancient Chinese Language Understanding and Generation
- Author
Dayiheng Liu, Jiancheng Lv, Huishuang Tian, and Kexin Yang
- Subjects
Vocabulary, Poetry, Computer science, Chinese culture, Data modeling, Task analysis, Artificial intelligence, Couplet, Language model, Architecture, Natural language processing
- Abstract
Ancient Chinese is the essence of Chinese culture. Several natural language processing tasks exist in the ancient Chinese domain, such as ancient-modern Chinese translation, poem generation, and couplet generation. Previous studies usually use supervised models, which rely heavily on parallel data. However, it is difficult to obtain large-scale parallel data for ancient Chinese. To make full use of the more easily available monolingual ancient Chinese corpora, we release AnchiBERT, a pre-trained language model based on the architecture of BERT and trained on large-scale ancient Chinese corpora. We evaluate AnchiBERT on both language understanding and generation tasks, including poem classification, ancient-modern Chinese translation, poem generation, and couplet generation. The experimental results show that AnchiBERT outperforms BERT as well as the non-pretrained models and achieves state-of-the-art results in all cases.
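The pretraining described above follows BERT's masked-language-model objective: a fraction of tokens in the monolingual corpus is corrupted, and the model learns to recover the originals. The paper does not publish this routine; the sketch below is a minimal, self-contained illustration of the standard BERT masking scheme (15% of positions selected; of those, 80% replaced by `[MASK]`, 10% by a random vocabulary token, 10% left unchanged), with all function and variable names being illustrative assumptions.

```python
import random

def mask_tokens(tokens, vocab, mask_token="[MASK]", mask_prob=0.15, seed=0):
    """BERT-style masked-LM corruption (illustrative sketch, not the
    authors' code). Roughly `mask_prob` of the positions become
    prediction targets; of those, 80% are replaced with [MASK], 10%
    with a random vocabulary token, and 10% are kept unchanged."""
    rng = random.Random(seed)
    corrupted = list(tokens)
    labels = [None] * len(tokens)  # None = position not predicted
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok  # the model must recover the original token
            roll = rng.random()
            if roll < 0.8:
                corrupted[i] = mask_token          # 80%: mask it
            elif roll < 0.9:
                corrupted[i] = rng.choice(vocab)   # 10%: random token
            # else 10%: leave the token as-is but still predict it

    return corrupted, labels

# Example on a line of classical Chinese (character-level tokens):
poem = list("床前明月光疑是地上霜")
vocab = list("床前明月光疑是地上霜举头低思故乡")
corrupted, labels = mask_tokens(poem, vocab, seed=42)
```

For ancient Chinese, character-level tokenization as shown here is a natural fit, since classical texts are largely single-character morphemes; the actual AnchiBERT vocabulary and tokenizer may differ.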
- Published
- 2021