
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence

Authors: Peng, Bo; Goldstein, Daniel; Anthony, Quentin; Albalak, Alon; Alcaide, Eric; Biderman, Stella; Cheah, Eugene; Du, Xingjian; Ferdinan, Teddy; Hou, Haowen; Kazienko, Przemysław; GV, Kranthi Kiran; Kocoń, Jan; Koptyra, Bartłomiej; Krishna, Satyapriya; McClelland Jr., Ronald; Lin, Jiaju; Muennighoff, Niklas; Obeid, Fares; Saito, Atsushi; Song, Guangyu; Tu, Haoqin; Wirawan, Cahya; Woźniak, Stanisław; Zhang, Ruichong; Zhao, Bingchen; Zhao, Qihang; Zhou, Peng; Zhu, Jian; Zhu, Rui-Jie
Publication Year: 2024

Abstract

We present Eagle (RWKV-5) and Finch (RWKV-6), sequence models improving upon the RWKV (RWKV-4) architecture. Our architectural design advancements include multi-headed matrix-valued states and a dynamic recurrence mechanism that improve expressivity while maintaining the inference efficiency characteristics of RNNs. We introduce a new multilingual corpus with 1.12 trillion tokens and a fast tokenizer based on greedy matching for enhanced multilinguality. We trained four Eagle models, ranging from 0.46 to 7.5 billion parameters, and two Finch models with 1.6 and 3.1 billion parameters, and found that they achieve competitive performance across a wide variety of benchmarks. We release all our models on HuggingFace under the Apache 2.0 license.

Models: https://huggingface.co/RWKV
Training code: https://github.com/RWKV/RWKV-LM
Inference code: https://github.com/RWKV/ChatRWKV
Time-parallel training code: https://github.com/RWKV/RWKV-infctx-trainer
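The abstract describes the two key mechanisms only at a high level. As a rough illustration, the following minimal NumPy sketch shows a single-head recurrence with a matrix-valued state and a data-dependent (dynamic) decay, in the spirit of the Eagle/Finch description above. The parameter names (W_r, W_k, W_v, W_w), the sigmoid-based decay, and the shapes are assumptions made for illustration; the paper's actual formulation additionally uses token shift, low-rank decay projections, output gating, and multiple heads.

```python
# Minimal sketch (not the authors' implementation) of a single-head,
# matrix-valued-state recurrence with a data-dependent decay.
# Parameter names, shapes, and the sigmoid decay are illustrative assumptions.
import numpy as np

def recurrent_head(x, W_r, W_k, W_v, W_w):
    """Run one head over a sequence x of shape (T, d_in).

    The state S is a (d_k, d_v) matrix (an outer-product memory) rather than
    the length-d vector of a classic RNN: the "matrix-valued state" idea.
    The per-token decay w_t is the "dynamic recurrence" idea.
    """
    d_k, d_v = W_k.shape[1], W_v.shape[1]
    S = np.zeros((d_k, d_v))
    outputs = []
    for x_t in x:
        r = x_t @ W_r                           # receptance, shape (d_k,)
        k = x_t @ W_k                           # key,        shape (d_k,)
        v = x_t @ W_v                           # value,      shape (d_v,)
        w = 1.0 / (1.0 + np.exp(-(x_t @ W_w)))  # decay in (0, 1), shape (d_k,)
        S = w[:, None] * S + np.outer(k, v)     # decay old state, write new info
        outputs.append(r @ S)                   # read out, shape (d_v,)
    return np.stack(outputs)

# Tiny usage example with random weights.
rng = np.random.default_rng(0)
d_in, d_k, d_v, T = 8, 4, 4, 5
params = [0.1 * rng.normal(size=(d_in, d)) for d in (d_k, d_k, d_v, d_k)]
out = recurrent_head(rng.normal(size=(T, d_in)), *params)
print(out.shape)  # (5, 4)
```

Because the state is a d_k × d_v matrix updated with an outer product rather than a single vector, each head can carry more per-step information while inference still runs in constant memory and time per token, which is the RNN-style efficiency the abstract refers to.

The "fast tokenizer based on greedy matching" can likewise be illustrated with a greedy longest-prefix match over a fixed vocabulary; this is only an assumed sketch, not the actual RWKV tokenizer or its vocabulary:

```python
# Sketch of greedy longest-match tokenization (illustrative, not the RWKV tokenizer).
def greedy_tokenize(text, vocab):
    """vocab maps token string -> id; it must contain every single character."""
    max_len = max(len(t) for t in vocab)
    ids, i = [], 0
    while i < len(text):
        # Try the longest possible piece first, then shrink until a match is found.
        for L in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + L]
            if piece in vocab:
                ids.append(vocab[piece])
                i += L
                break
    return ids

vocab = {ch: i for i, ch in enumerate("abc ")}
vocab["ab"] = 100
print(greedy_tokenize("ab ac", vocab))  # [100, 3, 0, 2]
```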

Details

Database: arXiv
Publication Type: Report
Accession number: edsarx.2404.05892
Document Type: Working Paper