1. LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models
- Author
Liang Zhao, Tianwen Wei, Liang Zeng, Cheng Cheng, Liu Yang, Peng Cheng, Lijie Wang, Chenxia Li, Xuejie Wu, Bo Zhu, Yimeng Gan, Rui Hu, Shuicheng Yan, Han Fang, and Yahui Zhou
- Subjects
Computer Science - Computation and Language; Computer Science - Artificial Intelligence
- Abstract
We introduce LongSkywork, a long-context Large Language Model (LLM) capable of processing up to 200,000 tokens. We provide a training recipe for efficiently extending the context length of LLMs. We identify that the critical element in enhancing long-context processing capability is to incorporate a long-context Supervised Fine-Tuning (SFT) stage following the standard SFT stage. A mere 200 iterations can convert a standard SFT model into a long-context model. To reduce the effort of collecting and annotating data for long-context language modeling, we develop two novel methods for creating synthetic data. These methods are applied during both the continual pretraining phase and the SFT phase, greatly enhancing the training efficiency of our long-context LLMs. Our findings suggest that synthetic long-context SFT data can, to some extent, surpass the performance of human-curated data. LongSkywork achieves outstanding performance on a variety of long-context benchmarks. In the Needle test, a benchmark for long-context information retrieval, our models achieve perfect accuracy across multiple context spans. Moreover, in realistic application scenarios, LongSkywork-13B demonstrates performance on par with Claude2.1, the leading long-context model, underscoring the effectiveness of our proposed methods.
- Published
- 2024
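The abstract cites the Needle test (needle-in-a-haystack retrieval), in which a short fact is hidden at varying depths inside a long distractor context and the model is asked to retrieve it. Below is a minimal sketch of how such a check is commonly run with a Hugging Face `transformers` causal LM; the checkpoint name, needle string, filler text, and depth grid are hypothetical placeholders, and this is not the authors' evaluation code.

```python
# Illustrative needle-in-a-haystack check (sketch only; not the paper's harness).
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "my-org/long-context-model"  # hypothetical long-context checkpoint
NEEDLE = "The secret passphrase is 'blue-giraffe-42'."
QUESTION = "What is the secret passphrase mentioned in the document?"
FILLER = "Grass is green and the sky is blue. " * 4000  # long distractor text

def build_prompt(depth: float) -> str:
    """Insert the needle at a relative depth (0.0 = start, 1.0 = end) of the filler."""
    cut = int(len(FILLER) * depth)
    haystack = FILLER[:cut] + " " + NEEDLE + " " + FILLER[cut:]
    return f"{haystack}\n\nQuestion: {QUESTION}\nAnswer:"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto")

# Probe a few insertion depths; a long-context model should retrieve the
# passphrase regardless of where it sits in the context window.
for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
    inputs = tokenizer(build_prompt(depth), return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=32)
    answer = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                              skip_special_tokens=True)
    print(f"depth={depth:.2f} retrieved={'blue-giraffe-42' in answer} -> {answer!r}")
```

A full Needle evaluation sweeps both context length and insertion depth and reports retrieval accuracy per cell; the loop above shows only the depth axis for brevity.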