1. Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs
- Author
- Liu, Chris Yuhao; Zeng, Liang; Liu, Jiacai; Yan, Rui; He, Jujie; Wang, Chaojie; Yan, Shuicheng; Liu, Yang; and Zhou, Yahui
- Subjects
Computer Science - Artificial Intelligence, Computer Science - Computation and Language
- Abstract
In this report, we introduce a collection of methods to enhance reward modeling for LLMs, focusing specifically on data-centric techniques. We propose effective data selection and filtering strategies for curating high-quality open-source preference datasets, culminating in the Skywork-Reward data collection, which contains only 80K preference pairs -- significantly smaller than existing datasets. Using this curated dataset, we developed the Skywork-Reward model series -- Skywork-Reward-Gemma-27B and Skywork-Reward-Llama-3.1-8B -- with the former currently holding the top position on the RewardBench leaderboard. Notably, our techniques and datasets have directly enhanced the performance of many top-ranked models on RewardBench, highlighting the practical impact of our contributions in real-world preference learning applications.
- Published
- 2024