FinBen: A Holistic Financial Benchmark for Large Language Models

Authors :: Xie, Qianqian
Han, Weiguang
Chen, Zhengyu
Xiang, Ruoyu
Zhang, Xiao
He, Yueru
Xiao, Mengxi
Li, Dong
Dai, Yongfu
Feng, Duanyu
Xu, Yijing
Kang, Haoqiang
Kuang, Ziyan
Yuan, Chenhan
Yang, Kailai
Luo, Zheheng
Zhang, Tianlin
Liu, Zhiwei
Xiong, Guojun
Deng, Zhiyang
Jiang, Yuechen
Yao, Zhiyuan
Li, Haohang
Yu, Yangyang
Hu, Gang
Huang, Jiajia
Liu, Xiao-Yang
Lopez-Lira, Alejandro
Wang, Benyou
Lai, Yanzhao
Wang, Hao
Peng, Min
Ananiadou, Sophia
Huang, Jimin
Publication Year :: 2024
Abstract: LLMs have transformed NLP and shown promise in various fields, yet their potential in finance is underexplored due to a lack of comprehensive evaluation benchmarks, the rapid development of LLMs, and the complexity of financial tasks. In this paper, we introduce FinBen, the first extensive open-source evaluation benchmark, including 36 datasets spanning 24 financial tasks, covering seven critical aspects: information extraction (IE), textual analysis, question answering (QA), text generation, risk management, forecasting, and decision-making. FinBen offers several key innovations: a broader range of tasks and datasets, the first evaluation of stock trading, novel agent and Retrieval-Augmented Generation (RAG) evaluation, and three novel open-source evaluation datasets for text summarization, question answering, and stock trading. Our evaluation of 15 representative LLMs, including GPT-4, ChatGPT, and the latest Gemini, reveals several key findings: While LLMs excel in IE and textual analysis, they struggle with advanced reasoning and complex tasks like text generation and forecasting. GPT-4 excels in IE and stock trading, while Gemini is better at text generation and forecasting. Instruction-tuned LLMs improve textual analysis but offer limited benefits for complex tasks such as QA. FinBen has been used to host the first financial LLMs shared task at the FinNLP-AgentScen workshop during IJCAI-2024, attracting 12 teams. Their novel solutions outperformed GPT-4, showcasing FinBen's potential to drive innovation in financial LLMs. All datasets, results, and codes are released for the research community: https://github.com/The-FinAI/PIXIU.
Comment: 26 pages, 11 figures