1. TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios
- Author
- Xiaokang Zhang, Jing Zhang, Zeyao Ma, Yang Li, Bohan Zhang, Guanlin Li, Zijun Yao, Kangli Xu, Jinchang Zhou, Daniel Zhang-Li, Jifan Yu, Shu Zhao, Juanzi Li, and Jie Tang
- Abstract
We introduce TableLLM, a robust large language model (LLM) with 13 billion parameters, purpose-built for handling tabular data manipulation tasks, whether the tables are embedded in documents or spreadsheets, catering to real-world office scenarios. We propose a distant supervision method for training, which comprises a reasoning process extension strategy, which helps LLMs learn reasoning patterns more effectively, and a cross-way validation strategy, which ensures the quality of the automatically generated data. To evaluate the performance of TableLLM, we have crafted a benchmark covering both document and spreadsheet formats and constructed a well-organized evaluation pipeline capable of handling both scenarios. Thorough evaluations underscore the advantages of TableLLM over various existing general-purpose and tabular-data-focused LLMs. We have publicly released the model checkpoint, source code, benchmarks, and a web application for user interaction. Our code and data are publicly available at https://github.com/TableLLM/TableLLM.
- Comment
- https://tablellm.github.io
- Published
- 2024