Financial Report Chunking for Effective Retrieval Augmented Generation

Authors :
Yepes, Antonio Jimeno
You, Yao
Milczek, Jan
Laverde, Sebastian
Li, Renyu
Publication Year :
2024

Abstract

Chunking information is a key step in Retrieval Augmented Generation (RAG). Current research primarily centers on paragraph-level chunking. This approach treats all texts as equal and neglects the information contained in the structure of documents. We propose an expanded approach to chunking documents that moves beyond mere paragraph-level chunking and chunks primarily by the structural elements of documents. Dissecting documents into these constituent elements creates a new way to chunk documents that yields the best chunk size without tuning. We introduce a novel framework that evaluates how chunking based on element types annotated by document understanding models contributes to the overall context and accuracy of the information retrieved. We also demonstrate how this approach impacts performance on RAG-assisted question answering tasks. Our research includes a comprehensive analysis of various element types, their role in effective information retrieval, and the impact they have on the quality of RAG outputs. The findings support that element-type-based chunking substantially improves RAG results on financial reporting. This research also allows us to address how to achieve highly accurate RAG.
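
The idea of chunking by structural element rather than by paragraph can be illustrated with a minimal sketch. This is not the authors' implementation: the `Element` class, the labels "Title", "NarrativeText", and "Table", the grouping rule (a title opens a new chunk, a table forms its own chunk), and the sample text are all assumptions made for illustration, and a real pipeline would take its elements from a document understanding model.

```python
from dataclasses import dataclass


@dataclass
class Element:
    # Hypothetical element labels; a real document understanding
    # model may use a different label set.
    kind: str  # e.g. "Title", "NarrativeText", "Table"
    text: str


def chunk_by_elements(elements):
    """Group elements into chunks using an assumed rule:
    a Title opens a new chunk and a Table always stands alone."""
    chunks = []
    current = []
    for el in elements:
        if el.kind == "Table":
            # Close any open chunk, then emit the table by itself.
            if current:
                chunks.append(current)
                current = []
            chunks.append([el])
        elif el.kind == "Title":
            # A title closes the open chunk and starts the next one.
            if current:
                chunks.append(current)
            current = [el]
        else:
            current.append(el)
    if current:
        chunks.append(current)
    return ["\n".join(el.text for el in chunk) for chunk in chunks]


# Made-up sample elements, not taken from any real financial report.
doc = [
    Element("Title", "Revenue"),
    Element("NarrativeText", "Revenue grew during the year."),
    Element("Table", "Segment | Revenue"),
    Element("NarrativeText", "The table lists revenue by segment."),
    Element("Title", "Risk Factors"),
    Element("NarrativeText", "Currency exposure remains a risk."),
]
chunks = chunk_by_elements(doc)
```

Under this assumed rule, chunk boundaries follow the document's structure, so chunk size comes from the content itself instead of from a tuned length parameter.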

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2402.05131
Document Type :
Working Paper