Summarization of Lengthy Legal Documents via Abstractive Dataset Building: An Extract-then-Assign Approach.

Authors :
Jain, Deepali
Borah, Malaya Dutta
Biswas, Anupam
Source :
Expert Systems with Applications. Mar2024:Part B, Vol. 237, pN.PAG-N.PAG. 1p.
Publication Year :
2024

Abstract

Developing effective automatic summarization approaches for legal documents is hindered by several challenges, such as extremely long document-summary pairs and the lack of large-scale training datasets with tractable document-summary token lengths. In this work, we address legal document summarization by building a modified abstractive dataset from the original dataset. First, this ensures that the length of each document-summary pair is manageable and can be processed by state-of-the-art summarization approaches (such as BART). Second, we address the data scarcity problem by creating more training samples from each original document-summary pair. This is done by creating multiple extractive summaries from each sample in the original dataset, after which ground-truth summary sentences are assigned to each extractive summary to generate new training samples. This results in a larger training dataset that can be used for fine-tuning summarization models. The proposed approach has been evaluated on two legal datasets: BillSum and Forum of Information Retrieval Evaluation (FIRE). On the ROUGE metrics, the proposed approach outperforms a pre-trained BART model fine-tuned on the original dataset by 3-8% on the FIRE test sets and by 1-3% on the BillSum test sets. On the BERTScore metrics, the proposed approach obtains 1-2% improvements on the FIRE test sets and 3-8% improvements on the BillSum test sets. Such improvements suggest that the proposed dataset building approach can help achieve improved abstractive summarization of lengthy legal documents.

• Lengthy nature and data scarcity are the two main challenges with legal documents.
• Lengthy legal document summarization is handled by the proposed approach.
• The data scarcity problem is handled by creating a feasible abstractive dataset.
• A novel Extract-Then-Assign (ETA) approach is proposed.
• ETA approach can greatly improve abstractive summarization of legal documents.

[ABSTRACT FROM AUTHOR]
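
The extract-then-assign idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the extractive summaries are produced or how ground-truth sentences are matched to them, so everything below is an assumption made for illustration: the chunking scheme, the word-frequency sentence scoring, the Jaccard overlap used for assignment, and all function names (`extractive_summaries`, `extract_then_assign`, and so on) are hypothetical and are not taken from the paper.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter; assumes sentences end with '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(text):
    """Lowercased alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extractive_summaries(document, n_chunks=2, top_k=2):
    """Build several short extracts from one long document.

    Assumption (not from the paper): cut the document into contiguous
    chunks and keep the top_k sentences of each chunk, ranked by the
    summed document-level frequency of their words.
    """
    sents = split_sentences(document)
    if not sents:
        return []
    freq = Counter(t for s in sents for t in tokens(s))
    size = -(-len(sents) // n_chunks)  # ceiling division
    extracts = []
    for start in range(0, len(sents), size):
        chunk = sents[start:start + size]
        ranked = sorted(chunk, key=lambda s: sum(freq[t] for t in tokens(s)),
                        reverse=True)
        keep = set(ranked[:top_k])
        extracts.append([s for s in chunk if s in keep])  # keep source order
    return extracts


def overlap(a, b):
    """Jaccard overlap between the token sets of two texts."""
    ta, tb = set(tokens(a)), set(tokens(b))
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def extract_then_assign(document, summary, n_chunks=2, top_k=2):
    """Turn one (document, summary) pair into several shorter pairs.

    Each ground-truth summary sentence is assigned to the extract it
    overlaps with most (assumed criterion); extracts that receive no
    sentence are dropped.
    """
    extracts = extractive_summaries(document, n_chunks, top_k)
    if not extracts:
        return []
    targets = [[] for _ in extracts]
    for ref in split_sentences(summary):
        scores = [overlap(ref, " ".join(e)) for e in extracts]
        best = max(range(len(extracts)), key=scores.__getitem__)
        targets[best].append(ref)
    return [(" ".join(e), " ".join(t)) for e, t in zip(extracts, targets) if t]
```

Under these assumptions, each returned pair is a short source text with its matching target sentences, which is the kind of length-bounded sample that could then be used to fine-tune a sequence-to-sequence model such as BART.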

Details

Language :
English
ISSN :
09574174
Volume :
237
Database :
Academic Search Index
Journal :
Expert Systems with Applications
Publication Type :
Academic Journal
Accession number :
173609346
Full Text :
https://doi.org/10.1016/j.eswa.2023.121571