
Summarization of Lengthy Legal Documents via Abstractive Dataset Building: An Extract-then-Assign Approach.

Authors :: Jain, Deepali
Borah, Malaya Dutta
Biswas, Anupam
Source :: Expert Systems with Applications. Mar2024:Part B, Vol. 237, pN.PAG-N.PAG. 1p.
Publication Year :: 2024
Abstract: Development of effective automatic summarization approaches for legal documents suffers from several challenges, such as extremely long document-summary pairs and the lack of large-scale training datasets with tractable document-summary token lengths. In this work, we address the problem of legal document summarization by building a modified abstractive dataset from the original dataset. This ensures that the length of each document-summary pair is manageable and can be processed by state-of-the-art summarization approaches (such as BART). Second, we address the data scarcity problem by creating more training samples from each original document-summary pair. This is done by creating multiple extractive summaries from each sample in the original dataset and then assigning ground-truth summary sentences to each extractive summary to generate new training samples. This results in a larger training dataset that can be used to fine-tune summarization models. Our proposed approach has been evaluated on two legal datasets: BillSum and Forum for Information Retrieval Evaluation (FIRE). With respect to the ROUGE metrics, the proposed approach outperforms the pre-trained BART model fine-tuned on the original dataset by 3-8% on the FIRE test sets and by 1-3% on the BillSum test sets. With respect to BERTScore, the proposed approach obtains 1-2% improvements on the FIRE test sets, while 3-8% improvements are observed on the BillSum test sets. These improvements suggest that the proposed dataset building approach can help achieve better abstractive summarization of lengthy legal documents.
• Lengthy documents and data scarcity are the two main challenges in summarizing legal documents.
• The proposed approach handles the summarization of lengthy legal documents.
• The data scarcity problem is handled by creating a feasible abstractive dataset.
• A novel Extract-Then-Assign (ETA) approach is proposed.
• The ETA approach can greatly improve abstractive summarization of legal documents.
[ABSTRACT FROM AUTHOR]
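The Extract-then-Assign idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how the extractive summaries are produced or how ground-truth summary sentences are matched to them. The sketch therefore assumes (a) the "extract" step is stood in for by splitting the document into consecutive sentence chunks under a token budget, rather than by a real extractive summarizer, and (b) the "assign" step gives each ground-truth summary sentence to the chunk with the highest ROUGE-1-style unigram F1. All function names and the `max_tokens` parameter are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    # Assumption: naive lowercase whitespace tokenization, for illustration only.
    return text.lower().split()


def unigram_f1(candidate: str, reference: str) -> float:
    """ROUGE-1-style F1 from clipped unigram overlap (no stemming or stopword handling)."""
    c, r = Counter(tokenize(candidate)), Counter(tokenize(reference))
    overlap = sum((c & r).values())  # multiset intersection clips repeated words
    if overlap == 0:
        return 0.0
    precision = overlap / sum(c.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def extract_chunks(doc_sents: List[str], max_tokens: int) -> List[List[str]]:
    """Hypothetical stand-in for the 'extract' step: greedily pack consecutive
    sentences into chunks of at most max_tokens tokens. A single sentence longer
    than the budget still gets a chunk of its own."""
    chunks: List[List[str]] = []
    current: List[str] = []
    size = 0
    for sent in doc_sents:
        n = len(tokenize(sent))
        if current and size + n > max_tokens:
            chunks.append(current)
            current, size = [], 0
        current.append(sent)
        size += n
    if current:
        chunks.append(current)
    return chunks


def assign_summary(chunks: List[List[str]], summary_sents: List[str]) -> List[Tuple[str, str]]:
    """Hypothetical 'assign' step: give each ground-truth summary sentence to the
    chunk with the highest unigram F1; ties go to the earliest chunk."""
    buckets: List[List[str]] = [[] for _ in chunks]
    for sent in summary_sents:
        best = max(range(len(chunks)), key=lambda i: unigram_f1(" ".join(chunks[i]), sent))
        buckets[best].append(sent)
    # Only chunks that received at least one summary sentence become training samples.
    return [(" ".join(c), " ".join(b)) for c, b in zip(chunks, buckets) if b]


def extract_then_assign(doc_sents: List[str], summary_sents: List[str],
                        max_tokens: int = 512) -> List[Tuple[str, str]]:
    """Turn one (document, summary) pair into several shorter (source, target) pairs."""
    chunks = extract_chunks(doc_sents, max_tokens)
    if not chunks:
        return []
    return assign_summary(chunks, summary_sents)


if __name__ == "__main__":
    doc = [
        "the court held the contract void",
        "damages were awarded to the plaintiff",
        "the appeal was dismissed with costs",
    ]
    summary = ["contract held void by court", "appeal dismissed"]
    for source, target in extract_then_assign(doc, summary, max_tokens=6):
        print(source, "->", target)
    # the court held the contract void -> contract held void by court
    # the appeal was dismissed with costs -> appeal dismissed
```

Each returned pair is a shorter (source, target) sample, so one long document-summary pair can yield several training samples that fit a summarization model's input length limit. The default budget of 512 whitespace tokens is arbitrary; a real pipeline would count tokens with the target model's own tokenizer.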

Subjects :: *LEGAL documents
*TEXT summarization
*AUTOMATIC summarization
*FIRE testing
*INFORMATION retrieval

Details

Language :: English
ISSN :: 09574174
Volume :: 237
Database :: Academic Search Index
Journal :: Expert Systems with Applications
Publication Type :: Academic Journal
Accession number :: 173609346
Full Text :: https://doi.org/10.1016/j.eswa.2023.121571
