Leveraging BERT for extractive text summarization on federal police documents
- Author
- Barros, Thierry S., Pires, Carlos Eduardo S., and Nascimento, Dimas Cassimiro
- Subjects
- LANGUAGE models, TEXT summarization, ARTIFICIAL neural networks, NATURAL language processing, CRIMINAL investigation
- Abstract
- A document known as notitia criminis (NC) is used in the Brazilian Federal Police as the starting point of a criminal investigation. An NC reports a summary of investigative activities and therefore contains all relevant information about a supposed crime. To manage an inquiry and correlate similar investigations, the Federal Police usually needs to extract essential information from an NC document. Manual extraction (reading and understanding the entire content) can be mentally exhausting, due to the size and complexity of the documents. In this light, natural language processing (NLP) techniques are commonly used for automatic information extraction from textual documents. Deep neural networks have been successfully applied to many different NLP tasks. A neural network model that improved results across a wide range of NLP tasks is BERT, an acronym for Bidirectional Encoder Representations from Transformers. In this article, we propose approaches based on the BERT model to extract relevant information from textual documents using automatic text summarization techniques. In other words, we aim to analyze the feasibility of using the BERT model to extract and synthesize the most essential information of an NC document. We evaluate the performance of the proposed approaches using two real-world datasets: the Federal Police dataset (a private domain dataset) and the Brazilian WikiHow dataset (a public domain dataset). Experimental results using different variants of the ROUGE metric show that our approaches can significantly increase extractive text summarization effectiveness without sacrificing efficiency. [ABSTRACT FROM AUTHOR]
- Published
- 2023
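The abstract describes BERT-based extractive summarization evaluated with ROUGE but gives no implementation details. The sketch below is a generic illustration of that class of technique, not the authors' pipeline: sentences are embedded with a pretrained BERT encoder, scored by cosine similarity to the document centroid, and the top-k sentences are kept as the extractive summary. The checkpoint name (a Portuguese BERT model, chosen because the documents are in Brazilian Portuguese) and the `embed` and `extractive_summary` helpers are assumptions introduced here for illustration.

```python
# Generic sketch of extractive summarization with BERT sentence embeddings.
# This illustrates the technique discussed in the abstract, not the authors'
# implementation; the checkpoint and helper names are assumptions.
import torch
from transformers import AutoTokenizer, AutoModel

MODEL_NAME = "neuralmind/bert-base-portuguese-cased"  # assumed: a Portuguese BERT checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

def embed(sentences):
    """Return mean-pooled BERT embeddings for a list of sentences."""
    batch = tokenizer(sentences, padding=True, truncation=True,
                      max_length=128, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state        # (n, seq_len, dim)
    mask = batch["attention_mask"].unsqueeze(-1).float()  # (n, seq_len, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)   # (n, dim)

def extractive_summary(sentences, k=3):
    """Pick the k sentences most similar to the document centroid,
    returned in their original order."""
    emb = embed(sentences)
    centroid = emb.mean(dim=0, keepdim=True)
    scores = torch.nn.functional.cosine_similarity(emb, centroid)
    top = scores.topk(min(k, len(sentences))).indices.sort().values
    return [sentences[i] for i in top.tolist()]

# Example usage with a toy "document" split into sentences.
doc_sentences = [
    "The police report describes a suspected fraud scheme.",
    "Several bank accounts were used to move the funds.",
    "The weather on the day of the report was sunny.",
    "Investigators identified two suspects linked to the accounts.",
]
print(extractive_summary(doc_sentences, k=2))
```

In the spirit of the evaluation described in the abstract, such summaries could be compared against reference summaries with a ROUGE implementation (for example the `rouge-score` Python package); the exact ROUGE variants and reference summaries used by the authors are not specified here.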