1. Building a Production-Ready Multi-Label Classifier for Legal Documents with Digital-Twin-Distiller.
- Author
- Csányi, Gergely Márk; Vági, Renátó; Nagy, Dániel; Üveges, István; Vadász, János Pál; Megyeri, Andrea; and Orosz, Tamás
- Subjects
LEGAL documents, MACHINE learning, LEGAL judgments, MACHINE performance
- Abstract
One of the most time-consuming parts of an attorney's job is finding similar legal cases. Categorizing legal documents by subject matter can significantly increase the discoverability of digitized court decisions. This is a multi-label classification problem, in which each relatively long text can belong to more than one legal category. The paper presents a solution that decomposes this multi-label problem into more than a hundred binary classification problems. Several approaches were tested, including different machine-learning and text-augmentation techniques, to produce a practically applicable model. The proposed models and methodologies were encapsulated and deployed as a digital twin in a production environment. The resulting machine-learning application matches, and could even exceed, human experts' performance on this monotonous and labor-intensive task. It could increase the e-discoverability of the documents by about 50%. [ABSTRACT FROM AUTHOR]
- Published
- 2022