1. Research on Small Sample Text Classification Based on Attribute Extraction and Data Augmentation
- Author
- Yin Cong-jue, Ni Qing-chao, and Zhao Dong-hua
- Subjects
- Small data, Computer science, Deep learning, Sample (statistics), Semantics, Data modeling, Task (project management), Data set, Artificial intelligence, Data mining, Encoder
- Abstract
- With advances in deep learning and natural language processing, and with the continued public release of judicial data such as court documents, legal intelligence has become a research hot spot. Crime classification is an important branch of text classification that can help legal practitioners work more efficiently. In practice, however, the available sample data is small and the distribution of crime categories is imbalanced. To address these two problems, BERT is used as the encoder to compensate for the small data volume, and an attribute extraction network is added to handle the imbalanced distribution. The resulting model achieves an accuracy of 90.35% and an F1 value of 67.62 on a small-sample data set, close to the performance of the best model trained with sufficient data. A text augmentation method based on back-translation is also proposed and evaluated with different models: the LSTM model improves to some extent, whereas BERT does not.
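The back-translation idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`back_translate`, `augment`) and the word-level toy translators are hypothetical stand-ins, and a real pipeline would replace them with actual machine translation models. The sketch only shows the general pattern, in which each sample is translated into a pivot language and back, and the resulting paraphrase is added to the training set under the original label.

```python
from typing import Callable, Dict, List, Tuple

Translator = Callable[[str], str]
Sample = Tuple[str, str]  # (text, label)

# Toy word-level "translators" (assumption: stand-ins for real MT models).
_TO_PIVOT: Dict[str, str] = {"stole": "voler", "took": "voler",
                             "car": "voiture", "vehicle": "voiture"}
_FROM_PIVOT: Dict[str, str] = {"voler": "took", "voiture": "vehicle"}


def toy_to_pivot(text: str) -> str:
    """Map each known word to a pivot-language token; keep unknown words."""
    return " ".join(_TO_PIVOT.get(word, word) for word in text.split())


def toy_from_pivot(text: str) -> str:
    """Map pivot tokens back to the source language; keep unknown words."""
    return " ".join(_FROM_PIVOT.get(word, word) for word in text.split())


def back_translate(text: str, to_pivot: Translator, from_pivot: Translator) -> str:
    """Translate to the pivot language and back to obtain a paraphrase."""
    return from_pivot(to_pivot(text))


def augment(samples: List[Sample], to_pivot: Translator,
            from_pivot: Translator) -> List[Sample]:
    """Return the original samples plus any new, label-preserving paraphrases."""
    augmented = list(samples)
    seen = {text for text, _ in samples}
    for text, label in samples:
        paraphrase = back_translate(text, to_pivot, from_pivot)
        if paraphrase not in seen:  # skip paraphrases that add nothing new
            seen.add(paraphrase)
            augmented.append((paraphrase, label))
    return augmented


if __name__ == "__main__":
    data = [("the defendant stole a car", "theft")]
    for text, label in augment(data, toy_to_pivot, toy_from_pivot):
        print(label, "|", text)
```

Passing the translators in as callables keeps the augmentation loop independent of any particular translation system, so the same loop could feed either an LSTM or a BERT classifier.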
- Published
- 2021