
Mining the Frequent Patterns of Named Entities for Long Document Classification

Authors :
Bohan Wang
Rui Qi
Jinhua Gao
Jianwei Zhang
Xiaoguang Yuan
Wenjun Ke
Source :
Applied Sciences, Vol 12, Iss 5, p 2544 (2022)
Publication Year :
2022
Publisher :
MDPI AG, 2022.

Abstract

A large amount of information is now stored as text, and numerous text mining techniques have been developed for applications such as event detection, news topic classification, public opinion detection, and sentiment analysis. Although significant progress has been made in short text classification, document-level text classification requires further exploration. Long documents often contain irrelevant, noisy information that obscures indicative features and limits the interpretability of classification results. To alleviate this problem, a model for long document classification called MIPELD (mining the frequent pattern of a named entity for long document classification) is presented, which mines the frequent patterns of named entities and uses them as features. The discovered patterns allow semantic generalization across documents and provide clues for verifying the results. Experiments on several datasets yielded good accuracy and macro-F1 values, meeting the requirements for practical application. Further analysis validated the effectiveness of MIPELD in mining interpretable information for text classification.
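
The general idea of using frequent named-entity patterns as classification features can be illustrated with a minimal sketch. This is not the MIPELD algorithm itself, whose details are not given in this record. It assumes that named entities have already been extracted for each document, and it uses plain brute-force counting of co-occurring entity sets in place of whatever mining procedure the authors use. The function names, the entity labels, and the `min_support` and `max_size` parameters are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_entity_patterns(docs, min_support=2, max_size=2):
    """Return entity sets (as sorted tuples) found in at least `min_support` documents.

    `docs` is a list of entity lists, one per document. This is brute-force
    counting for illustration, not an optimized miner and not the paper's method.
    """
    counts = Counter()
    for entities in docs:
        unique = sorted(set(entities))  # count each pattern once per document
        for size in range(1, max_size + 1):
            for combo in combinations(unique, size):
                counts[combo] += 1
    return {pattern: n for pattern, n in counts.items() if n >= min_support}


def to_features(entities, patterns):
    """Binary vector: 1 where the document contains every entity of a pattern."""
    present = set(entities)
    return [int(present.issuperset(p)) for p in sorted(patterns)]


# Hypothetical toy corpus: each document is reduced to its entity labels.
docs = [
    ["ORG:central_bank", "MONEY", "GPE:us"],
    ["ORG:central_bank", "MONEY", "PERSON"],
    ["PERSON", "GPE:us", "EVENT:match"],
]

patterns = frequent_entity_patterns(docs, min_support=2)
features = [to_features(d, patterns) for d in docs]
```

Each position in a feature vector corresponds to one readable pattern, such as the pair `("MONEY", "ORG:central_bank")`, so a classifier's decision can be traced back to the entity combinations that triggered it. This is the kind of interpretability the abstract refers to.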

Details

Language :
English
ISSN :
2076-3417
Volume :
12
Issue :
5
Database :
Directory of Open Access Journals
Journal :
Applied Sciences
Publication Type :
Academic Journal
Accession number :
edsdoj.86afc0c798e348a0b2c56e39d05fe97a
Document Type :
article
Full Text :
https://doi.org/10.3390/app12052544