Enhanced TextNetTopics for Text Classification Using the G-S-M Approach with Filtered fastText-Based LDA Topics and RF-Based Topic Scoring: fasTNT.
- Source :
- Applied Sciences (2076-3417); Oct2024, Vol. 14 Issue 19, p8914, 24p
- Publication Year :
- 2024
Abstract
- TextNetTopics is a novel topic modeling-based topic selection approach that finds highly ranked discriminative topics for training text classification models, where a topic is a set of semantically related words. However, it suffers from several limitations, including the retention of redundant or irrelevant features within topics, a computationally intensive topic-scoring mechanism, and a lack of explicit semantic modeling. In order to address these shortcomings, this paper proposes fasTNT, an enhanced version of TextNetTopics grounded in the Grouping–Scoring–Modeling approach. FasTNT aims to improve the topic selection process by preserving only informative features within topics, reforming LDA topics using fastText word embeddings, and introducing an efficient scoring method that considers topic interactions using Random Forest feature importance. Experimental results on four diverse datasets demonstrate that fasTNT outperforms the original TextNetTopics method in classification performance and feature reduction. [ABSTRACT FROM AUTHOR]
- Subjects :
- FEATURE selection
- RANDOM forest algorithms
- MACHINE learning
- CLASSIFICATION
- SEMANTICS
Details
- Language :
- English
- ISSN :
- 2076-3417
- Volume :
- 14
- Issue :
- 19
- Database :
- Complementary Index
- Journal :
- Applied Sciences (2076-3417)
- Publication Type :
- Academic Journal
- Accession number :
- 180273531
- Full Text :
- https://doi.org/10.3390/app14198914