Combinatorial Text Classification: the Effect of Multi-Parameterized Correlation Clustering
- Authors
Jikang Chen, Joseph R. Barr, Peter Shaw, and Faisal N. Abu-Khzam
- Subjects
Topic model, Service (systems architecture), Basis (linear algebra), Computer science, Correlation clustering, Parameterized complexity, Chaining, Word2vec, Artificial intelligence, Heuristics, Natural language processing
- Abstract
The paper demonstrates the potential of chaining two distinct methodologies in service of topic modelling. The first, word2vec, has in recent years become more or less standard in natural language processing (NLP); the second is a graph-theoretical, or combinatorial, algorithm. We show how, used together, they can help classify documents into distinct, but not necessarily disjoint, classes. The procedure is demonstrated on a collection of tweets. The procedure is heuristic: it is not expected to work perfectly in every situation or for every input. In fact, the authors believe it will yield better results on more homogeneous corpora written in a standardized style, such as legal or medical documents.
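A minimal sketch of the two-stage idea, under stated assumptions: the toy 3-dimensional vectors are hypothetical stand-ins for word2vec embeddings (in practice they would come from a trained model), the similarity threshold of 0.5 is a hypothetical parameter, and the clustering step uses the well-known randomized Pivot (KwikCluster) heuristic for correlation clustering, not the multi-parameterized algorithm studied in the paper. Pairs whose cosine similarity meets the threshold get a "+" edge and all other pairs count as "-"; correlation clustering then seeks a partition minimizing disagreements ("+" pairs split apart plus "-" pairs placed together).

```python
import math
import random

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def signed_edges(vectors, threshold=0.5):
    """Return the set of '+' pairs (similarity >= threshold).
    Every other pair is implicitly '-', giving a complete signed graph.
    The threshold is a hypothetical parameter, not taken from the paper."""
    keys = list(vectors)
    positive = set()
    for i, a in enumerate(keys):
        for b in keys[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                positive.add(frozenset((a, b)))
    return positive

def pivot_cluster(items, positive, seed=0):
    """Randomized Pivot (KwikCluster): repeatedly take a random pivot and
    group it with every remaining item joined to it by a '+' edge."""
    rng = random.Random(seed)
    remaining = list(items)
    rng.shuffle(remaining)  # taking the first item is then a uniform random pivot
    clusters = []
    while remaining:
        pivot = remaining[0]
        cluster = [pivot] + [x for x in remaining[1:]
                             if frozenset((pivot, x)) in positive]
        clusters.append(sorted(cluster))
        members = set(cluster)
        remaining = [x for x in remaining if x not in members]
    return clusters

def disagreements(clusters, items, positive):
    """Correlation-clustering cost: '+' pairs split apart plus '-' pairs kept together."""
    label = {x: i for i, c in enumerate(clusters) for x in c}
    keys = list(items)
    cost = 0
    for i, a in enumerate(keys):
        for b in keys[i + 1:]:
            if (label[a] == label[b]) != (frozenset((a, b)) in positive):
                cost += 1
    return cost

# Hypothetical toy "embeddings": two legal terms groups and two medical terms.
vectors = {
    "court": [0.9, 0.1, 0.0],
    "judge": [0.8, 0.2, 0.1],
    "verdict": [0.85, 0.15, 0.05],
    "dose": [0.1, 0.9, 0.2],
    "patient": [0.0, 0.8, 0.3],
}
items = list(vectors)
positive = signed_edges(vectors, threshold=0.5)
clusters = pivot_cluster(items, positive, seed=42)
print(sorted(clusters))  # [['court', 'judge', 'verdict'], ['dose', 'patient']]
print(disagreements(clusters, items, positive))  # 0
```

Pivot returns disjoint clusters, whereas the abstract allows classes that are not necessarily disjoint, so this sketch only illustrates the basic chaining of embeddings with a correlation-clustering step.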
- Published
2019