1. Short‐text feature expansion and classification based on nonnegative matrix factorization
- Author
- Wenchao Jiang, Zhiming Zhao, Ling Zhang, and Multiscale Networked Systems (IvI, FNWI)
- Subjects
0209 industrial biotechnology, Computer science, Feature vector, 02 engineering and technology, short text classification, Theoretical Computer Science, Matrix decomposition, Non-negative matrix factorization, Matrix (mathematics), 020901 industrial engineering & automation, Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, Word2vec, Cluster analysis, feature extension, business.industry, Dimensionality reduction, nonnegative matrix factorization, Pattern recognition, Human-Computer Interaction, Feature (computer vision), correlation, 020201 artificial intelligence & image processing, Artificial intelligence, business, Software
- Abstract
In this paper, a non‐negative matrix factorization feature expansion (NMFFE) approach was proposed to overcome the feature‐sparsity issue that arises when expanding the features of short texts. First, we took the internal relationships of short texts and words into account when segmenting words from texts and constructing their relationship matrix. Second, we used the dual regularization non‐negative matrix tri‐factorization (DNMTF) algorithm to obtain the word‐clustering indicator matrix, which was used to derive the feature space through dimensionality reduction. Third, closely related words were selected from the feature space and added to the short text to address the sparsity issue. The experimental results showed that the short‐text classification accuracy of the NMFFE algorithm increased by 25.77%, 10.89%, and 1.79% on three data sets (Web snippets, Twitter sports, and AGnews, respectively) compared with the Word2Vec algorithm and the Char‐CNN algorithm. This indicated that the NMFFE algorithm was better than the BOW algorithm and the Char‐CNN algorithm in terms of classification accuracy and robustness.
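The three steps in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it substitutes plain two-factor NMF with multiplicative updates for the DNMTF algorithm, tokenizes on whitespace, and treats cosine similarity between latent word vectors as the measure of a "close relationship". The function names `nmf` and `expand_texts` and the parameters `k` and `top_n` are hypothetical.

```python
import numpy as np


def nmf(X, k, iters=200, seed=0, eps=1e-9):
    """Plain NMF, X ~= W @ H, via multiplicative updates.

    Assumption: this stands in for DNMTF, which the abstract names
    but does not specify.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k)) + eps
    H = rng.random((k, X.shape[1])) + eps
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def expand_texts(texts, k=2, top_n=2):
    """Append up to top_n related words to each short text."""
    # Step 1: segment words and build the word-by-text count matrix.
    tokenized = [t.lower().split() for t in texts]
    vocab = sorted({w for doc in tokenized for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(vocab), len(texts)))
    for j, doc in enumerate(tokenized):
        for w in doc:
            X[index[w], j] += 1.0

    # Step 2: factorize; each row of W is a word in the k-dim latent space.
    W, _ = nmf(X, k)
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    U = W / np.maximum(norms, 1e-12)
    S = U @ U.T  # cosine similarity between every pair of words

    # Step 3: add the words most similar to those already in each text.
    expanded = []
    for doc in tokenized:
        idx = np.array([index[w] for w in doc], dtype=int)
        scores = S[idx].sum(axis=0)
        present = set(doc)
        ranked = [vocab[i] for i in np.argsort(-scores)]
        extra = [w for w in ranked if w not in present][:top_n]
        expanded.append(doc + extra)
    return expanded


if __name__ == "__main__":
    demo = [
        "football match score",
        "football team win",
        "stock market price",
        "market stock trade",
    ]
    for original, longer in zip(demo, expand_texts(demo)):
        print(original, "->", " ".join(longer))
```

The expanded texts would then be passed to any standard classifier; which words get added depends on the random initialization and on the chosen `k`.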
- Published
- 2022