Text Categorization by Weighted Features

Authors :
Fu Junfeng
Xin Zhou
Zheng Jinkun
Liang Liang
Source :
2018 5th International Conference on Information Science and Control Engineering (ICISCE).
Publication Year :
2018
Publisher :
IEEE, 2018.

Abstract

Text representation, which maps texts of variable length to feature vectors, is a fundamental task in machine learning. Early approaches extract lexical and syntactic features, which are discrete and sparse and can hardly capture the semantic relatedness between words. Recent advances in deep learning, which represent text segments as dense, continuous vectors, have shed light on this problem. However, their main limitation is that they usually rely on complex neural network structures, which are resource-intensive both to train and to run at inference time. To remedy this, we propose a novel text representation approach that accounts for the local and global importance of words via the BM25 weighting scheme. We use word vectors pretrained on a large text corpus to capture the latent semantic relatedness between words. Experimental results show that our approach is effective and efficient compared with existing feature-based, unsupervised and supervised baselines.
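The abstract does not give the exact formulation, so what follows is only a minimal sketch of the general idea under stated assumptions. It uses the standard Okapi BM25 term weight as the word's importance score. The term-frequency part, saturated and normalized by document length, plays the role of "local" importance, and the IDF part, computed over a corpus, plays the role of "global" importance. The document vector is assumed to be the BM25-weighted average of pretrained word vectors. The parameter values `k1=1.2` and `b=0.75` are common BM25 defaults, not values taken from the paper. The function names `bm25_weights` and `embed` are hypothetical, and a plain dict stands in for a real pretrained embedding table.

```python
import math
from collections import Counter


def bm25_weights(doc, corpus, k1=1.2, b=0.75):
    """Return {word: BM25 weight} for each distinct token in `doc`.

    `corpus` is a list of token lists used for document frequencies and the
    average document length. k1 and b are assumed defaults, not from the paper.
    """
    n_docs = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n_docs
    df = Counter()
    for d in corpus:
        df.update(set(d))  # count each word at most once per document
    tf = Counter(doc)
    weights = {}
    for w, f in tf.items():
        # Global importance: smoothed IDF (the "1 +" keeps it non-negative).
        idf = math.log(1 + (n_docs - df[w] + 0.5) / (df[w] + 0.5))
        # Local importance: saturated TF with document-length normalization.
        tf_part = f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
        weights[w] = idf * tf_part
    return weights


def embed(doc, corpus, word_vectors, dim):
    """BM25-weighted average of pretrained word vectors (assumed aggregation)."""
    weights = bm25_weights(doc, corpus)
    vec = [0.0] * dim
    total = 0.0
    for w, wt in weights.items():
        if w not in word_vectors:  # skip out-of-vocabulary words
            continue
        for i, x in enumerate(word_vectors[w]):
            vec[i] += wt * x
        total += wt
    # Return a zero vector when no word of the document has an embedding.
    return [x / total for x in vec] if total > 0 else vec


corpus = [["the", "cat", "sat"], ["the", "dog", "ran"], ["a", "cat", "ran"]]
toy_vectors = {"the": [1.0, 0.0], "sat": [0.0, 1.0]}  # toy stand-in for pretrained vectors
print(embed(["the", "cat", "sat"], corpus, toy_vectors, dim=2))
```

In this toy run, "sat" occurs in only one corpus document, so it receives a larger IDF than "the" and pulls the averaged vector toward its own direction. The representation needs one pass over the corpus for document frequencies and then a single weighted sum per text, with no network to train. This is consistent with the efficiency argument in the abstract, but it is not a reproduction of the authors' exact method.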

Details

Database :
OpenAIRE
Journal :
2018 5th International Conference on Information Science and Control Engineering (ICISCE)
Accession number :
edsair.doi...........639d20b603949d35a9bccb0052d2629f
Full Text :
https://doi.org/10.1109/icisce.2018.00119