Text Categorization by Weighted Features
- Source :
- 2018 5th International Conference on Information Science and Control Engineering (ICISCE).
- Publication Year :
- 2018
- Publisher :
- IEEE, 2018.
Abstract
- Text representation, which maps texts of variable length to feature vectors, is a fundamental task in machine learning. Early approaches extract lexical and syntactic features, which are discrete and sparse and can hardly capture the semantic relatedness between words. Recent advances in deep learning, which represent text segments as dense, continuous vectors, have shed light on this problem. However, their main limitation is that they usually rely on complex neural network structures, which are resource-intensive to train and to run at inference time. To remedy this, we propose a novel text representation approach that accounts for both the local and global importance of words via the BM25 weighting scheme. We use word vectors pretrained on a large text corpus to capture the latent semantic relatedness between words. Experimental results show that our approach is effective and efficient compared with existing feature-based, unsupervised, and supervised baselines.
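The abstract does not give the exact formulation, so the following is only a minimal sketch of the general idea it describes: weight each word's pretrained vector by a BM25 score (term frequency as local importance, inverse document frequency as global importance) and average. The function names, the toy two-dimensional embeddings, the smoothed IDF variant, and the defaults `k1=1.2` and `b=0.75` are all assumptions for illustration, not details taken from the paper.

```python
import math
from collections import Counter


def bm25_weights(docs, k1=1.2, b=0.75):
    """Return one {term: BM25 weight} dict per tokenized document.

    k1 and b are common BM25 defaults, assumed here for illustration.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    weights = []
    for d in docs:
        tf = Counter(d)
        # Length normalization shared by every term in this document.
        norm = k1 * (1 - b + b * len(d) / avg_len)
        weights.append({
            # Smoothed, non-negative IDF times the saturated term frequency.
            t: math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
            * f * (k1 + 1) / (f + norm)
            for t, f in tf.items()
        })
    return weights


def weighted_doc_vector(term_weights, embeddings, dim):
    """BM25-weighted average of word vectors; out-of-vocabulary words are skipped."""
    total = 0.0
    vec = [0.0] * dim
    for term, w in term_weights.items():
        if term not in embeddings:
            continue
        total += w
        for i, x in enumerate(embeddings[term]):
            vec[i] += w * x
    return [x / total for x in vec] if total > 0 else vec


# Hypothetical toy corpus and embeddings, standing in for real pretrained vectors.
docs = [
    ["cheap", "flights", "to", "paris"],
    ["cheap", "hotel", "in", "paris"],
    ["deep", "learning", "for", "text"],
]
embeddings = {"cheap": [1.0, 0.0], "flights": [0.0, 1.0], "paris": [1.0, 1.0]}

weights = bm25_weights(docs)
doc_vec = weighted_doc_vector(weights[0], embeddings, dim=2)
```

In this toy corpus, "flights" occurs in only one document while "cheap" occurs in two, so "flights" receives the larger weight and pulls the document vector toward its embedding. The resulting fixed-length vector could then be fed to any standard classifier.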
- Subjects :
- Text corpus
Artificial neural network
Computer science
Deep learning
Feature vector
Feature extraction
Inference
02 engineering and technology
010501 environmental sciences
01 natural sciences
Semantic similarity
020204 information systems
0202 electrical engineering, electronic engineering, information engineering
Task analysis
Artificial intelligence
Natural language processing
0105 earth and related environmental sciences
Details
- Database :
- OpenAIRE
- Journal :
- 2018 5th International Conference on Information Science and Control Engineering (ICISCE)
- Accession number :
- edsair.doi...........639d20b603949d35a9bccb0052d2629f
- Full Text :
- https://doi.org/10.1109/icisce.2018.00119