Context-Dependent Feature Values in Text Categorization
- Source :
- International Journal of Software Engineering and Knowledge Engineering. 30:1199-1219
- Publication Year :
- 2020
- Publisher :
- World Scientific Pub Co Pte Lt, 2020.
Abstract
- Feature engineering is one aspect of knowledge engineering. Besides feature selection, the appropriate assignment of feature values is crucial to the performance of many software applications, such as text categorization (TC) and speech recognition. In this work, we develop a general method to enhance TC performance through context-dependent feature values (also known as term weights), which are obtained by a novel adaptation of a context-dependent adjustment procedure previously shown to be effective in information retrieval. The motivation for our approach is that the general method can be used with different text representations and in combination with other TC techniques. Experiments on several test collections show that our context-dependent feature values improve TC over traditional context-independent unigram feature values, even with a strong classifier such as the Support Vector Machine (SVM), which past work has found hard to improve. We also show that the relative performance improvement of our method over the context-independent baseline is comparable to the levels attained by recent word-embedding methods in the literature. An advantage of our approach is that it does not require the substantial training needed to learn word-embedding representations.
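The abstract does not spell out the adjustment procedure, so the following is only a toy sketch of the general idea, not the authors' method. It contrasts a context-independent tf-idf weight with a hypothetical context-dependent one in which each occurrence of a term is scaled by the average idf of its neighbouring words. The function names, the `window` and `strength` parameters, and the scaling formula are all assumptions made for illustration.

```python
import math
from collections import Counter


def idf_table(docs):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def context_independent(doc, idf):
    """Baseline: tf * idf, ignoring where each occurrence appears."""
    tf = Counter(doc)
    return {t: tf[t] * idf[t] for t in tf}


def context_dependent(doc, idf, window=1, strength=0.5):
    """Hypothetical adjustment (an assumption, not the paper's procedure):
    each occurrence contributes its idf scaled by a factor that grows with
    the mean idf of the words within `window` positions of it."""
    max_idf = max(idf.values())
    weights = {}
    for i, term in enumerate(doc):
        neighbours = doc[max(0, i - window):i] + doc[i + 1:i + 1 + window]
        if neighbours:
            context = sum(idf[n] for n in neighbours) / len(neighbours)
            factor = 1.0 + strength * context / max_idf
        else:
            factor = 1.0  # no context available, fall back to the baseline
        weights[term] = weights.get(term, 0.0) + idf[term] * factor
    return weights


# Toy corpus: "bank" occurs once in each of the first two documents,
# but next to common words in one and next to rarer words in the other.
DOCS = [
    ["the", "bank", "the"],
    ["river", "bank", "water"],
    ["the", "loan", "the"],
]
IDF = idf_table(DOCS)
for doc in DOCS[:2]:
    print(doc, context_independent(doc, IDF)["bank"],
          context_dependent(doc, IDF)["bank"])
```

In this sketch the baseline assigns "bank" the same value in both documents, while the context-dependent variant assigns a larger value where the neighbouring words are rarer. Setting `strength=0.0` recovers the baseline. Either set of weights could then be passed as feature values to a classifier such as an SVM.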
- Subjects :
- Feature engineering
Computer Networks and Communications
Computer science
Knowledge engineering
Software engineering
Feature selection
Context (language use)
Engineering and technology
Computer Graphics and Computer-Aided Design
Software
Text categorization
Artificial Intelligence
Feature (computer vision)
Electrical engineering, electronic engineering, information engineering
Artificial intelligence & image processing
Natural language processing
Details
- ISSN :
- 1793-6403 and 0218-1940
- Volume :
- 30
- Database :
- OpenAIRE
- Journal :
- International Journal of Software Engineering and Knowledge Engineering
- Accession number :
- edsair.doi...........ed962bf28a1559f7650147d350278668
- Full Text :
- https://doi.org/10.1142/s021819402050031x