Rough set based hybrid algorithm for text classification
- Source :
- Expert Systems with Applications. 36:9168-9174
- Publication Year :
- 2009
- Publisher :
- Elsevier BV, 2009.
Abstract
- Automatic classification of text documents, one of the essential techniques for Web mining, has long been an active research topic due to the explosive growth of digital documents available online. In the text classification community, k-nearest neighbor (kNN) is a simple yet effective classifier. However, as a lazy learning method that builds no model in advance, kNN incurs a high cost when classifying new documents if the training set is large. The Rocchio algorithm is another well-known and widely used technique for text classification. One drawback of the Rocchio classifier is that it restricts the hypothesis space to the set of linearly separable hyperplane regions. When the data do not fit this underlying assumption well, the Rocchio classifier suffers. In this paper, a hybrid algorithm based on the variable precision rough set model is proposed to combine the strengths of both the kNN and Rocchio techniques and overcome their weaknesses. An experimental evaluation of different methods is carried out on two common text corpora, i.e., the Reuters-21578 collection and the 20-newsgroup collection. The experimental results indicate that the novel algorithm achieves a significant performance improvement.
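The trade-off described in the abstract can be illustrated with a minimal sketch of the two baseline classifiers. This is not the paper's hybrid algorithm, which the record does not describe in detail. The toy documents, term weights, and function names (`cosine`, `knn_predict`, `rocchio_fit`, `rocchio_predict`) are assumptions made for illustration, and the Rocchio variant shown is the simplified centroid-only form without a negative-example term.

```python
import math
from collections import Counter, defaultdict


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_predict(train, doc, k=3):
    """kNN: no training step, but every query is compared to all documents."""
    ranked = sorted(train, key=lambda pair: cosine(pair[0], doc), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def rocchio_fit(train):
    """Simplified Rocchio: one centroid (mean vector) per class."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter()
    for vec, label in train:
        counts[label] += 1
        for term, weight in vec.items():
            sums[label][term] += weight
    return {
        label: {term: total / counts[label] for term, total in terms.items()}
        for label, terms in sums.items()
    }


def rocchio_predict(centroids, doc):
    """Assign the class whose centroid is most similar to the query."""
    return max(centroids, key=lambda label: cosine(centroids[label], doc))


# Hypothetical toy corpus: (term-weight vector, class label).
train = [
    ({"stock": 2.0, "market": 1.0}, "finance"),
    ({"bank": 1.0, "stock": 1.0}, "finance"),
    ({"market": 1.0, "bank": 2.0}, "finance"),
    ({"game": 2.0, "team": 1.0}, "sport"),
    ({"team": 2.0, "score": 1.0}, "sport"),
    ({"score": 1.0, "game": 1.0}, "sport"),
]
query = {"stock": 1.0, "bank": 1.0}

centroids = rocchio_fit(train)
print(knn_predict(train, query, k=3), rocchio_predict(centroids, query))
```

In this sketch, `knn_predict` performs one similarity computation per training document for every query, while `rocchio_predict` performs one per class. That is the cost difference the abstract refers to, and the single-centroid-per-class representation is what limits Rocchio when a class is not well described by one prototype.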
- Subjects :
- Text corpus
Rocchio algorithm
Training set
business.industry
Computer science
General Engineering
Pattern recognition
Machine learning
computer.software_genre
Computer Science Applications
ComputingMethodologies_PATTERNRECOGNITION
Lazy learning
Web mining
Hyperplane
Artificial Intelligence
Classifier (linguistics)
ComputingMethodologies_DOCUMENTANDTEXTPROCESSING
Rough set
Artificial intelligence
business
computer
Classifier (UML)
- ISSN :
- 09574174
- Volume :
- 36
- Database :
- OpenAIRE
- Journal :
- Expert Systems with Applications
- Accession number :
- edsair.doi...........4b3d42141ff979688cc1ea6db032aa38
- Full Text :
- https://doi.org/10.1016/j.eswa.2008.12.026