A New Semi-supervised Dimension Reduction Technique for Textual Data Analysis.
- Source :
- Intelligent Data Engineering & Automated Learning - IDEAL 2006; 2006, p654-662, 9p
- Publication Year :
- 2006
Abstract
- Dimension reduction techniques are important preprocessing algorithms for high-dimensional applications: they reduce noise while keeping the main structure of the dataset. They have been successfully applied to a large variety of problems, particularly in text mining. However, the algorithms proposed in the literature often suffer from low discriminant power due to their unsupervised nature and to the ‘curse of dimensionality’. Fortunately, several search engines, such as Yahoo, provide a manually created classification of a subset of documents that may be exploited to overcome this problem. In this paper we propose a semi-supervised version of a PCA-like algorithm for textual data analysis. The new method reduces the term space dimensionality by taking advantage of this document classification. The proposed algorithm has been evaluated on a text mining problem, and it outperforms well-known unsupervised techniques. [ABSTRACT FROM AUTHOR]
Details
- Language :
- English
- ISBNs :
- 9783540454854
- Database :
- Complementary Index
- Journal :
- Intelligent Data Engineering & Automated Learning - IDEAL 2006
- Publication Type :
- Book
- Accession number :
- 32914208
- Full Text :
- https://doi.org/10.1007/11875581_79