A New Semi-supervised Dimension Reduction Technique for Textual Data Analysis.

Authors :
Corchado, Emilio
Yin, Hujun
Botti, Vicente
Fyfe, Colin
Martín-Merino, Manuel
Román, Jesus
Source :
Intelligent Data Engineering & Automated Learning - IDEAL 2006; 2006, p654-662, 9p
Publication Year :
2006

Abstract

Dimension reduction techniques are important preprocessing algorithms for high-dimensional applications: they reduce noise while keeping the main structure of the dataset. They have been successfully applied to a large variety of problems, particularly in text mining. However, the algorithms proposed in the literature often suffer from low discriminant power, owing to their unsupervised nature and to the ‘curse of dimensionality’. Fortunately, several search engines, such as Yahoo, provide a manually created classification of a subset of documents that may be exploited to overcome this problem. In this paper we propose a semi-supervised version of a PCA-like algorithm for textual data analysis. The new method reduces the dimensionality of the term space by taking advantage of this document classification. The proposed algorithm has been evaluated on a text mining problem, where it outperforms well-known unsupervised techniques. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISBNs :
9783540454854
Database :
Complementary Index
Journal :
Intelligent Data Engineering & Automated Learning - IDEAL 2006
Publication Type :
Book
Accession number :
32914208
Full Text :
https://doi.org/10.1007/11875581_79