1. Intrinsic Dimensional Correlation Discretization for Mining Task
- Author
- Hong Wen Song, Yu Sang, and Jun Zhao
- Subjects
Discretization, business.industry, Heuristic, Pattern recognition, General Medicine, Dimension (vector space), Principal component analysis, Artificial intelligence, business, Focus (optics), Eigenvalues and eigenvectors, Discretization of continuous features, Curse of dimensionality, Mathematics
- Abstract
Discretization is a necessary pre-processing step for mining tasks and a way to improve the performance of many machine learning algorithms. Existing techniques mainly focus on one-dimensional discretization in low-dimensional data spaces. In this paper, we present an intrinsic dimensional correlation discretization technique for high-dimensional data spaces. The approach estimates the intrinsic dimensionality (ID) of the data using maximum likelihood estimation (MLE). We then project the data onto the eigenspace of the estimated lower ID using principal component analysis (PCA), which can discover the potential correlation structure in multivariate data. Thus, all dimensions of the data are transformed into a new, independent eigenspace of the ID, and each dimension can be discretized separately in that eigenspace, based on the Bayes discretization model, using the MODL discretization method. We design a heuristic framework to find a better discretization scheme. Our approach shows a significant improvement in the mean learning accuracy of classifiers over traditional discretization methods.
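The abstract names the pipeline steps (ID estimation by MLE, PCA projection, per-dimension discretization) without giving their details, so the following is only a minimal sketch under stated assumptions. It assumes the MLE takes the nearest-neighbour form of Levina and Bickel, computes PCA through an SVD of the centred data, and swaps in plain equal-frequency binning where the paper uses MODL, which is a supervised Bayesian method not reproduced here. All function names, the neighbourhood size `k`, the bin count `n_bins`, and the synthetic data are hypothetical.

```python
import numpy as np


def mle_intrinsic_dim(X, k=10):
    """Nearest-neighbour MLE of intrinsic dimensionality.

    Assumption: Levina-Bickel form; the abstract only says "MLE".
    """
    # Pairwise Euclidean distances (O(n^2) memory, fine for a small sketch).
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Sort each row; column 0 is the point itself (distance 0), so skip it.
    knn = np.sort(dists, axis=1)[:, 1:k + 1]
    # Log-ratios of the k-th neighbour distance to the 1st..(k-1)-th ones.
    log_ratio = np.log(knn[:, -1:] / knn[:, :-1])
    # Per-point estimate is the inverse of the mean log-ratio.
    per_point = (k - 1) / log_ratio.sum(axis=1)
    return float(per_point.mean())


def pca_project(X, d):
    """Project centred data onto its top-d principal directions."""
    Xc = X - X.mean(axis=0)
    # Rows of vt are the eigenvectors of the covariance matrix,
    # ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:d].T


def equal_frequency_bins(col, n_bins=4):
    """Stand-in discretizer (NOT MODL): unsupervised equal-frequency binning."""
    # Interior quantiles serve as cut points between bins.
    cuts = np.quantile(col, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    return np.searchsorted(cuts, col, side="right")


def id_correlation_discretize(X, k=10, n_bins=4):
    """Estimate ID, project with PCA, then discretize each new dimension."""
    d = int(round(mle_intrinsic_dim(X, k)))
    d = max(1, min(d, X.shape[1]))  # keep d within a valid range
    Z = pca_project(X, d)
    codes = np.column_stack(
        [equal_frequency_bins(Z[:, j], n_bins) for j in range(d)]
    )
    return d, codes


# Hypothetical data: 3 latent factors embedded linearly in 10 dimensions.
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 3))
X = latent @ rng.normal(size=(3, 10))

d, codes = id_correlation_discretize(X, k=10, n_bins=4)
print("estimated ID:", d, "| discretized shape:", codes.shape)
```

Because the principal components are uncorrelated, binning each projected column on its own is a reasonable reading of "each dimension can be discretized separately"; a faithful reproduction would replace `equal_frequency_bins` with a class-aware MODL implementation.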
- Published
- 2013