Clustering of documents from a two-way viewpoint
- Publication Year :
- 2010
- Publisher :
- LED Edizioni Universitarie di Lettere Economia e Diritto, 2010.
- Abstract :
- Methods for high-dimensional data clustering represent a prolific research area in data mining and have prompted a large number of provisional solutions. In text mining and in the analysis of gene expression data, the idea of bidimensional clustering arose, in the sense of finding clusters of documents characterized by clusters of terms (and, analogously, clusters of genes and clusters of experimental conditions). Although we are often more interested in clustering only one way of our data structure, co-clustering seems to be convenient from both an interpretative and a computational viewpoint. Here we try to frame the problem in a multidimensional data analysis perspective, referring to classic association and/or prediction indexes for contingency tables. Following previous works, we propose the use of a predictability index, Goodman & Kruskal tau-b, for documents-by-terms tables. After a quick review of the wide literature on two-way clustering, mainly developed in microarray analysis, we propose a new algorithm belonging to the genetic family, based on the optimization of the predictability index tau-b. We present experimental results to show the effectiveness of our co-clustering algorithm in practice.
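As a rough illustration of the index named in the abstract, the sketch below computes a Goodman and Kruskal tau for a contingency table and applies it to a documents-by-terms table collapsed by a candidate pair of row and column partitions. It is a minimal sketch, not the authors' algorithm: the function names `goodman_kruskal_tau` and `aggregate`, the choice of predicting column clusters from row clusters, and the toy counts are all assumptions made for illustration. The genetic search itself is not shown; the tau value would only play the role of the fitness being optimized.

```python
def goodman_kruskal_tau(table):
    """Goodman-Kruskal tau for predicting the column category from the row.

    `table` is a list of rows of non-negative counts (a contingency table).
    The prediction direction (rows -> columns) is an assumption here.
    """
    n = sum(sum(row) for row in table)
    if n == 0:
        raise ValueError("empty table")
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    # Prediction error using the column marginal distribution only.
    base = 1.0 - sum((c / n) ** 2 for c in col_tot)
    if base == 0:
        raise ValueError("tau is undefined with a single non-empty column")
    # Prediction error once the row category is known.
    cond = 1.0 - sum(
        (x / n) ** 2 / (r / n)
        for row, r in zip(table, row_tot) if r > 0
        for x in row
    )
    # Proportional reduction in prediction error.
    return (base - cond) / base


def aggregate(table, row_labels, col_labels):
    """Collapse a documents-by-terms table into a cluster-by-cluster table.

    Hypothetical helper: labels are integers 0..k-1 (rows) and 0..l-1 (columns).
    """
    k = max(row_labels) + 1
    l = max(col_labels) + 1
    out = [[0] * l for _ in range(k)]
    for i, row in enumerate(table):
        for j, x in enumerate(row):
            out[row_labels[i]][col_labels[j]] += x
    return out


# Toy documents-by-terms counts (invented for illustration only).
docs_by_terms = [
    [3, 2, 0, 0],
    [4, 1, 0, 0],
    [0, 0, 2, 3],
]
collapsed = aggregate(docs_by_terms, [0, 0, 1], [0, 0, 1, 1])
tau = goodman_kruskal_tau(collapsed)
```

A partition pair that aligns document clusters with term clusters yields a tau close to 1, while a table with independent rows and columns yields 0.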
- Subjects :
- Genetic Algorithms
- two-way clustering
- Goodman&Kruskal tau-b
Details
- Language :
- English
- Database :
- OpenAIRE
- Accession number :
- edsair.od......3730..12862e625530a5b9aaefd3805f1de5d4