Semi-supervised document clustering with dual supervision through seeding
- Source :
- SAC
- Publication Year :
- 2012
- Publisher :
- ACM, 2012.
Abstract
- General-purpose semi-supervised clustering algorithms use a small number of labeled instances or pairwise instance constraints to guide otherwise unsupervised clustering. For document clustering, however, user supervision can also take other forms, such as labeling a feature by associating it with a document or a cluster. In addition to labeled documents, this paper explores using labeled features to generate the seeds that initialize the clustering. We present a unified framework in which both labeled documents and labeled features are used to seed clusters, and in which this information is refined using intermediate clusters. We introduce two methods for generating cluster seeds from labeled features. Experimental results on several real-world data sets demonstrate that seeding the clustering with both documents and features can significantly improve document clustering performance over random seeding and document-only seeding.
- Subjects :
- Clustering high-dimensional data
Fuzzy clustering
Brown clustering
Computer science
Correlation clustering
Conceptual clustering
Pattern recognition
Document clustering
Biclustering
Data set
ComputingMethodologies_PATTERNRECOGNITION
Canopy clustering algorithm
FLAME clustering
Artificial intelligence
Data mining
Cluster analysis
Details
- Database :
- OpenAIRE
- Journal :
- Proceedings of the 27th Annual ACM Symposium on Applied Computing
- Accession number :
- edsair.doi...........a9971091124ee28d5a03b0d6afe4c3fd
- Full Text :
- https://doi.org/10.1145/2245276.2245306