1. Text Document Cluster Analysis Through Visualization of 3D Projections
- Author
Masaki Aono and Mei Kobayashi
- Subjects
Data set, Set (abstract data type), Information retrieval, Computer science, Interface (Java), Benchmark (computing), NIST, Data mining, Document clustering, computer.software_genre, Cluster analysis, computer, Visualization
- Abstract
Clustering has been used as a tool for understanding the content of large text document sets. As the volume of stored data has increased, so has the need for tools to understand the output of clustering algorithms. We developed a new visual interface to meet this demand. Our interface helps non-technical users understand documents and clusters in massive databases (e.g., document content, cluster sizes, distances between clusters, similarities of documents within clusters, and the extent of cluster overlaps) and evaluate the quality of output from different clustering algorithms. When a user inputs a keyword query describing his or her interests, our system retrieves and displays documents and clusters in three dimensions. More specifically, given a set of documents modeled as vectors in an orthogonal coordinate system and a query, our system finds the three orthogonal coordinate axes most relevant to the query and uses them to generate the display (alternatively, users may choose any three orthogonal axes). We conducted implementation studies with an artificial data set and a de facto benchmark news article data set from the U.S. National Institute of Standards and Technology (NIST) Text REtrieval Conference (TREC) to demonstrate the value of our system.
- Published
2014
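The abstract does not specify how the three query-relevant axes are chosen. The following is a minimal sketch under an assumed selection rule, not the authors' published method. Axis 1 is the normalized query vector. Axes 2 and 3 are the leading (uncentered) principal directions of the documents after the components along the already-chosen axes are removed. These directions are found by power iteration, and each new axis is kept orthogonal to the earlier ones by Gram-Schmidt. Documents are then projected onto the three axes to obtain 3D display coordinates. The function names `choose_axes`, `leading_direction`, and `project_3d` are hypothetical, and the example data are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(a: Sequence[float]) -> float:
    return math.sqrt(dot(a, a))


def normalize(a: Sequence[float]) -> Vector:
    n = norm(a)
    if n < 1e-12:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in a]


def orthogonalize(v: Sequence[float], basis: Sequence[Vector]) -> Vector:
    """Remove the component of v along each unit vector in basis (modified Gram-Schmidt)."""
    out = list(v)
    for b in basis:
        c = dot(out, b)
        out = [x - c * y for x, y in zip(out, b)]
    return out


def leading_direction(docs: Sequence[Vector], basis: Sequence[Vector], iters: int = 200) -> Vector:
    """Dominant direction of the document residuals orthogonal to basis (power iteration on R^T R)."""
    residuals = [orthogonalize(d, basis) for d in docs]
    start = max(residuals, key=norm)
    if norm(start) < 1e-12:
        raise ValueError("documents span too few dimensions for another axis")
    v = normalize(start)
    for _ in range(iters):
        # w = R^T R v, accumulated as the sum over residual docs of (r . v) * r
        w = [0.0] * len(v)
        for r in residuals:
            c = dot(r, v)
            w = [wi + c * ri for wi, ri in zip(w, r)]
        # Re-orthogonalize so round-off cannot leak back into the chosen axes
        v = normalize(orthogonalize(w, basis))
    return v


def choose_axes(docs: Sequence[Vector], query: Sequence[float]) -> List[Vector]:
    """Hypothetical rule: query direction first, then two principal residual directions."""
    axes = [normalize(query)]
    for _ in range(2):
        axes.append(leading_direction(docs, axes))
    return axes


def project_3d(docs: Sequence[Vector], axes: Sequence[Vector]) -> List[Tuple[float, float, float]]:
    """3D display coordinates: the dot product of each document with each of the three axes."""
    return [(dot(d, axes[0]), dot(d, axes[1]), dot(d, axes[2])) for d in docs]


if __name__ == "__main__":
    # Toy term-weight vectors for four documents over four terms (illustrative data only)
    docs = [[1.0, 2.0, 0.0, 0.0], [2.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 3.0], [1.0, 1.0, 1.0, 1.0]]
    query = [1.0, 0.0, 0.0, 0.0]
    for point in project_3d(docs, choose_axes(docs, query)):
        print(point)
```

Because `project_3d` accepts any orthonormal triple of axes, the same function also covers the case where a user picks the three axes directly instead of relying on the query-driven choice.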