6 results for "Park, Haesun"
Search Results
2. IDR/QR: an incremental dimension reduction algorithm via QR decomposition
- Author
- Ye, Jieping, Li, Qi, Xiong, Hui, Park, Haesun, Janardan, Ravi, and Kumar, Vipin
- Subjects
Electronic data processing, Data mining, Algorithms, Data warehousing/data mining, Algorithm, Business, Computers, Electronics, Electronics and electrical industries
- Abstract
Dimension reduction is a critical data preprocessing step for many database and data mining applications, such as efficient storage and retrieval of high-dimensional data. In the literature, a well-known dimension reduction algorithm is Linear Discriminant Analysis (LDA). The common aspect of previously proposed LDA-based algorithms is the use of Singular Value Decomposition (SVD). Due to the difficulty of designing an incremental solution for the eigenvalue problem on the product of scatter matrices in LDA, there has been little work on designing incremental LDA algorithms that can efficiently incorporate new data items as they become available. In this paper, we propose an LDA-based incremental dimension reduction algorithm, called IDR/QR, which applies QR Decomposition rather than SVD. Unlike other LDA-based algorithms, this algorithm does not require the whole data matrix in main memory. This is desirable for large data sets. More importantly, with the insertion of new data items, the IDR/QR algorithm can constrain the computational cost by applying efficient QR-updating techniques. Finally, we evaluate the effectiveness of the IDR/QR algorithm in terms of classification error rate on the reduced dimensional space. Our experiments on several real-world data sets reveal that the classification error rate achieved by the IDR/QR algorithm is very close to the best possible one achieved by other LDA-based algorithms. However, the IDR/QR algorithm has much less computational cost, especially when new data items are inserted dynamically. Index Terms--Dimension reduction, linear discriminant analysis, incremental learning, QR Decomposition, Singular Value Decomposition (SVD).
- Published
- 2005
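The abstract's central idea, projecting data onto an orthonormal basis obtained from a QR decomposition while keeping only per-class statistics in memory, can be sketched in plain Python. This is a simplified illustration, not the authors' implementation: it assumes the first stage of IDR/QR orthonormalizes the class-centroid matrix, the class `CentroidQR` and its methods are hypothetical names, the QR factor is recomputed with modified Gram-Schmidt instead of the paper's QR-updating techniques, and the second stage (a small LDA problem solved in the reduced space) is omitted.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def thin_qr_basis(columns, tol=1e-12):
    """Orthonormal basis (the Q factor's columns) for the given column vectors,
    computed with modified Gram-Schmidt; numerically dependent columns are dropped."""
    basis = []
    for col in columns:
        v = [float(a) for a in col]
        for q in basis:
            r = dot(q, v)  # projection coefficient, computed on the already-updated v
            v = [vi - r * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(dot(v, v))
        if norm > tol:
            basis.append([vi / norm for vi in v])
    return basis


class CentroidQR:
    """Hypothetical helper: stores only per-class sums and counts (not the full
    data matrix), so inserting a new item changes just one class centroid."""

    def __init__(self, dim):
        self.dim = dim
        self.sums = {}
        self.counts = {}

    def insert(self, x, label):
        s = self.sums.setdefault(label, [0.0] * self.dim)
        for i, v in enumerate(x):
            s[i] += v
        self.counts[label] = self.counts.get(label, 0) + 1

    def centroids(self):
        # One centroid per class, in sorted label order: the columns of C.
        return [[v / self.counts[c] for v in self.sums[c]] for c in sorted(self.sums)]

    def transform(self, x):
        # Coordinates of x in the span of the centroids: Q^T x.
        return [dot(q, x) for q in thin_qr_basis(self.centroids())]
```

For example, after inserting `[1, 0, 0]` and `[3, 0, 0]` with label `"a"` and `[0, 2, 0]` with label `"b"`, the centroids are `[2, 0, 0]` and `[0, 2, 0]`, and `transform([5, 7, 9])` returns `[5.0, 7.0]`. Recomputing the factorization here costs O(dk²) for d features and k classes regardless of how many items have been inserted; the paper instead applies QR-updating techniques.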
3. MPI-FAUN: An MPI-Based Framework for Alternating-Updating Nonnegative Matrix Factorization.
- Author
- Kannan, Ramakrishnan, Ballard, Grey, and Park, Haesun
- Subjects
MATRICES (Mathematics), NUMERICAL analysis, MATHEMATICAL analysis, DATA mining, ALGORITHMS
- Abstract
Non-negative matrix factorization (NMF) is the problem of determining two non-negative low rank factors $\mathbf{W}$ and $\mathbf{H}$, for the given input matrix $\mathbf{A}$, such that $\mathbf{A} \approx \mathbf{W}\mathbf{H}$.
- Published
- 2018
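The NMF problem stated in this abstract is typically solved by alternating updates: fix one factor, update the other, and repeat. The sketch below is a small sequential illustration that uses the Lee–Seung multiplicative update rule, chosen for brevity as one example of such an update; it is not MPI-FAUN's code and contains no MPI communication or data distribution. The function names `nmf_multiplicative` and `frobenius_error` are hypothetical.

```python
import math
import random


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def scale_elementwise(X, num, den, eps):
    """Return X * num / (den + eps) entry by entry; non-negative inputs stay non-negative."""
    return [
        [x * n / (d + eps) for x, n, d in zip(xr, nr, dr)]
        for xr, nr, dr in zip(X, num, den)
    ]


def nmf_multiplicative(A, k, iters=200, eps=1e-12, seed=0):
    """Alternate between updating H with W fixed and W with H fixed."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.random() for _ in range(k)] for _ in range(m)]
    H = [[rng.random() for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T A) / (W^T W H)
        Wt = transpose(W)
        H = scale_elementwise(H, matmul(Wt, A), matmul(matmul(Wt, W), H), eps)
        # W <- W * (A H^T) / (W H H^T)
        Ht = transpose(H)
        W = scale_elementwise(W, matmul(A, Ht), matmul(W, matmul(H, Ht)), eps)
    return W, H


def frobenius_error(A, W, H):
    """Frobenius norm of A - WH."""
    WH = matmul(W, H)
    return math.sqrt(sum((a - b) ** 2 for ra, rb in zip(A, WH) for a, b in zip(ra, rb)))
```

Each step multiplies the current factor entrywise by a ratio of non-negative matrices, so W and H remain non-negative from a non-negative starting point. For a rank-1 input such as `[[1, 1, 2], [2, 2, 4], [3, 3, 6]]` with `k=1`, the reconstruction error drops to near zero within a few iterations.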
4. iVisClustering: An Interactive Visual Document Clustering via Topic Modeling.
- Author
- Lee, Hanseung, Kihm, Jaeyeon, Choo, Jaegul, Stasko, John, and Park, Haesun
- Subjects
INFORMATION storage & retrieval systems, HUMAN-machine systems, DATABASE management, DATA mining, CLUSTER analysis (Statistics), DATA analysis, DATA visualization
- Abstract
Clustering plays an important role in many large-scale data analyses, providing users with an overall understanding of their data. Nonetheless, clustering is not an easy task due to noisy features and outliers existing in the data, and thus the clustering results obtained from automatic algorithms often do not make clear sense. To remedy this problem, automatic clustering should be complemented with interactive visualization strategies. This paper proposes an interactive visual analytics system for document clustering, called iVisClustering, based on a widely-used topic modeling method, latent Dirichlet allocation (LDA). iVisClustering provides a summary of each cluster in terms of its most representative keywords and visualizes soft clustering results in parallel coordinates. The main view of the system provides a 2D plot that visualizes cluster similarities and the relation among data items with a graph-based representation. iVisClustering provides several other views, which contain useful interaction methods. With the help of these visualization modules, we can interactively refine the clustering results in various ways. Keywords can be adjusted so that they characterize each cluster better. In addition, our system can filter out noisy data and re-cluster the data accordingly. A cluster hierarchy can be constructed using a tree structure, and for this purpose, the system supports cluster-level interactions such as sub-clustering, removing unimportant clusters, merging the clusters that have similar meanings, and moving certain clusters to any other node in the tree structure. Furthermore, the system provides document-level interactions such as moving mis-clustered documents to another cluster and removing useless documents. Finally, we present how interactive clustering is performed via iVisClustering by using real-world document data sets. [ABSTRACT FROM AUTHOR]
- Published
- 2012
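LDA topic modeling, on which iVisClustering is based, produces two matrices that can be read as clustering output: per-document topic proportions (a soft clustering) and per-topic word probabilities. The sketch below shows, under that assumption, how a hard cluster label, a keyword summary per cluster, and a list of weakly assigned documents might be derived from them. The function names, the toy vocabulary and probabilities, and the 0.6 threshold are hypothetical illustrations; the system's actual summary and filtering rules may differ.

```python
def hard_clusters(doc_topic):
    """Assign each document to its highest-probability topic."""
    return [max(range(len(p)), key=p.__getitem__) for p in doc_topic]


def top_keywords(topic_word, vocab, n=3):
    """Return the n most probable words of each topic (ties keep vocabulary order)."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in topic_word
    ]


def weakly_assigned(doc_topic, threshold=0.6):
    """Indices of documents whose largest topic proportion is below the threshold,
    e.g. candidates for review before filtering and re-clustering."""
    return [d for d, p in enumerate(doc_topic) if max(p) < threshold]


# Toy LDA output (hypothetical): 2 topics over a 6-word vocabulary, 3 documents.
vocab = ["gene", "protein", "cell", "game", "team", "score"]
topic_word = [
    [0.30, 0.30, 0.20, 0.10, 0.05, 0.05],
    [0.02, 0.03, 0.05, 0.40, 0.30, 0.20],
]
doc_topic = [[0.90, 0.10], [0.20, 0.80], [0.55, 0.45]]

print(hard_clusters(doc_topic))           # [0, 1, 0]
print(top_keywords(topic_word, vocab))    # [['gene', 'protein', 'cell'], ['game', 'team', 'score']]
print(weakly_assigned(doc_topic))         # [2]
```

Keeping the full topic-proportion vectors, rather than only the argmax label, is what makes a soft-clustering view possible: the third document belongs mostly, but not clearly, to the first topic.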
5. Feature Reduction via Generalized Uncorrelated Linear Discriminant Analysis.
- Author
- Ye, Jieping, Janardan, Ravi, Li, Qi, and Park, Haesun
- Subjects
INFORMATION resources management, DATA mining, MACHINE learning, BIOINFORMATICS, STATISTICAL correlation, SEARCH engines, ONLINE data processing, ARTIFICIAL intelligence, INFORMATION science
- Abstract
High-dimensional data appear in many applications of data mining, machine learning, and bioinformatics. Feature reduction is commonly applied as a preprocessing step to overcome the curse of dimensionality. Uncorrelated Linear Discriminant Analysis (ULDA) was recently proposed for feature reduction. The extracted features via ULDA were shown to be statistically uncorrelated, which is desirable for many applications. In this paper, an algorithm called ULDA/QR is proposed to simplify the previous implementation of ULDA. Then, the ULDA/GSVD algorithm is proposed, based on a novel optimization criterion, to address the singularity problem which occurs in undersampled problems, where the data dimension is larger than the sample size. The criterion used is the regularized version of the one in ULDA/QR. Surprisingly, our theoretical result shows that the solution to ULDA/GSVD is independent of the value of the regularization parameter. Experimental results on various types of data sets are reported to show the effectiveness of the proposed algorithm and to compare it with other commonly used feature reduction algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2006
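Why features extracted by ULDA are statistically uncorrelated follows from how the total scatter matrix transforms under a linear map. A short derivation, assuming the standard definitions (data points $x_1, \dots, x_n \in \mathbb{R}^d$ with global mean $c$, total scatter matrix $S_t$, and a transformation $G \in \mathbb{R}^{d \times \ell}$ producing features $y_i = G^T x_i$); the constraint $G^T S_t G = I_\ell$ is the usual ULDA condition, given here as background rather than quoted from this paper:

```latex
\begin{aligned}
S_t &= \sum_{i=1}^{n} (x_i - c)(x_i - c)^T,
  \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = G^T c, \\
\sum_{i=1}^{n} (y_i - \bar{y})(y_i - \bar{y})^T
  &= \sum_{i=1}^{n} G^T (x_i - c)(x_i - c)^T G = G^T S_t G .
\end{aligned}
```

So the sample covariance of the extracted features is $G^T S_t G$ divided by $n$ (or $n-1$), and requiring $G^T S_t G = I_\ell$ sets every off-diagonal entry, and hence every pairwise correlation, to zero. The same expression shows where the singularity problem arises: $S_t$ is a sum of $n$ rank-one terms whose vectors $x_i - c$ sum to zero, so $\operatorname{rank}(S_t) \le n - 1$, and $S_t$ is singular whenever the data dimension $d$ exceeds the sample size $n$.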
6. IDR/QR: An Incremental Dimension Reduction Algorithm via QR Decomposition.
- Author
- Ye, Jieping, Li, Qi, Xiong, Hui, Park, Haesun, Janardan, Ravi, and Kumar, Vipin
- Subjects
ELECTRONIC data processing, DATA mining, DATABASE searching, ALGORITHMS, EIGENVALUES, KNOWLEDGE management
- Abstract
Dimension reduction is a critical data preprocessing step for many database and data mining applications, such as efficient storage and retrieval of high-dimensional data. In the literature, a well-known dimension reduction algorithm is Linear Discriminant Analysis (LDA). The common aspect of previously proposed LDA-based algorithms is the use of Singular Value Decomposition (SVD). Due to the difficulty of designing an incremental solution for the eigenvalue problem on the product of scatter matrices in LDA, there has been little work on designing incremental LDA algorithms that can efficiently incorporate new data items as they become available. In this paper, we propose an LDA-based incremental dimension reduction algorithm, called IDR/QR, which applies QR Decomposition rather than SVD. Unlike other LDA-based algorithms, this algorithm does not require the whole data matrix in main memory. This is desirable for large data sets. More importantly, with the insertion of new data items, the IDR/QR algorithm can constrain the computational cost by applying efficient QR-updating techniques. Finally, we evaluate the effectiveness of the IDR/QR algorithm in terms of classification error rate on the reduced dimensional space. Our experiments on several real-world data sets reveal that the classification error rate achieved by the IDR/QR algorithm is very close to the best possible one achieved by other LDA-based algorithms. However, the IDR/QR algorithm has much less computational cost, especially when new data items are inserted dynamically. [ABSTRACT FROM AUTHOR]
- Published
- 2005
Catalog
Discovery Service for Jio Institute Digital Library