Author: "Park, Haesun" / Topic: data mining - Searchworks@Jio Institute Digital Library Search Results

Your search for "Park, Haesun" returned 6 results

1. Integer Matrix Approximation and Data Mining

Author: Dong, Bo, Lin, Matthew M., and Park, Haesun
Published: 2018
Full Text: View/download PDF

2. IDR/QR: an incremental dimension reduction algorithm via QR decomposition

Author: Ye, Jieping, Li, Qi, Xiong, Hui, Park, Haesun, Janardan, Ravi, and Kumar, Vipin
Subjects: Electronic data processing, Data mining, Algorithms, Data warehousing/data mining, Algorithm, Business, Computers, Electronics, Electronics and electrical industries
Abstract: Dimension reduction is a critical data preprocessing step for many database and data mining applications, such as efficient storage and retrieval of high-dimensional data. In the literature, a well-known dimension reduction algorithm is Linear Discriminant Analysis (LDA). The common aspect of previously proposed LDA-based algorithms is the use of Singular Value Decomposition (SVD). Due to the difficulty of designing an incremental solution for the eigenvalue problem on the product of scatter matrices in LDA, there has been little work on designing incremental LDA algorithms that can efficiently incorporate new data items as they become available. In this paper, we propose an LDA-based incremental dimension reduction algorithm, called IDR/QR, which applies QR Decomposition rather than SVD. Unlike other LDA-based algorithms, this algorithm does not require the whole data matrix in main memory. This is desirable for large data sets. More importantly, with the insertion of new data items, the IDR/QR algorithm can constrain the computational cost by applying efficient QR-updating techniques. Finally, we evaluate the effectiveness of the IDR/QR algorithm in terms of classification error rate on the reduced dimensional space. Our experiments on several real-world data sets reveal that the classification error rate achieved by the IDR/QR algorithm is very close to the best possible one achieved by other LDA-based algorithms. However, the IDR/QR algorithm has much less computational cost, especially when new data items are inserted dynamically. Index Terms--Dimension reduction, linear discriminant analysis, incremental learning, QR Decomposition, Singular Value Decomposition (SVD).
Published: 2005
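
Note: the abstract above refers to QR-updating without giving details. As a rough illustration only, and not the procedure from the paper, the sketch below shows one standard form of QR updating: appending a single column to an existing thin QR factorization with a Gram-Schmidt step, so the factorization need not be recomputed from scratch. The function name qr_append_column, the matrix sizes, and the random test data are all assumptions made for this example.

```python
import numpy as np

def qr_append_column(Q, R, c):
    """Update a thin QR factorization C = Q R when column c is appended to C.

    Hypothetical helper for illustration; not taken from the IDR/QR paper.
    """
    # Project the new column onto the existing orthonormal basis.
    r = Q.T @ c
    v = c - Q @ r
    # Second Gram-Schmidt pass to limit loss of orthogonality.
    s = Q.T @ v
    v = v - Q @ s
    r = r + s
    rho = np.linalg.norm(v)
    if rho < 1e-12:
        raise ValueError("new column lies numerically in the span of Q")
    k = R.shape[0]
    # New basis vector goes in as the last column of Q.
    Q_new = np.hstack([Q, (v / rho)[:, None]])
    # R grows by one row and one column and stays upper triangular.
    R_new = np.zeros((k + 1, k + 1))
    R_new[:k, :k] = R
    R_new[:k, k] = r
    R_new[k, k] = rho
    return Q_new, R_new

rng = np.random.default_rng(0)
C = rng.standard_normal((50, 3))   # assumed data: 50 dimensions, 3 columns
Q, R = np.linalg.qr(C)             # thin QR of the initial matrix
c_new = rng.standard_normal(50)    # column that arrives later
Q2, R2 = qr_append_column(Q, R, c_new)
```

The update costs a few matrix-vector products with Q rather than a full refactorization, which is the general reason updating schemes are attractive when data arrive incrementally.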

3. MPI-FAUN: An MPI-Based Framework for Alternating-Updating Nonnegative Matrix Factorization.

Author: Kannan, Ramakrishnan, Ballard, Grey, and Park, Haesun
Subjects: MATRICES (Mathematics), NUMERICAL analysis, MATHEMATICAL analysis, DATA mining, ALGORITHMS
Abstract: Non-negative matrix factorization (NMF) is the problem of determining two non-negative low rank factors $\mathbf{W}$ and $\mathbf{H}$, for the given input matrix $\mathbf{A}$, such that $\mathbf{A} \approx \mathbf{W}\mathbf{H}$. NMF is a useful tool for many applications in different domains such as topic modeling in text mining, background separation in video analysis, and community detection in social networks. Despite its popularity in the data mining community, there is a lack of efficient parallel algorithms to solve the problem for big data sets. The main contribution of this work is a new, high-performance parallel computational framework for a broad class of NMF algorithms that iteratively solves alternating non-negative least squares (NLS) subproblems for $\mathbf{W}$ and $\mathbf{H}$. It maintains the data and factor matrices in memory (distributed across processors), uses MPI for interprocessor communication, and, in the dense case, provably minimizes communication costs (under mild assumptions). The framework is flexible and able to leverage a variety of NMF and NLS algorithms, including Multiplicative Update, Hierarchical Alternating Least Squares, and Block Principal Pivoting. Our implementation allows us to benchmark and compare different algorithms on massive dense and sparse data matrices of size that spans from few hundreds of millions to billions. We demonstrate the scalability of our algorithm and compare it with baseline implementations, showing significant performance improvements. The code and the datasets used for conducting the experiments are available online. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF
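
Note: to make the alternating-update idea in the abstract above concrete, here is a minimal single-process sketch of NMF with the Multiplicative Update rule, one of the algorithms the abstract names. It is not the MPI-FAUN framework and involves no MPI or distributed storage. The function names, iteration count, the small eps safeguard, and the synthetic test matrix are assumptions chosen for this example.

```python
import numpy as np

def nmf_multiplicative_update(A, rank, n_iter=200, eps=1e-10, seed=0):
    """Approximate a non-negative matrix A by W @ H with W, H >= 0.

    Illustrative sketch only: alternates multiplicative updates of H and W.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, rank))   # non-negative random start
    H = rng.random((rank, n))
    for _ in range(n_iter):
        # Update H with W held fixed; eps guards against division by zero.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        # Update W with H held fixed.
        W *= (A @ H.T) / (W @ (H @ H.T) + eps)
    return W, H

def relative_error(A, W, H):
    """Frobenius-norm error of the factorization, relative to the norm of A."""
    return np.linalg.norm(A - W @ H) / np.linalg.norm(A)

# Assumed test data: a 30 x 20 non-negative matrix of rank at most 4.
rng = np.random.default_rng(1)
A = rng.random((30, 4)) @ rng.random((4, 20))
W, H = nmf_multiplicative_update(A, rank=4)
print("relative error:", relative_error(A, W, H))
```

Because every update multiplies a non-negative entry by a non-negative ratio, the factors stay non-negative without any explicit projection step.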

4. iVisClustering: An Interactive Visual Document Clustering via Topic Modeling.

Author: Lee, Hanseung, Kihm, Jaeyeon, Choo, Jaegul, Stasko, John, and Park, Haesun
Subjects: INFORMATION storage & retrieval systems, HUMAN-machine systems, DATABASE management, DATA mining, CLUSTER analysis (Statistics), DATA analysis, DATA visualization
Abstract: Clustering plays an important role in many large-scale data analyses providing users with an overall understanding of their data. Nonetheless, clustering is not an easy task due to noisy features and outliers existing in the data, and thus the clustering results obtained from automatic algorithms often do not make clear sense. To remedy this problem, automatic clustering should be complemented with interactive visualization strategies. This paper proposes an interactive visual analytics system for document clustering, called iVisClustering, based on a widely-used topic modeling method, latent Dirichlet allocation (LDA). iVisClustering provides a summary of each cluster in terms of its most representative keywords and visualizes soft clustering results in parallel coordinates. The main view of the system provides a 2D plot that visualizes cluster similarities and the relation among data items with a graph-based representation. iVisClustering provides several other views, which contain useful interaction methods. With help of these visualization modules, we can interactively refine the clustering results in various ways. Keywords can be adjusted so that they characterize each cluster better. In addition, our system can filter out noisy data and re-cluster the data accordingly. Cluster hierarchy can be constructed using a tree structure and for this purpose, the system supports cluster-level interactions such as sub-clustering, removing unimportant clusters, merging the clusters that have similar meanings, and moving certain clusters to any other node in the tree structure. Furthermore, the system provides document-level interactions such as moving mis-clustered documents to another cluster and removing useless documents. Finally, we present how interactive clustering is performed via iVisClustering by using real-world document data sets. [ABSTRACT FROM AUTHOR]
Published: 2012
Full Text: View/download PDF

5. Feature Reduction via Generalized Uncorrelated Linear Discriminant Analysis.

Author: Ye, Jieping, Janardan, Ravi, Li, Qi, and Park, Haesun
Subjects: INFORMATION resources management, DATA mining, MACHINE learning, BIOINFORMATICS, STATISTICAL correlation, SEARCH engines, ONLINE data processing, ARTIFICIAL intelligence, INFORMATION science
Abstract: High-dimensional data appear in many applications of data mining, machine learning, and bioinformatics. Feature reduction is commonly applied as a preprocessing step to overcome the curse of dimensionality. Uncorrelated Linear Discriminant Analysis (ULDA) was recently proposed for feature reduction. The extracted features via ULDA were shown to be statistically uncorrelated, which is desirable for many applications. In this paper, an algorithm called ULDA/QR is proposed to simplify the previous implementation of ULDA. Then, the ULDA/GSVD algorithm is proposed, based on a novel optimization criterion, to address the singularity problem which occurs in undersampled problems, where the data dimension is larger than the sample size. The criterion used is the regularized version of the one in ULDA/QR. Surprisingly, our theoretical result shows that the solution to ULDA/GSVD is independent of the value of the regularization parameter. Experimental results on various types of data sets are reported to show the effectiveness of the proposed algorithm and to compare it with other commonly used feature reduction algorithms. [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

6. IDR/QR: An Incremental Dimension Reduction Algorithm via QR Decomposition.

Author: Ye, Jieping, Li, Qi, Xiong, Hui, Park, Haesun, Janardan, Ravi, and Kumar, Vipin
Subjects: ELECTRONIC data processing, DATA mining, DATABASE searching, ALGORITHMS, EIGENVALUES, KNOWLEDGE management
Abstract: Dimension reduction is a critical data preprocessing step for many database and data mining applications, such as efficient storage and retrieval of high-dimensional data. In the literature, a well-known dimension reduction algorithm is Linear Discriminant Analysis (LDA). The common aspect of previously proposed LDA-based algorithms is the use of Singular Value Decomposition (SVD). Due to the difficulty of designing an incremental solution for the eigenvalue problem on the product of scatter matrices in LDA, there has been little work on designing incremental LDA algorithms that can efficiently incorporate new data items as they become available. In this paper, we propose an LDA-based incremental dimension reduction algorithm, called IDR/QR, which applies QR Decomposition rather than SVD. Unlike other LDA-based algorithms, this algorithm does not require the whole data matrix in main memory. This is desirable for large data sets. More importantly, with the insertion of new data items, the IDR/QR algorithm can constrain the computational cost by applying efficient QR-updating techniques. Finally, we evaluate the effectiveness of the IDR/QR algorithm in terms of classification error rate on the reduced dimensional space. Our experiments on several real-world data sets reveal that the classification error rate achieved by the IDR/QR algorithm is very close to the best possible one achieved by other LDA-based algorithms. However, the IDR/QR algorithm has much less computational cost, especially when new data items are inserted dynamically. [ABSTRACT FROM AUTHOR]
Published: 2005
Full Text: View/download PDF
