Author: "Wu, Chenwei" / Topic: fos: computer and information sciences - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Wu, Chenwei"' showing total 8 results

Start Over Author "Wu, Chenwei" Topic fos: computer and information sciences

8 results on '"Wu, Chenwei"'

1. Hiding Data Helps: On the Benefits of Masking for Sparse Coding

Author: Chidambaram, Muthu, Wu, Chenwei, Cheng, Yu, and Ge, Rong
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Machine Learning (cs.LG)
Abstract: Sparse coding, which refers to modeling a signal as sparse linear combinations of the elements of a learned dictionary, has proven to be a successful (and interpretable) approach in applications such as signal processing, computer vision, and medical imaging. While this success has spurred much work on provable guarantees for dictionary recovery when the learned dictionary is the same size as the ground-truth dictionary, work on the setting where the learned dictionary is larger (or over-realized) with respect to the ground truth is comparatively nascent. Existing theoretical results in this setting have been constrained to the case of noise-less data. We show in this work that, in the presence of noise, minimizing the standard dictionary learning objective can fail to recover the elements of the ground-truth dictionary in the over-realized regime, regardless of the magnitude of the signal in the data-generating process. Furthermore, drawing from the growing body of work on self-supervised learning, we propose a novel masking objective for which recovering the ground-truth dictionary is in fact optimal as the signal increases for a large class of data-generating processes. We corroborate our theoretical results with experiments across several parameter regimes showing that our proposed objective also enjoys better empirical performance than the standard reconstruction objective., Comment: 16 pages, 1 figure, ICML 2023
Published: 2023
Full Text: View/download PDF

2. BenchMD: A Benchmark for Unified Learning on Medical Images and Sensors

Author: Wantlin, Kathryn, Wu, Chenwei, Huang, Shih-Cheng, Banerjee, Oishi, Dadabhoy, Farah, Mehta, Veeral Vipin, Han, Ryan Wonhee, Cao, Fang, Narayan, Raja R., Colak, Errol, Adamson, Adewole, Heacock, Laura, Tison, Geoffrey H., Tamkin, Alex, and Rajpurkar, Pranav
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: Medical data poses a daunting challenge for AI algorithms: it exists in many different modalities, experiences frequent distribution shifts, and suffers from a scarcity of examples and labels. Recent advances, including transformers and self-supervised learning, promise a more universal approach that can be applied flexibly across these diverse conditions. To measure and drive progress in this direction, we present BenchMD: a benchmark that tests how well unified, modality-agnostic methods, including architectures and training techniques (e.g. self-supervised learning, ImageNet pretraining),perform on a diverse array of clinically-relevant medical tasks. BenchMD combines 19 publicly available datasets for 7 medical modalities, including 1D sensor data, 2D images, and 3D volumetric scans. Our benchmark reflects real-world data constraints by evaluating methods across a range of dataset sizes, including challenging few-shot settings that incentivize the use of pretraining. Finally, we evaluate performance on out-of-distribution data collected at different hospitals than the training data, representing naturally-occurring distribution shifts that frequently degrade the performance of medical AI models. Our baseline results demonstrate that no unified learning technique achieves strong performance across all modalities, leaving ample room for improvement on the benchmark. Code is released at https://github.com/rajpurkarlab/BenchMD.
Published: 2023
Full Text: View/download PDF

3. Provably Learning Diverse Features in Multi-View Data with Midpoint Mixup

Author: Chidambaram, Muthu, Wang, Xiang, Wu, Chenwei, and Ge, Rong
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Statistics - Machine Learning, Optimization and Control (math.OC), Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, FOS: Mathematics, Machine Learning (stat.ML), Mathematics - Optimization and Control, Machine Learning (cs.LG)
Abstract: Mixup is a data augmentation technique that relies on training using random convex combinations of data points and their labels. In recent years, Mixup has become a standard primitive used in the training of state-of-the-art image classification models due to its demonstrated benefits over empirical risk minimization with regards to generalization and robustness. In this work, we try to explain some of this success from a feature learning perspective. We focus our attention on classification problems in which each class may have multiple associated features (or views) that can be used to predict the class correctly. Our main theoretical results demonstrate that, for a non-trivial class of data distributions with two features per class, training a 2-layer convolutional network using empirical risk minimization can lead to learning only one feature for almost all classes while training with a specific instantiation of Mixup succeeds in learning both features for every class. We also show empirically that these theoretical insights extend to the practical settings of image benchmarks modified to have multiple features., Comment: 37 pages, 2 figures, ICML 2023
Published: 2022
Full Text: View/download PDF

4. Towards Understanding the Data Dependency of Mixup-style Training

Author: Chidambaram, Muthu, Wang, Xiang, Hu, Yuzheng, Wu, Chenwei, and Ge, Rong
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Machine Learning (cs.LG)
Abstract: In the Mixup training paradigm, a model is trained using convex combinations of data points and their associated labels. Despite seeing very few true data points during training, models trained using Mixup seem to still minimize the original empirical risk and exhibit better generalization and robustness on various tasks when compared to standard training. In this paper, we investigate how these benefits of Mixup training rely on properties of the data in the context of classification. For minimizing the original empirical risk, we compute a closed form for the Mixup-optimal classification, which allows us to construct a simple dataset on which minimizing the Mixup loss can provably lead to learning a classifier that does not minimize the empirical loss on the data. On the other hand, we also give sufficient conditions for Mixup training to also minimize the original empirical risk. For generalization, we characterize the margin of a Mixup classifier, and use this to understand why the decision boundary of a Mixup classifier can adapt better to the full structure of the training data when compared to standard training. In contrast, we also show that, for a large class of linear models and linearly separable datasets, Mixup training leads to learning the same classifier as standard training., Comment: 26 pages, 14 figures, Accepted to ICLR 2022 (Spotlight)
Published: 2021
Full Text: View/download PDF

5. Beyond Lazy Training for Over-parameterized Tensor Decomposition

Author: Wang, Xiang, Wu, Chenwei, Lee, Jason D., Ma, Tengyu, and Ge, Rong
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)
Abstract: Over-parametrization is an important technique in training neural networks. In both theory and practice, training a larger network allows the optimization algorithm to avoid bad local optimal solutions. In this paper we study a closely related tensor decomposition problem: given an $l$-th order tensor in $(R^d)^{\otimes l}$ of rank $r$ (where $r\ll d$), can variants of gradient descent find a rank $m$ decomposition where $m > r$? We show that in a lazy training regime (similar to the NTK regime for neural networks) one needs at least $m = \Omega(d^{l-1})$, while a variant of gradient descent can find an approximate tensor when $m = O^*(r^{2.5l}\log d)$. Our results show that gradient descent on over-parametrized objective could go beyond the lazy training regime and utilize certain low-rank structure in the data., Comment: NeurIPS 2020; the first two authors contribute equally
Published: 2020

6. Dissecting Hessian: Understanding Common Structure of Hessian in Neural Networks

Author: Wu, Yikai, Zhu, Xingyu, Wu, Chenwei, Wang, Annie, and Ge, Rong
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, I.2.6, Computer Science - Neural and Evolutionary Computing, Machine Learning (stat.ML), Neural and Evolutionary Computing (cs.NE), Machine Learning (cs.LG)
Abstract: Hessian captures important properties of the deep neural network loss landscape. Previous works have observed low rank structure in the Hessians of neural networks. In this paper, we propose a decoupling conjecture that decomposes the layer-wise Hessians of a network as the Kronecker product of two smaller matrices. We can analyze the properties of these smaller matrices and prove the structure of top eigenspace random 2-layer networks. The decoupling conjecture has several other interesting implications - top eigenspaces for different models have surprisingly high overlap, and top eigenvectors form low rank matrices when they are reshaped into the same shape as the corresponding weight matrix. All of these can be verified empirically for deeper networks. Finally, we use the structure of layer-wise Hessian to get better explicit generalization bounds for neural networks., 72 pages, 31 figures. Main text: 10 pages, 7 figures. First two authors have equal contribution and are in alphabetical order
Published: 2020

7. Secure Data Sharing With Flow Model

Author: Wu, Chenwei, Du, Chenzhuang, and Yuan, Yang
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Cryptography and Security, Statistics - Machine Learning, Machine Learning (stat.ML), Cryptography and Security (cs.CR), Machine Learning (cs.LG)
Abstract: In the classical multi-party computation setting, multiple parties jointly compute a function without revealing their own input data. We consider a variant of this problem, where the input data can be shared for machine learning training purposes, but the data are also encrypted so that they cannot be recovered by other parties. We present a rotation based method using flow model, and theoretically justified its security. We demonstrate the effectiveness of our method in different scenarios, including supervised secure model training, and unsupervised generative model training. Our code is available at https://github.com/ duchenzhuang/flowencrypt.
Published: 2020
Full Text: View/download PDF

8. Guarantees for Tuning the Step Size using a Learning-to-Learn Approach

Author: Wang, Xiang, Yuan, Shuai, Wu, Chenwei, and Ge, Rong
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)
Abstract: Choosing the right parameters for optimization algorithms is often the key to their success in practice. Solving this problem using a learning-to-learn approach -- using meta-gradient descent on a meta-objective based on the trajectory that the optimizer generates -- was recently shown to be effective. However, the meta-optimization problem is difficult. In particular, the meta-gradient can often explode/vanish, and the learned optimizer may not have good generalization performance if the meta-objective is not chosen carefully. In this paper we give meta-optimization guarantees for the learning-to-learn approach on a simple problem of tuning the step size for quadratic loss. Our results show that the na\"ive objective suffers from meta-gradient explosion/vanishing problem. Although there is a way to design the meta-objective so that the meta-gradient remains polynomially bounded, computing the meta-gradient directly using backpropagation leads to numerical issues. We also characterize when it is necessary to compute the meta-objective on a separate validation set to ensure the generalization performance of the learned optimizer. Finally, we verify our results empirically and show that a similar phenomenon appears even for more complicated learned optimizers parametrized by neural networks., Comment: ICML 2021. Added proof sketch
Published: 2020
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Refine your results

8 results on '"Wu, Chenwei"'

1. Hiding Data Helps: On the Benefits of Masking for Sparse Coding

2. BenchMD: A Benchmark for Unified Learning on Medical Images and Sensors

3. Provably Learning Diverse Features in Multi-View Data with Midpoint Mixup

4. Towards Understanding the Data Dependency of Mixup-style Training

5. Beyond Lazy Training for Over-parameterized Tensor Decomposition

6. Dissecting Hessian: Understanding Common Structure of Hessian in Neural Networks

7. Secure Data Sharing With Flow Model

8. Guarantees for Tuning the Step Size using a Learning-to-Learn Approach

Catalog

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Database

8 results on '"Wu, Chenwei"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources