Author: "Zhou, Kaiwen" / Topic: computer science - machine learning - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Zhou, Kaiwen"' showing total 22 results

Start Over Author "Zhou, Kaiwen" Topic computer science - machine learning

22 results on '"Zhou, Kaiwen"'

1. Enhancing Neural Subset Selection: Integrating Background Information into Set Representations

Author: Xie, Binghui, Bian, Yatao, zhou, Kaiwen, Chen, Yongqiang, Zhao, Peilin, Han, Bo, Meng, Wei, and Cheng, James
Subjects: Computer Science - Machine Learning
Abstract: Learning neural subset selection tasks, such as compound selection in AI-aided drug discovery, have become increasingly pivotal across diverse applications. The existing methodologies in the field primarily concentrate on constructing models that capture the relationship between utility function values and subsets within their respective supersets. However, these approaches tend to overlook the valuable information contained within the superset when utilizing neural networks to model set functions. In this work, we address this oversight by adopting a probabilistic perspective. Our theoretical findings demonstrate that when the target value is conditioned on both the input set and subset, it is essential to incorporate an \textit{invariant sufficient statistic} of the superset into the subset of interest for effective learning. This ensures that the output value remains invariant to permutations of the subset and its corresponding superset, enabling identification of the specific superset from which the subset originated. Motivated by these insights, we propose a simple yet effective information aggregation module designed to merge the representations of subsets and supersets from a permutation invariance perspective. Comprehensive empirical evaluations across diverse tasks and datasets validate the enhanced efficacy of our approach over conventional methods, underscoring the practicality and potency of our proposed strategies in real-world contexts.
Published: 2024

2. Does Invariant Graph Learning via Environment Augmentation Learn Invariance?

Author: Chen, Yongqiang, Bian, Yatao, Zhou, Kaiwen, Xie, Binghui, Han, Bo, and Cheng, James
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Invariant graph representation learning aims to learn the invariance among data from different environments for out-of-distribution generalization on graphs. As the graph environment partitions are usually expensive to obtain, augmenting the environment information has become the de facto approach. However, the usefulness of the augmented environment information has never been verified. In this work, we find that it is fundamentally impossible to learn invariant graph representations via environment augmentation without additional assumptions. Therefore, we develop a set of minimal assumptions, including variation sufficiency and variation consistency, for feasible invariant graph learning. We then propose a new framework Graph invAriant Learning Assistant (GALA). GALA incorporates an assistant model that needs to be sensitive to graph environment changes or distribution shifts. The correctness of the proxy predictions by the assistant model hence can differentiate the variations in spurious subgraphs. We show that extracting the maximally invariant subgraph to the proxy predictions provably identifies the underlying invariant subgraph for successful OOD generalization under the established minimal assumptions. Extensive experiments on datasets including DrugOOD with various graph distribution shifts confirm the effectiveness of GALA., Comment: NeurIPS 2023, 34 pages, 35 figures
Published: 2023

3. Understanding and Improving Feature Learning for Out-of-Distribution Generalization

Author: Chen, Yongqiang, Huang, Wei, Zhou, Kaiwen, Bian, Yatao, Han, Bo, and Cheng, James
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: A common explanation for the failure of out-of-distribution (OOD) generalization is that the model trained with empirical risk minimization (ERM) learns spurious features instead of invariant features. However, several recent studies challenged this explanation and found that deep networks may have already learned sufficiently good features for OOD generalization. Despite the contradictions at first glance, we theoretically show that ERM essentially learns both spurious and invariant features, while ERM tends to learn spurious features faster if the spurious correlation is stronger. Moreover, when fed the ERM learned features to the OOD objectives, the invariant feature learning quality significantly affects the final OOD performance, as OOD objectives rarely learn new features. Therefore, ERM feature learning can be a bottleneck to OOD generalization. To alleviate the reliance, we propose Feature Augmented Training (FeAT), to enforce the model to learn richer features ready for OOD generalization. FeAT iteratively augments the model to learn new features while retaining the already learned features. In each round, the retention and augmentation operations are performed on different subsets of the training data that capture distinct features. Extensive experiments show that FeAT effectively learns richer features thus boosting the performance of various OOD objectives., Comment: Yongqiang Chen, Wei Huang, and Kaiwen Zhou contributed equally; NeurIPS 2023, 55 pages, 64 figures
Published: 2023

4. ESC: Exploration with Soft Commonsense Constraints for Zero-shot Object Navigation

Author: Zhou, Kaiwen, Zheng, Kaizhi, Pryor, Connor, Shen, Yilin, Jin, Hongxia, Getoor, Lise, and Wang, Xin Eric
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Computer Science - Robotics
Abstract: The ability to accurately locate and navigate to a specific object is a crucial capability for embodied agents that operate in the real world and interact with objects to complete tasks. Such object navigation tasks usually require large-scale training in visual environments with labeled objects, which generalizes poorly to novel objects in unknown environments. In this work, we present a novel zero-shot object navigation method, Exploration with Soft Commonsense constraints (ESC), that transfers commonsense knowledge in pre-trained models to open-world object navigation without any navigation experience nor any other training on the visual environments. First, ESC leverages a pre-trained vision and language model for open-world prompt-based grounding and a pre-trained commonsense language model for room and object reasoning. Then ESC converts commonsense knowledge into navigation actions by modeling it as soft logic predicates for efficient exploration. Extensive experiments on MP3D, HM3D, and RoboTHOR benchmarks show that our ESC method improves significantly over baselines, and achieves new state-of-the-art results for zero-shot object navigation (e.g., 288% relative Success Rate improvement than CoW on MP3D).
Published: 2023

5. Efficient Private SCO for Heavy-Tailed Data via Averaged Clipping

Author: Jin, Chenhan, Zhou, Kaiwen, Han, Bo, Cheng, James, and Zeng, Tieyong
Subjects: Computer Science - Machine Learning, Mathematics - Optimization and Control
Abstract: We consider stochastic convex optimization for heavy-tailed data with the guarantee of being differentially private (DP). Most prior works on differentially private stochastic convex optimization for heavy-tailed data are either restricted to gradient descent (GD) or performed multi-times clipping on stochastic gradient descent (SGD), which is inefficient for large-scale problems. In this paper, we consider a one-time clipping strategy and provide principled analyses of its bias and private mean estimation. We establish new convergence results and improved complexity bounds for the proposed algorithm called AClipped-dpSGD for constrained and unconstrained convex problems. We also extend our convergent analysis to the strongly convex case and non-smooth case (which works for generalized smooth objectives with H$\ddot{\text{o}}$lder-continuous gradients). All the above results are guaranteed with a high probability for heavy-tailed data. Numerical experiments are conducted to justify the theoretical improvement.
Published: 2022

6. Pareto Invariant Risk Minimization: Towards Mitigating the Optimization Dilemma in Out-of-Distribution Generalization

Author: Chen, Yongqiang, Zhou, Kaiwen, Bian, Yatao, Xie, Binghui, Wu, Bingzhe, Zhang, Yonggang, Ma, Kaili, Yang, Han, Zhao, Peilin, Han, Bo, and Cheng, James
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Recently, there has been a growing surge of interest in enabling machine learning systems to generalize well to Out-of-Distribution (OOD) data. Most efforts are devoted to advancing optimization objectives that regularize models to capture the underlying invariance; however, there often are compromises in the optimization process of these OOD objectives: i) Many OOD objectives have to be relaxed as penalty terms of Empirical Risk Minimization (ERM) for the ease of optimization, while the relaxed forms can weaken the robustness of the original objective; ii) The penalty terms also require careful tuning of the penalty weights due to the intrinsic conflicts between ERM and OOD objectives. Consequently, these compromises could easily lead to suboptimal performance of either the ERM or OOD objective. To address these issues, we introduce a multi-objective optimization (MOO) perspective to understand the OOD optimization process, and propose a new optimization scheme called PAreto Invariant Risk Minimization (PAIR). PAIR improves the robustness of OOD objectives by cooperatively optimizing with other OOD objectives, thereby bridging the gaps caused by the relaxations. Then PAIR approaches a Pareto optimal solution that trades off the ERM and OOD objectives properly. Extensive experiments on challenging benchmarks, WILDS, show that PAIR alleviates the compromises and yields top OOD performances., Comment: ICLR 2023, 50 pages, 58 figures
Published: 2022

7. Fast and Reliable Evaluation of Adversarial Robustness with Minimum-Margin Attack

Author: Gao, Ruize, Wang, Jiongxiao, Zhou, Kaiwen, Liu, Feng, Xie, Binghui, Niu, Gang, Han, Bo, and Cheng, James
Subjects: Computer Science - Machine Learning, Computer Science - Cryptography and Security
Abstract: The AutoAttack (AA) has been the most reliable method to evaluate adversarial robustness when considerable computational resources are available. However, the high computational cost (e.g., 100 times more than that of the project gradient descent attack) makes AA infeasible for practitioners with limited computational resources, and also hinders applications of AA in the adversarial training (AT). In this paper, we propose a novel method, minimum-margin (MM) attack, to fast and reliably evaluate adversarial robustness. Compared with AA, our method achieves comparable performance but only costs 3% of the computational time in extensive experiments. The reliability of our method lies in that we evaluate the quality of adversarial examples using the margin between two targets that can precisely identify the most adversarial example. The computational efficiency of our method lies in an effective Sequential TArget Ranking Selection (STARS) method, ensuring that the cost of the MM attack is independent of the number of classes. The MM attack opens a new way for evaluating adversarial robustness and provides a feasible and reliable way to generate high-quality adversarial examples in AT.
Published: 2022

8. An Adaptive Incremental Gradient Method With Support for Non-Euclidean Norms

Author: Xie, Binghui, Jin, Chenhan, Zhou, Kaiwen, Cheng, James, and Meng, Wei
Subjects: Mathematics - Optimization and Control, Computer Science - Machine Learning
Abstract: Stochastic variance reduced methods have shown strong performance in solving finite-sum problems. However, these methods usually require the users to manually tune the step-size, which is time-consuming or even infeasible for some large-scale optimization tasks. To overcome the problem, we propose and analyze several novel adaptive variants of the popular SAGA algorithm. Eventually, we design a variant of Barzilai-Borwein step-size which is tailored for the incremental gradient method to ensure memory efficiency and fast convergence. We establish its convergence guarantees under general settings that allow non-Euclidean norms in the definition of smoothness and the composite objectives, which cover a broad range of applications in machine learning. We improve the analysis of SAGA to support non-Euclidean norms, which fills the void of existing work. Numerical experiments on standard datasets demonstrate a competitive performance of the proposed algorithm compared with existing variance-reduced methods and their adaptive variants.
Published: 2022

9. FedVLN: Privacy-preserving Federated Vision-and-Language Navigation

Author: Zhou, Kaiwen and Wang, Xin Eric
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Data privacy is a central problem for embodied agents that can perceive the environment, communicate with humans, and act in the real world. While helping humans complete tasks, the agent may observe and process sensitive information of users, such as house environments, human activities, etc. In this work, we introduce privacy-preserving embodied agent learning for the task of Vision-and-Language Navigation (VLN), where an embodied agent navigates house environments by following natural language instructions. We view each house environment as a local client, which shares nothing other than local updates with the cloud server and other clients, and propose a novel federated vision-and-language navigation (FedVLN) framework to protect data privacy during both training and pre-exploration. Particularly, we propose a decentralized training strategy to limit the data of each client to its local model training and a federated pre-exploration method to do partial model aggregation to improve model generalizability to unseen environments. Extensive results on R2R and RxR datasets show that under our FedVLN framework, decentralized VLN models achieve comparable results with centralized training while protecting seen environment privacy, and federated pre-exploration significantly outperforms centralized pre-exploration while preserving unseen environment privacy., Comment: Accepted by ECCV 2022
Published: 2022

10. Accelerating Perturbed Stochastic Iterates in Asynchronous Lock-Free Optimization

Author: Zhou, Kaiwen, So, Anthony Man-Cho, and Cheng, James
Subjects: Mathematics - Optimization and Control, Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: We show that stochastic acceleration can be achieved under the perturbed iterate framework (Mania et al., 2017) in asynchronous lock-free optimization, which leads to the optimal incremental gradient complexity for finite-sum objectives. We prove that our new accelerated method requires the same linear speed-up condition as the existing non-accelerated methods. Our core algorithmic discovery is a new accelerated SVRG variant with sparse updates. Empirical results are presented to verify our theoretical findings., Comment: 21 pages, 22 figures
Published: 2021

11. Local Reweighting for Adversarial Training

Author: Gao, Ruize, Liu, Feng, Zhou, Kaiwen, Niu, Gang, Han, Bo, and Cheng, James
Subjects: Computer Science - Machine Learning
Abstract: Instances-reweighted adversarial training (IRAT) can significantly boost the robustness of trained models, where data being less/more vulnerable to the given attack are assigned smaller/larger weights during training. However, when tested on attacks different from the given attack simulated in training, the robustness may drop significantly (e.g., even worse than no reweighting). In this paper, we study this problem and propose our solution--locally reweighted adversarial training (LRAT). The rationale behind IRAT is that we do not need to pay much attention to an instance that is already safe under the attack. We argue that the safeness should be attack-dependent, so that for the same instance, its weight can change given different attacks based on the same model. Thus, if the attack simulated in training is mis-specified, the weights of IRAT are misleading. To this end, LRAT pairs each instance with its adversarial variants and performs local reweighting inside each pair, while performing no global reweighting--the rationale is to fit the instance itself if it is immune to the attack, but not to skip the pair, in order to passively defend different attacks in future. Experiments show that LRAT works better than both IRAT (i.e., global reweighting) and the standard AT (i.e., no reweighting) when trained with an attack and tested on different attacks.
Published: 2021

12. Practical Schemes for Finding Near-Stationary Points of Convex Finite-Sums

Author: Zhou, Kaiwen, Tian, Lai, So, Anthony Man-Cho, and Cheng, James
Subjects: Mathematics - Optimization and Control, Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: In convex optimization, the problem of finding near-stationary points has not been adequately studied yet, unlike other optimality measures such as the function value. Even in the deterministic case, the optimal method (OGM-G, due to Kim and Fessler (2021)) has just been discovered recently. In this work, we conduct a systematic study of algorithmic techniques for finding near-stationary points of convex finite-sums. Our main contributions are several algorithmic discoveries: (1) we discover a memory-saving variant of OGM-G based on the performance estimation problem approach (Drori and Teboulle, 2014); (2) we design a new accelerated SVRG variant that can simultaneously achieve fast rates for minimizing both the gradient norm and function value; (3) we propose an adaptively regularized accelerated SVRG variant, which does not require the knowledge of some unknown initial constants and achieves near-optimal complexities. We put an emphasis on the simplicity and practicality of the new schemes, which could facilitate future work., Comment: 29 pages, 4 figures
Published: 2021

13. Boosting First-Order Methods by Shifting Objective: New Schemes with Faster Worst-Case Rates

Author: Zhou, Kaiwen, So, Anthony Man-Cho, and Cheng, James
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: We propose a new methodology to design first-order methods for unconstrained strongly convex problems. Specifically, instead of tackling the original objective directly, we construct a shifted objective function that has the same minimizer as the original objective and encodes both the smoothness and strong convexity of the original objective in an interpolation condition. We then propose an algorithmic template for tackling the shifted objective, which can exploit such a condition. Following this template, we derive several new accelerated schemes for problems that are equipped with various first-order oracles and show that the interpolation condition allows us to vastly simplify and tighten the analysis of the derived methods. In particular, all the derived methods have faster worst-case convergence rates than their existing counterparts. Experiments on machine learning tasks are conducted to evaluate the new methods., Comment: NeurIPS 2020, 29 pages, 7 figures
Published: 2020

14. Convolutional Embedding for Edit Distance

Author: Dai, Xinyan, Yan, Xiao, Zhou, Kaiwen, Wang, Yuxuan, Yang, Han, and Cheng, James
Subjects: Computer Science - Databases, Computer Science - Machine Learning
Abstract: Edit-distance-based string similarity search has many applications such as spell correction, data de-duplication, and sequence alignment. However, computing edit distance is known to have high complexity, which makes string similarity search challenging for large datasets. In this paper, we propose a deep learning pipeline (called CNN-ED) that embeds edit distance into Euclidean distance for fast approximate similarity search. A convolutional neural network (CNN) is used to generate fixed-length vector embeddings for a dataset of strings and the loss function is a combination of the triplet loss and the approximation error. To justify our choice of using CNN instead of other structures (e.g., RNN) as the model, theoretical analysis is conducted to show that some basic operations in our CNN model preserve edit distance. Experimental results show that CNN-ED outperforms data-independent CGK embedding and RNN-based GRU embedding in terms of both accuracy and efficiency by a large margin. We also show that string similarity search can be significantly accelerated using CNN-based embeddings, sometimes by orders of magnitude., Comment: Accepted by the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2020
Published: 2020
Full Text: View/download PDF

15. Hyper-Sphere Quantization: Communication-Efficient SGD for Federated Learning

Author: Dai, Xinyan, Yan, Xiao, Zhou, Kaiwen, Yang, Han, Ng, Kelvin K. W., Cheng, James, and Fan, Yu
Subjects: Computer Science - Machine Learning, Computer Science - Information Retrieval, Statistics - Machine Learning
Abstract: The high cost of communicating gradients is a major bottleneck for federated learning, as the bandwidth of the participating user devices is limited. Existing gradient compression algorithms are mainly designed for data centers with high-speed network and achieve $O(\sqrt{d} \log d)$ per-iteration communication cost at best, where $d$ is the size of the model. We propose hyper-sphere quantization (HSQ), a general framework that can be configured to achieve a continuum of trade-offs between communication efficiency and gradient accuracy. In particular, at the high compression ratio end, HSQ provides a low per-iteration communication cost of $O(\log d)$, which is favorable for federated learning. We prove the convergence of HSQ theoretically and show by experiments that HSQ significantly reduces the communication cost of model training without hurting convergence accuracy.
Published: 2019

16. Norm-Range Partition: A Universal Catalyst for LSH based Maximum Inner Product Search (MIPS)

Author: Yan, Xiao, Dai, Xinyan, Liu, Jie, Zhou, Kaiwen, and Cheng, James
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Recently, locality sensitive hashing (LSH) was shown to be effective for MIPS and several algorithms including $L_2$-ALSH, Sign-ALSH and Simple-LSH have been proposed. In this paper, we introduce the norm-range partition technique, which partitions the original dataset into sub-datasets containing items with similar 2-norms and builds hash index independently for each sub-dataset. We prove that norm-range partition reduces the query processing complexity for all existing LSH based MIPS algorithms under mild conditions. The key to performance improvement is that norm-range partition allows to use smaller normalization factor most sub-datasets. For efficient query processing, we also formulate a unified framework to rank the buckets from the hash indexes of different sub-datasets. Experiments on real datasets show that norm-range partition significantly reduces the number of probed for LSH based MIPS algorithms when achieving the same recall.
Published: 2018

17. ASVRG: Accelerated Proximal SVRG

Author: Shang, Fanhua, Jiao, Licheng, Zhou, Kaiwen, Cheng, James, Ren, Yan, and Jin, Yufei
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Statistics - Machine Learning
Abstract: This paper proposes an accelerated proximal stochastic variance reduced gradient (ASVRG) method, in which we design a simple and effective momentum acceleration trick. Unlike most existing accelerated stochastic variance reduction methods such as Katyusha, ASVRG has only one additional variable and one momentum parameter. Thus, ASVRG is much simpler than those methods, and has much lower per-iteration complexity. We prove that ASVRG achieves the best known oracle complexities for both strongly convex and non-strongly convex objectives. In addition, we extend ASVRG to mini-batch and non-smooth settings. We also empirically verify our theoretical results and show that the performance of ASVRG is comparable with, and sometimes even better than that of the state-of-the-art stochastic methods., Comment: 32 pages, 3 figures
Published: 2018

18. A Simple Stochastic Variance Reduced Algorithm with Fast Convergence Rates

Author: Zhou, Kaiwen, Shang, Fanhua, and Cheng, James
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Recent years have witnessed exciting progress in the study of stochastic variance reduced gradient methods (e.g., SVRG, SAGA), their accelerated variants (e.g, Katyusha) and their extensions in many different settings (e.g., online, sparse, asynchronous, distributed). Among them, accelerated methods enjoy improved convergence rates but have complex coupling structures, which makes them hard to be extended to more settings (e.g., sparse and asynchronous) due to the existence of perturbation. In this paper, we introduce a simple stochastic variance reduced algorithm (MiG), which enjoys the best-known convergence rates for both strongly convex and non-strongly convex problems. Moreover, we also present its efficient sparse and asynchronous variants, and theoretically analyze its convergence rates in these settings. Finally, extensive experiments for various machine learning problems such as logistic regression are given to illustrate the practical improvement in both serial and asynchronous settings., Comment: ICML2018
Published: 2018

19. Direct Acceleration of SAGA using Sampled Negative Momentum

Author: Zhou, Kaiwen
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Variance reduction is a simple and effective technique that accelerates convex (or non-convex) stochastic optimization. Among existing variance reduction methods, SVRG and SAGA adopt unbiased gradient estimators and are the most popular variance reduction methods in recent years. Although various accelerated variants of SVRG (e.g., Katyusha and Acc-Prox-SVRG) have been proposed, the direct acceleration of SAGA still remains unknown. In this paper, we propose a directly accelerated variant of SAGA using a novel Sampled Negative Momentum (SSNM), which achieves the best known oracle complexity for strongly convex problems (with known strong convexity parameter). Consequently, our work fills the void of directly accelerated SAGA., Comment: 17 pages, 6 figures
Published: 2018

20. VR-SGD: A Simple Stochastic Variance Reduction Method for Machine Learning

Author: Shang, Fanhua, Zhou, Kaiwen, Liu, Hongying, Cheng, James, Tsang, Ivor W., Zhang, Lijun, Tao, Dacheng, and Jiao, Licheng
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition, Mathematics - Optimization and Control, Statistics - Machine Learning
Abstract: In this paper, we propose a simple variant of the original SVRG, called variance reduced stochastic gradient descent (VR-SGD). Unlike the choices of snapshot and starting points in SVRG and its proximal variant, Prox-SVRG, the two vectors of VR-SGD are set to the average and last iterate of the previous epoch, respectively. The settings allow us to use much larger learning rates, and also make our convergence analysis more challenging. We also design two different update rules for smooth and non-smooth objective functions, respectively, which means that VR-SGD can tackle non-smooth and/or non-strongly convex problems directly without any reduction techniques. Moreover, we analyze the convergence properties of VR-SGD for strongly convex problems, which show that VR-SGD attains linear convergence. Different from its counterparts that have no convergence guarantees for non-strongly convex problems, we also provide the convergence guarantees of VR-SGD for this case, and empirically verify that VR-SGD with varying learning rates achieves similar performance to its momentum accelerated variant that has the optimal convergence rate $\mathcal{O}(1/T^2)$. Finally, we apply VR-SGD to solve various machine learning problems, such as convex and non-convex empirical risk minimization, and leading eigenvalue computation. Experimental results show that VR-SGD converges significantly faster than SVRG and Prox-SVRG, and usually outperforms state-of-the-art accelerated methods, e.g., Katyusha., Comment: 46 pages, 25 figures. IEEE Transactions on Knowledge and Data Engineering, accepted in October, 2018. arXiv admin note: substantial text overlap with arXiv:1704.04966
Published: 2018

21. Towards Understanding Feature Learning in Out-of-Distribution Generalization

Author: Chen, Yongqiang, Huang, Wei, Zhou, Kaiwen, Bian, Yatao, Han, Bo, and Cheng, James
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)
Abstract: A common explanation for the failure of out-of-distribution (OOD) generalization is that the model trained with empirical risk minimization (ERM) learns spurious features instead of the desired invariant features. However, several recent studies challenged this explanation and found that deep networks may have already learned sufficiently good features for OOD generalization. The debate extends to the in-distribution and OOD performance correlations along with training or fine-tuning neural nets across a variety of OOD generalization tasks. To understand these seemingly contradicting phenomena, we conduct a theoretical investigation and find that ERM essentially learns both spurious features and invariant features. On the other hand, the quality of learned features during ERM pre-training significantly affects the final OOD performance, as OOD objectives rarely learn new features. Failing to capture all the underlying useful features during pre-training will further limit the final OOD performance. To remedy the issue, we propose Feature Augmented Training (FAT ), to enforce the model to learn all useful features by retaining the already learned features and augmenting new ones by multiple rounds. In each round, the retention and augmentation operations are performed on different subsets of the training data that capture distinct features. Extensive experiments show that FAT effectively learns richer features and consistently improves the OOD performance when applied to various objectives., Yongqiang Chen, Wei Huang, and Kaiwen Zhou contributed equally
Published: 2023

22. Efficient Private SCO for Heavy-Tailed Data via Clipping

Author: Jin, Chenhan, Zhou, Kaiwen, Han, Bo, Yang, Ming-Chang, and Cheng, James
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Optimization and Control (math.OC), FOS: Mathematics, Mathematics - Optimization and Control, Machine Learning (cs.LG)
Abstract: We consider stochastic convex optimization for heavy-tailed data with the guarantee of being differentially private (DP). Prior work on this problem is restricted to the gradient descent (GD) method, which is inefficient for large-scale problems. In this paper, we resolve this issue and derive the first high-probability bounds for the private stochastic method with clipping. For general convex problems, we derive excess population risks $\Tilde{O}\left(\frac{d^{1/7}\sqrt{\ln\frac{(n \epsilon)^2}{\beta d}}}{(n\epsilon)^{2/7}}\right)$ and $\Tilde{O}\left(\frac{d^{1/7}\ln\frac{(n\epsilon)^2}{\beta d}}{(n\epsilon)^{2/7}}\right)$ under bounded or unbounded domain assumption, respectively (here $n$ is the sample size, $d$ is the dimension of the data, $\beta$ is the confidence level and $\epsilon$ is the private level). Then, we extend our analysis to the strongly convex case and non-smooth case (which works for generalized smooth objectives with H$\ddot{\text{o}}$lder-continuous gradients). We establish new excess risk bounds without bounded domain assumption. The results above achieve lower excess risks and gradient complexities than existing methods in their corresponding cases. Numerical experiments are conducted to justify the theoretical improvement.
Published: 2022

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Refine your results

22 results on '"Zhou, Kaiwen"'

1. Enhancing Neural Subset Selection: Integrating Background Information into Set Representations

2. Does Invariant Graph Learning via Environment Augmentation Learn Invariance?

3. Understanding and Improving Feature Learning for Out-of-Distribution Generalization

4. ESC: Exploration with Soft Commonsense Constraints for Zero-shot Object Navigation

5. Efficient Private SCO for Heavy-Tailed Data via Averaged Clipping

6. Pareto Invariant Risk Minimization: Towards Mitigating the Optimization Dilemma in Out-of-Distribution Generalization

7. Fast and Reliable Evaluation of Adversarial Robustness with Minimum-Margin Attack

8. An Adaptive Incremental Gradient Method With Support for Non-Euclidean Norms

9. FedVLN: Privacy-preserving Federated Vision-and-Language Navigation

10. Accelerating Perturbed Stochastic Iterates in Asynchronous Lock-Free Optimization

11. Local Reweighting for Adversarial Training

12. Practical Schemes for Finding Near-Stationary Points of Convex Finite-Sums

13. Boosting First-Order Methods by Shifting Objective: New Schemes with Faster Worst-Case Rates

14. Convolutional Embedding for Edit Distance

15. Hyper-Sphere Quantization: Communication-Efficient SGD for Federated Learning

16. Norm-Range Partition: A Universal Catalyst for LSH based Maximum Inner Product Search (MIPS)

17. ASVRG: Accelerated Proximal SVRG

18. A Simple Stochastic Variance Reduced Algorithm with Fast Convergence Rates

19. Direct Acceleration of SAGA using Sampled Negative Momentum

20. VR-SGD: A Simple Stochastic Variance Reduction Method for Machine Learning

21. Towards Understanding Feature Learning in Out-of-Distribution Generalization

22. Efficient Private SCO for Heavy-Tailed Data via Clipping

Catalog

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Database

22 results on '"Zhou, Kaiwen"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources