Author: "Yu, Yaoliang" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Yu, Yaoliang"' showing total 197 results

1. Uncoupled and Convergent Learning in Monotone Games under Bandit Feedback

Author: Dong, Jing, Wang, Baoxiang, and Yu, Yaoliang
Subjects: Computer Science - Computer Science and Game Theory
Abstract: We study the problem of no-regret learning algorithms for general monotone and smooth games and their last-iterate convergence properties. Specifically, we investigate the problem under bandit feedback and strongly uncoupled dynamics, which allows modular development of the multi-player system that applies to a wide range of real applications. We propose a mirror-descent-based algorithm, which converges in $O(T^{-1/4})$ and is also no-regret. The result is achieved by a dedicated use of two regularizations and the analysis of the fixed point thereof. The convergence rate is further improved to $O(T^{-1/2})$ in the case of strongly monotone games. Motivated by practical tasks where the game evolves over time, the algorithm is extended to time-varying monotone games. We provide the first non-asymptotic result in converging monotone games and give improved results for equilibrium tracking games.
Published: 2024

2. Alignment Calibration: Machine Unlearning for Contrastive Learning under Auditing

Author: Wang, Yihan, Lu, Yiwei, Zhang, Guojun, Boenisch, Franziska, Dziedzic, Adam, Yu, Yaoliang, and Gao, Xiao-Shan
Subjects: Computer Science - Machine Learning
Abstract: Machine unlearning provides viable solutions to revoke the effect of certain training data on pre-trained model parameters. Existing approaches provide unlearning recipes for classification and generative models. However, a category of important machine learning models, i.e., contrastive learning (CL) methods, is overlooked. In this paper, we fill this gap by first proposing the framework of Machine Unlearning for Contrastive learning (MUC) and adapting existing methods. Furthermore, we observe that several methods are mediocre unlearners and existing auditing tools may not be sufficient for data owners to validate the unlearning effects in contrastive learning. We thus propose a novel method called Alignment Calibration (AC) by explicitly considering the properties of contrastive learning and optimizing towards novel auditing metrics to easily verify unlearning. We empirically compare AC with baseline methods on SimCLR, MoCo and CLIP. We observe that AC addresses drawbacks of existing methods: (1) achieving state-of-the-art performance and approximating exact unlearning (retraining); (2) allowing data owners to clearly visualize the effect caused by unlearning through black-box auditing.
Published: 2024

3. Noise-Aware Algorithm for Heterogeneous Differentially Private Federated Learning

Author: Malekmohammadi, Saber, Yu, Yaoliang, and Cao, Yang
Subjects: Computer Science - Machine Learning, Computer Science - Cryptography and Security, Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: High utility and rigorous data privacy are among the main goals of a federated learning (FL) system, which learns a model from the data distributed among some clients. The latter is commonly pursued by using differential privacy in FL (DPFL). There is often heterogeneity in clients' privacy requirements, and existing DPFL works either assume uniform privacy requirements for clients or are not applicable when the server is not fully trusted (our setting). Furthermore, there is often heterogeneity in the batch and/or dataset sizes of clients, which, as shown, results in extra variation in the DP noise level across clients' model updates. With these sources of heterogeneity, straightforward aggregation strategies, e.g., assigning clients aggregation weights proportional to their privacy parameters, will lead to lower utility. We propose Robust-HDP, which efficiently estimates the true noise level in clients' model updates and considerably reduces the noise level in the aggregated model updates. Robust-HDP improves utility and convergence speed, while being safe against clients that may maliciously send falsified privacy parameters to the server. Extensive experimental results on multiple datasets and our theoretical analysis confirm the effectiveness of Robust-HDP. Our code can be found here., Comment: Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024
Published: 2024

4. Disguised Copyright Infringement of Latent Diffusion Models

Author: Lu, Yiwei, Yang, Matthew Y. R., Liu, Zuoqiu, Kamath, Gautam, and Yu, Yaoliang
Subjects: Computer Science - Machine Learning, Computer Science - Cryptography and Security
Abstract: Copyright infringement may occur when a generative model produces samples substantially similar to some copyrighted data that it had access to during the training phase. The notion of access usually refers to including copyrighted samples directly in the training dataset, which one may inspect to identify an infringement. We argue that such visual auditing largely overlooks a concealed copyright infringement, where one constructs a disguise that looks drastically different from the copyrighted sample yet still induces the effect of training Latent Diffusion Models on it. Such disguises only require indirect access to the copyrighted material and cannot be visually distinguished, thus easily circumventing the current auditing tools. In this paper, we provide a better understanding of such disguised copyright infringement by uncovering the disguises generation algorithm, the revelation of the disguises, and importantly, how to detect them to augment the existing toolbox. Additionally, we introduce a broader notion of acknowledgment for comprehending such indirect access. Our code is available at https://github.com/watml/disguised_copyright_infringement., Comment: Accepted to ICML 2024
Published: 2024

5. Convergence to Nash Equilibrium and No-regret Guarantee in (Markov) Potential Games

Author: Dong, Jing, Wang, Baoxiang, and Yu, Yaoliang
Subjects: Computer Science - Computer Science and Game Theory, Computer Science - Machine Learning
Abstract: In this work, we study potential games and Markov potential games under stochastic cost and bandit feedback. We propose a variant of the Frank-Wolfe algorithm with sufficient exploration and recursive gradient estimation, which provably converges to the Nash equilibrium while attaining sublinear regret for each individual player. Our algorithm simultaneously achieves a Nash regret and a regret bound of $O(T^{4/5})$ for potential games, which matches the best available result, without using additional projection steps. Through carefully balancing the reuse of past samples and exploration of new samples, we then extend the results to Markov potential games and improve the best available Nash regret from $O(T^{5/6})$ to $O(T^{4/5})$. Moreover, our algorithm requires no knowledge of the game, such as the distribution mismatch coefficient, which provides more flexibility in its practical implementation. Experimental results corroborate our theoretical findings and underscore the practical effectiveness of our method.
Published: 2024

6. Structure Preserving Diffusion Models

Author: Lu, Haoye, Szabados, Spencer, and Yu, Yaoliang
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition
Abstract: Diffusion models have become the leading distribution-learning method in recent years. Herein, we introduce structure-preserving diffusion processes, a family of diffusion processes for learning distributions that possess additional structure, such as group symmetries, by developing theoretical conditions under which the diffusion transition steps preserve said symmetry. While also enabling equivariant data sampling trajectories, we exemplify these results by developing a collection of different symmetry equivariant diffusion models capable of learning distributions that are inherently symmetric. Empirical studies, over both synthetic and real-world datasets, are used to validate the developed models adhere to the proposed theory and are capable of achieving improved performance over existing methods in terms of sample equality. We also show how the proposed models can be used to achieve theoretically guaranteed equivariant image noise reduction without prior knowledge of the image orientation.
Published: 2024

7. Understanding Neural Network Binarization with Forward and Backward Proximal Quantizers

Author: Lu, Yiwei, Yu, Yaoliang, Li, Xinlin, and Nia, Vahid Partovi
Subjects: Computer Science - Machine Learning
Abstract: In neural network binarization, BinaryConnect (BC) and its variants are considered the standard. These methods apply the sign function in their forward pass and their respective gradients are backpropagated to update the weights. However, the derivative of the sign function is zero whenever defined, which consequently freezes training. Therefore, implementations of BC (e.g., BNN) usually replace the derivative of sign in the backward computation with identity or other approximate gradient alternatives. Although such practice works well empirically, it is largely a heuristic or "training trick." We aim at shedding some light on these training tricks from the optimization perspective. Building from existing theory on ProxConnect (PC, a generalization of BC), we (1) equip PC with different forward-backward quantizers and obtain ProxConnect++ (PC++) that includes existing binarization techniques as special cases; (2) derive a principled way to synthesize forward-backward quantizers with automatic theoretical guarantees; (3) illustrate our theory by proposing an enhanced binarization algorithm BNN++; (4) conduct image classification experiments on CNNs and vision transformers, and empirically verify that BNN++ generally achieves competitive results on binarizing these models., Comment: Accepted to NeurIPS 2023
Published: 2024
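
Example (entry 7): a minimal pure-Python sketch of the training trick the abstract describes, namely applying sign in the forward pass and replacing its derivative with the identity in the backward pass. This is only the generic heuristic on a toy linear model with squared error; it is not PC++ or BNN++, and the names `sign` and `ste_step` are hypothetical.

```python
def sign(w):
    """Forward quantizer: map each real-valued weight to -1.0 or +1.0."""
    return [1.0 if x >= 0 else -1.0 for x in w]


def ste_step(w, x, y, lr=0.1):
    """One gradient step on squared error for a binarized linear model.

    Forward pass: predict with sign(w).
    Backward pass: treat d sign(w)/dw as 1 (identity surrogate), so the
    gradient computed w.r.t. the binary weights is applied directly to
    the latent real-valued weights w.
    """
    wb = sign(w)
    pred = sum(b * xi for b, xi in zip(wb, x))
    err = pred - y
    grad_wb = [2.0 * err * xi for xi in x]  # gradient w.r.t. binary weights
    return [wi - lr * g for wi, g in zip(w, grad_wb)]


# Toy usage: the latent weights move until their signs fit the target.
w_new = ste_step([0.05, -0.3], [1.0, 2.0], -3.0, lr=0.1)
```

With the true derivative of sign (zero wherever it is defined) the update would leave the latent weights unchanged, which is the freezing problem the abstract mentions.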

8. Indiscriminate Data Poisoning Attacks on Pre-trained Feature Extractors

Author: Lu, Yiwei, Yang, Matthew Y. R., Kamath, Gautam, and Yu, Yaoliang
Subjects: Computer Science - Machine Learning, Computer Science - Cryptography and Security
Abstract: Machine learning models have achieved great success in supervised learning tasks for end-to-end training, which requires a large amount of labeled data that is not always feasible. Recently, many practitioners have shifted to self-supervised learning methods that utilize cheap unlabeled data to learn a general feature extractor via pre-training, which can be further applied to personalized downstream tasks by simply training an additional linear layer with limited labeled data. However, such a process may also raise concerns regarding data poisoning attacks. For instance, indiscriminate data poisoning attacks, which aim to decrease model utility by injecting a small number of poisoned data into the training set, pose a security risk to machine learning models, but have only been studied for end-to-end supervised learning. In this paper, we extend the exploration of the threat of indiscriminate attacks on downstream tasks that apply pre-trained feature extractors. Specifically, we propose two types of attacks: (1) the input space attacks, where we modify existing attacks to directly craft poisoned data in the input space. However, due to the difficulty of optimization under constraints, we further propose (2) the feature targeted attacks, where we mitigate the challenge with three stages, firstly acquiring target parameters for the linear head; secondly finding poisoned features by treating the learned feature representations as a dataset; and thirdly inverting the poisoned features back to the input space. Our experiments examine such attacks in popular downstream tasks of fine-tuning on the same dataset and transfer learning that considers domain adaptation. Empirical results reveal that transfer learning is more vulnerable to our attacks. Additionally, input space attacks are a strong threat if no countermeasures are posed, but are otherwise weaker than feature targeted attacks., Comment: Accepted to SaTML 2024
Published: 2024

9. $f$-MICL: Understanding and Generalizing InfoNCE-based Contrastive Learning

Author: Lu, Yiwei, Zhang, Guojun, Sun, Sun, Guo, Hongyu, and Yu, Yaoliang
Subjects: Computer Science - Machine Learning
Abstract: In self-supervised contrastive learning, a widely-adopted objective function is InfoNCE, which uses the heuristic cosine similarity for the representation comparison, and is closely related to maximizing the Kullback-Leibler (KL)-based mutual information. In this paper, we aim at answering two intriguing questions: (1) Can we go beyond the KL-based objective? (2) Besides the popular cosine similarity, can we design a better similarity function? We provide answers to both questions by generalizing the KL-based mutual information to the $f$-Mutual Information in Contrastive Learning ($f$-MICL) using the $f$-divergences. To answer the first question, we provide a wide range of $f$-MICL objectives which share the nice properties of InfoNCE (e.g., alignment and uniformity), and meanwhile result in similar or even superior performance. For the second question, assuming that the joint feature distribution is proportional to the Gaussian kernel, we derive an $f$-Gaussian similarity with better interpretability and empirical performance. Finally, we identify close relationships between the $f$-MICL objective and several popular InfoNCE-based objectives. Using benchmark tasks from both vision and natural language, we empirically evaluate $f$-MICL with different $f$-divergences on various architectures (SimCLR, MoCo, and MoCo v3) and datasets. We observe that $f$-MICL generally outperforms the benchmarks and the best-performing $f$-divergence is task and dataset dependent., Comment: Accepted to TMLR in 2023
Published: 2024

10. Exploring the Limits of Model-Targeted Indiscriminate Data Poisoning Attacks

Author: Lu, Yiwei, Kamath, Gautam, and Yu, Yaoliang
Subjects: Computer Science - Machine Learning, Computer Science - Cryptography and Security
Abstract: Indiscriminate data poisoning attacks aim to decrease a model's test accuracy by injecting a small amount of corrupted training data. Despite significant interest, existing attacks remain relatively ineffective against modern machine learning (ML) architectures. In this work, we introduce the notion of model poisoning reachability as a technical tool to explore the intrinsic limits of data poisoning attacks towards target parameters (i.e., model-targeted attacks). We derive an easily computable threshold to establish and quantify a surprising phase transition phenomenon among popular ML models: data poisoning attacks can achieve certain target parameters only when the poisoning ratio exceeds our threshold. Building on existing parameter corruption attacks and refining the Gradient Canceling attack, we perform extensive experiments to confirm our theoretical findings, test the predictability of our transition threshold, and significantly improve existing indiscriminate data poisoning baselines over a range of datasets and models. Our work highlights the critical role played by the poisoning ratio, and sheds new insights on existing empirical results, attacks and mitigation strategies in data poisoning., Comment: Accepted to ICML 2023
Published: 2023

11. DP$^2$-VAE: Differentially Private Pre-trained Variational Autoencoders

Author: Jiang, Dihong, Zhang, Guojun, Karami, Mahdi, Chen, Xi, Shao, Yunfeng, and Yu, Yaoliang
Subjects: Computer Science - Machine Learning, Computer Science - Cryptography and Security
Abstract: Modern machine learning systems achieve great success when trained on large datasets. However, these datasets usually contain sensitive information (e.g. medical records, face images), leading to serious privacy concerns. Differentially private generative models (DPGMs) emerge as a solution to circumvent such privacy concerns by generating privatized sensitive data. Similar to other differentially private (DP) learners, the major challenge for DPGM is also how to achieve a subtle balance between utility and privacy. We propose DP$^2$-VAE, a novel training mechanism for variational autoencoders (VAE) with provable DP guarantees and improved utility via pre-training on private data. Under the same DP constraints, DP$^2$-VAE minimizes the perturbation noise during training, and hence improves utility. DP$^2$-VAE is very flexible and easily amenable to many other VAE variants. Theoretically, we study the effect of pretraining on private data. Empirically, we conduct extensive experiments on image datasets to illustrate our superiority over baselines under various privacy budgets and evaluation metrics., Comment: The privacy analysis in the first version is incorrect
Published: 2022

12. Building an Efficiency Pipeline: Commutativity and Cumulativeness of Efficiency Operators for Transformers

Author: Xin, Ji, Tang, Raphael, Jiang, Zhiying, Yu, Yaoliang, and Lin, Jimmy
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: There exists a wide variety of efficiency methods for natural language processing (NLP) tasks, such as pruning, distillation, dynamic inference, quantization, etc. We can consider an efficiency method as an operator applied on a model. Naturally, we may construct a pipeline of multiple efficiency methods, i.e., to apply multiple operators on the model sequentially. In this paper, we study the plausibility of this idea, and more importantly, the commutativity and cumulativeness of efficiency operators. We make two interesting observations: (1) Efficiency operators are commutative -- the order of efficiency methods within the pipeline has little impact on the final results; (2) Efficiency operators are also cumulative -- the final results of combining several efficiency methods can be estimated by combining the results of individual methods. These observations deepen our understanding of efficiency operators and provide useful guidelines for their real-world applications.
Published: 2022

13. Mitigating Data Heterogeneity in Federated Learning with Data Augmentation

Author: de Luca, Artur Back, Zhang, Guojun, Chen, Xi, and Yu, Yaoliang
Subjects: Computer Science - Machine Learning
Abstract: Federated Learning (FL) is a prominent framework that enables training a centralized model while securing user privacy by fusing local, decentralized models. In this setting, one major obstacle is data heterogeneity, i.e., each client having non-identically and independently distributed (non-IID) data. This is analogous to the context of Domain Generalization (DG), where each client can be treated as a different domain. However, while many approaches in DG tackle data heterogeneity from the algorithmic perspective, recent evidence suggests that data augmentation can induce equal or greater performance. Motivated by this connection, we present federated versions of popular DG algorithms, and show that by applying appropriate data augmentation, we can mitigate data heterogeneity in the federated setting, and obtain higher accuracy on unseen clients. Equipped with data augmentation, we can achieve state-of-the-art performance using even the most basic Federated Averaging algorithm, with much sparser communication., Comment: 18 pages, 5 figures
Published: 2022
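
Example (entry 13): a minimal sketch of the server-side aggregation step in Federated Averaging as it is commonly described, where the global parameters are a weighted mean of the client parameters. Weighting by local dataset size is an assumption here, local training and data augmentation are omitted, and the name `federated_average` is hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local dataset size.

    client_weights: list of equal-length lists of floats (one per client).
    client_sizes:   list of positive sample counts (one per client).
    """
    total = float(sum(client_sizes))
    dim = len(client_weights[0])
    return [
        sum(n * w[j] for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]


# Two clients; the second holds three times as much data as the first.
global_w = federated_average([[1.0, 2.0], [3.0, 6.0]], [1, 3])
```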

14. Towards Explanation for Unsupervised Graph-Level Representation Learning

Author: Zheng, Qinghua, Wang, Jihong, Luo, Minnan, Yu, Yaoliang, Li, Jundong, Yao, Lina, and Chang, Xiaojun
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Due to the superior performance of Graph Neural Networks (GNNs) in various domains, there is an increasing interest in the GNN explanation problem "which fraction of the input graph is the most crucial to decide the model's decision?" Existing explanation methods focus on the supervised settings, e.g., node classification and graph classification, while the explanation for unsupervised graph-level representation learning is still unexplored. The opaqueness of the graph representations may lead to unexpected risks when deployed for high-stakes decision-making scenarios. In this paper, we advance the Information Bottleneck principle (IB) to tackle the proposed explanation problem for unsupervised graph representations, which leads to a novel principle, Unsupervised Subgraph Information Bottleneck (USIB). We also theoretically analyze the connection between graph representations and explanatory subgraphs on the label space, which reveals that the expressiveness and robustness of representations benefit the fidelity of explanatory subgraphs. Experimental results on both synthetic and real-world datasets demonstrate the superiority of our developed explainer and the validity of our theoretical analysis.
Published: 2022

15. Indiscriminate Data Poisoning Attacks on Neural Networks

Author: Lu, Yiwei, Kamath, Gautam, and Yu, Yaoliang
Subjects: Computer Science - Machine Learning, Computer Science - Cryptography and Security
Abstract: Data poisoning attacks, in which a malicious adversary aims to influence a model by injecting "poisoned" data into the training process, have attracted significant recent attention. In this work, we take a closer look at existing poisoning attacks and connect them with old and new algorithms for solving sequential Stackelberg games. By choosing an appropriate loss function for the attacker and optimizing with algorithms that exploit second-order information, we design poisoning attacks that are effective on neural networks. We present efficient implementations that exploit modern auto-differentiation packages and allow simultaneous and coordinated generation of tens of thousands of poisoned points, in contrast to existing methods that generate poisoned points one by one. We further perform extensive experiments that empirically explore the effect of data poisoning attacks on deep neural networks., Comment: Accepted to TMLR in 2022
Published: 2022

16. Proportional Fairness in Federated Learning

Author: Zhang, Guojun, Malekmohammadi, Saber, Chen, Xi, and Yu, Yaoliang
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Mathematics - Optimization and Control, Statistics - Machine Learning
Abstract: With the increasingly broad deployment of federated learning (FL) systems in the real world, it is critical but challenging to ensure fairness in FL, i.e. reasonably satisfactory performances for each of the numerous diverse clients. In this work, we introduce and study a new fairness notion in FL, called proportional fairness (PF), which is based on the relative change of each client's performance. From its connection with the bargaining games, we propose PropFair, a novel and easy-to-implement algorithm for finding proportionally fair solutions in FL and study its convergence properties. Through extensive experiments on vision and language datasets, we demonstrate that PropFair can approximately find PF solutions, and it achieves a good balance between the average performances of all clients and of the worst 10% clients. Our code is available at https://github.com/huawei-noah/Federated-Learning/tree/main/FairFL., Comment: Accepted at TMLR 2023, code: https://github.com/huawei-noah/Federated-Learning/tree/main/FairFL
Published: 2022
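
Example (entry 16): a minimal sketch of the classical proportional-fairness condition from the bargaining and resource-allocation literature: a utility vector is proportionally fair if, against every alternative, the sum of per-client relative changes is non-positive. That the paper's PF notion has exactly this form is an assumption; the check below is not the PropFair algorithm, and the function names are hypothetical.

```python
def relative_change(u_ref, u_alt):
    """Sum over clients of (u_alt_i - u_ref_i) / u_ref_i.

    Assumes every reference utility is strictly positive.
    """
    return sum((a - r) / r for r, a in zip(u_ref, u_alt))


def is_proportionally_fair(u_ref, alternatives, tol=1e-12):
    """True if no listed alternative has positive aggregate relative change."""
    return all(relative_change(u_ref, u_alt) <= tol for u_alt in alternatives)


# Two clients sharing a fixed total utility of 4: the equal split passes.
fair = is_proportionally_fair([2.0, 2.0], [[3.0, 1.0], [1.0, 3.0], [2.5, 1.5]])
# An alternative that gains far more for one client than the other loses fails it.
unfair = is_proportionally_fair([2.0, 2.0], [[4.0, 1.5]])
```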

17. Demystifying and Generalizing BinaryConnect

Author: Dockhorn, Tim, Yu, Yaoliang, Sari, Eyyüb, Zolnouri, Mahdi, and Nia, Vahid Partovi
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: BinaryConnect (BC) and its many variations have become the de facto standard for neural network quantization. However, our understanding of the inner workings of BC is still quite limited. We attempt to close this gap in four different aspects: (a) we show that existing quantization algorithms, including post-training quantization, are surprisingly similar to each other; (b) we argue for proximal maps as a natural family of quantizers that is both easy to design and analyze; (c) we refine the observation that BC is a special case of dual averaging, which itself is a special case of the generalized conditional gradient algorithm; (d) consequently, we propose ProxConnect (PC) as a generalization of BC and we prove its convergence properties by exploiting the established connections. We conduct experiments on CIFAR-10 and ImageNet, and verify that PC achieves competitive performance., Comment: NeurIPS 2021
Published: 2021

18. An Operator Splitting View of Federated Learning

Author: Malekmohammadi, Saber, Shaloudegi, Kiarash, Hu, Zeou, and Yu, Yaoliang
Subjects: Computer Science - Machine Learning
Abstract: Over the past few years, the federated learning (FL) community has witnessed a proliferation of new FL algorithms. However, our understanding of the theory of FL is still fragmented, and a thorough, formal comparison of these algorithms remains elusive. Motivated by this gap, we show that many of the existing FL algorithms can be understood from an operator splitting point of view. This unification allows us to compare different algorithms with ease, to refine previous convergence results and to uncover new algorithmic variants. In particular, our analysis reveals the vital role played by the step size in FL algorithms. The unification also leads to a streamlined and economic way to accelerate FL algorithms, without incurring any communication overhead. We perform numerical experiments on both convex and nonconvex models to validate our findings., Comment: 30 pages, 28 figures
Published: 2021

19. $S^3$: Sign-Sparse-Shift Reparametrization for Effective Training of Low-bit Shift Networks

Author: Li, Xinlin, Liu, Bang, Yu, Yaoliang, Liu, Wulong, Xu, Chunjing, and Nia, Vahid Partovi
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Shift neural networks reduce computation complexity by removing expensive multiplication operations and quantizing continuous weights into low-bit discrete values, which are fast and energy efficient compared to conventional neural networks. However, existing shift networks are sensitive to the weight initialization, and also yield a degraded performance caused by the vanishing gradient and weight sign freezing problems. To address these issues, we propose $S^3$ low-bit re-parameterization, a novel technique for training low-bit shift networks. Our method decomposes a discrete parameter in a sign-sparse-shift 3-fold manner. In this way, it efficiently learns a low-bit network with weight dynamics similar to full-precision networks and insensitive to weight initialization. Our proposed training method pushes the boundaries of shift neural networks and shows that 3-bit shift networks outperform their full-precision counterparts in terms of top-1 accuracy on ImageNet.
Published: 2021

20. Quantifying and Improving Transferability in Domain Generalization

Author: Zhang, Guojun, Zhao, Han, Yu, Yaoliang, and Poupart, Pascal
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: Out-of-distribution generalization is one of the key challenges when transferring a model from the lab to the real world. Existing efforts mostly focus on building invariant features among source and target domains. Based on invariant features, a high-performing classifier on source domains could hopefully behave equally well on a target domain. In other words, the invariant features are transferable. However, in practice, there are no perfectly transferable features, and some algorithms seem to learn "more transferable" features than others. How can we understand and quantify such transferability? In this paper, we formally define transferability that one can quantify and compute in domain generalization. We point out the difference and connection with common discrepancy measures between domains, such as total variation and Wasserstein distance. We then prove that our transferability can be estimated with enough samples and give a new upper bound for the target error based on our transferability. Empirically, we evaluate the transferability of the feature embeddings learned by existing algorithms for domain generalization. Surprisingly, we find that many algorithms are not quite learning transferable features, although few could still survive. In light of this, we propose a new algorithm for learning transferable features and test it over various benchmark datasets, including RotatedMNIST, PACS, Office-Home and WILDS-FMoW. Experimental results show that the proposed algorithm achieves consistent improvement over many state-of-the-art algorithms, corroborating our theoretical findings., Comment: NeurIPS 2021
Published: 2021

21. A Unifying Framework for Federated Learning

Author: Malekmohammadi, Saber, Shaloudegi, Kiarash, Hu, Zeou, and Yu, Yaoliang
Series Editors: Ong, Yew Soon, Gupta, Abhishek, and Gong, Maoguo
Editors: Razavi-Far, Roozbeh, Wang, Boyu, Taylor, Matthew E., and Yang, Qiang
Published: 2023

22. Posterior Differential Regularization with f-divergence for Improving Model Robustness

Author: Cheng, Hao, Liu, Xiaodong, Pereira, Lis, Yu, Yaoliang, and Gao, Jianfeng
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: We address the problem of enhancing model robustness through regularization. Specifically, we focus on methods that regularize the model posterior difference between clean and noisy inputs. Theoretically, we provide a connection of two recent methods, Jacobian Regularization and Virtual Adversarial Training, under this framework. Additionally, we generalize the posterior differential regularization to the family of $f$-divergences and characterize the overall regularization framework in terms of Jacobian matrix. Empirically, we systematically compare those regularizations and standard BERT training on a diverse set of tasks to provide a comprehensive profile of their effect on model in-domain and out-of-domain generalization. For both fully supervised and semi-supervised settings, our experiments show that regularizing the posterior differential with $f$-divergence can result in well-improved model robustness. In particular, with a proper $f$-divergence, a BERT-base model can achieve comparable generalization as its BERT-large counterpart for in-domain, adversarial and domain shift scenarios, indicating the great potential of the proposed framework for boosting model generalization for NLP models., Comment: NAACL 2021
Published: 2020

23. OLALA: Object-Level Active Learning for Efficient Document Layout Annotation

Author: Shen, Zejiang, Zhao, Jian, Dell, Melissa, Yu, Yaoliang, and Li, Weining
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition, Statistics - Machine Learning
Abstract: Document images often have intricate layout structures, with numerous content regions (e.g. texts, figures, tables) densely arranged on each page. This makes the manual annotation of layout datasets expensive and inefficient. These characteristics also challenge existing active learning methods, as image-level scoring and selection suffer from the overexposure of common objects. Inspired by recent progress in semi-supervised learning and self-training, we propose an Object-Level Active Learning framework for efficient document layout Annotation, OLALA. In this framework, only regions with the most ambiguous object predictions within an image are selected for annotators to label, optimizing the use of the annotation budget. For unselected predictions, a semi-automatic correction algorithm is proposed to identify certain errors based on prior knowledge of layout structures and rectify them with minor supervision. Additionally, we carefully design a perturbation-based object scoring function for document images. It governs the object selection process via evaluating prediction ambiguities, and considers both the positions and categories of predicted layout objects. Extensive experiments show that OLALA can significantly boost model performance and improve annotation efficiency, given the same labeling budget. Code for this paper can be accessed via https://github.com/lolipopshock/detectron2_al., Comment: 12 pages, 7 figures, 5 tables
Published: 2020

24. Stronger and Faster Wasserstein Adversarial Attacks

Author: Wu, Kaiwen, Wang, Allen Houze, and Yu, Yaoliang
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Deep models, while being extremely flexible and accurate, are surprisingly vulnerable to "small, imperceptible" perturbations known as adversarial attacks. While the majority of existing attacks focus on measuring perturbations under the $\ell_p$ metric, Wasserstein distance, which takes geometry in pixel space into account, has long been known to be a suitable metric for measuring image quality and has recently risen as a compelling alternative to the $\ell_p$ metric in adversarial attacks. However, constructing an effective attack under the Wasserstein metric is computationally much more challenging and calls for better optimization algorithms. We address this gap in two ways: (a) we develop an exact yet efficient projection operator to enable a stronger projected gradient attack; (b) we show that the Frank-Wolfe method equipped with a suitable linear minimization oracle works extremely fast under Wasserstein constraints. Our algorithms not only converge faster but also generate much stronger attacks. For instance, we decrease the accuracy of a residual network on CIFAR-10 to $3.4\%$ within a Wasserstein perturbation ball of radius $0.005$, in contrast to $65.6\%$ using the previous Wasserstein attack based on an \emph{approximate} projection operator. Furthermore, employing our stronger attacks in adversarial training significantly improves the robustness of adversarially trained models., Comment: 30 pages, accepted to ICML 2020
Published: 2020

25. Newton-type Methods for Minimax Optimization

Author: Zhang, Guojun, Wu, Kaiwen, Poupart, Pascal, and Yu, Yaoliang
Subjects: Computer Science - Machine Learning, Mathematics - Optimization and Control, Statistics - Machine Learning
Abstract: Differential games, in particular two-player sequential zero-sum games (a.k.a. minimax optimization), have been an important modeling tool in applied science and received renewed interest in machine learning due to many recent applications, such as adversarial training, generative models and reinforcement learning. However, existing theory mostly focuses on convex-concave functions with few exceptions. In this work, we propose two novel Newton-type algorithms for nonconvex-nonconcave minimax optimization. We prove their local convergence at strict local minimax points, which are surrogates of global solutions. We argue that our Newton-type algorithms nicely complement existing ones in that (a) they converge faster to strict local minimax points; (b) they are much more effective when the problem is ill-conditioned; (c) their computational complexity remains similar. We verify the effectiveness of our Newton-type algorithms through experiments on training GANs which are intrinsically nonconvex and ill-conditioned. Our code is available at https://github.com/watml/min-max-2nd-order., Comment: code update
Published: 2020

26. Federated Learning Meets Multi-objective Optimization

Author: Hu, Zeou, Shaloudegi, Kiarash, Zhang, Guojun, and Yu, Yaoliang
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Federated learning has emerged as a promising, massively distributed way to train a joint deep model over large amounts of edge devices while keeping private user data strictly on device. In this work, motivated from ensuring fairness among users and robustness against malicious adversaries, we formulate federated learning as multi-objective optimization and propose a new algorithm FedMGDA+ that is guaranteed to converge to Pareto stationary solutions. FedMGDA+ is simple to implement, has fewer hyperparameters to tune, and refrains from sacrificing the performance of any participating user. We establish the convergence properties of FedMGDA+ and point out its connections to existing approaches. Extensive experiments on a variety of datasets confirm that FedMGDA+ compares favorably against state-of-the-art., Comment: Accepted at IEEE Transactions on Network Science and Engineering 2022
Published: 2020

27. Density Deconvolution with Normalizing Flows

Author: Dockhorn, Tim, Ritchie, James A., Yu, Yaoliang, and Murray, Iain
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: Density deconvolution is the task of estimating a probability density function given only noise-corrupted samples. We can fit a Gaussian mixture model to the underlying density by maximum likelihood if the noise is normally distributed, but would like to exploit the superior density estimation performance of normalizing flows and allow for arbitrary noise distributions. Since both adjustments lead to an intractable likelihood, we resort to amortized variational inference. We demonstrate some problems involved in this approach; however, experiments on real data demonstrate that flows can already out-perform Gaussian mixtures for density deconvolution., Comment: Appearing at the second workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models (ICML 2020), Virtual Conference. 8 pages, 6 figures, 5 tables
Published: 2020

28. Network Comparison with Interpretable Contrastive Network Representation Learning

Author: Fujiwara, Takanori, Zhao, Jian, Chen, Francine, Yu, Yaoliang, and Ma, Kwan-Liu
Subjects: Computer Science - Machine Learning, Computer Science - Social and Information Networks, Statistics - Machine Learning
Abstract: Identifying unique characteristics in a network through comparison with another network is an essential network analysis task. For example, with networks of protein interactions obtained from normal and cancer tissues, we can discover unique types of interactions in cancer tissues. This analysis task could be greatly assisted by contrastive learning, which is an emerging analysis approach to discover salient patterns in one dataset relative to another. However, existing contrastive learning methods cannot be directly applied to networks as they are designed only for high-dimensional data analysis. To address this problem, we introduce a new analysis approach called contrastive network representation learning (cNRL). By integrating two machine learning schemes, network representation learning and contrastive learning, cNRL enables embedding of network nodes into a low-dimensional representation that reveals the uniqueness of one network compared to another. Within this approach, we also design a method, named i-cNRL, which offers interpretability in the learned results, allowing for understanding which specific patterns are only found in one network. We demonstrate the effectiveness of i-cNRL for network comparison with multiple network models and real-world datasets. Furthermore, we compare i-cNRL and other potential cNRL algorithm designs through quantitative and qualitative evaluations., Comment: To appear in Journal of Data Science, Statistics, and Visualisation. The previous preprint version was titled "Interpretable Contrastive Learning for Networks" (arXiv:2005.12419v1)
Published: 2020

29. Showing Your Work Doesn't Always Work

Author: Tang, Raphael, Lee, Jaejun, Xin, Ji, Liu, Xinyu, Yu, Yaoliang, and Lin, Jimmy
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: In natural language processing, a recently popular line of work explores how to best report the experimental results of neural networks. One exemplar publication, titled "Show Your Work: Improved Reporting of Experimental Results," advocates for reporting the expected validation effectiveness of the best-tuned model, with respect to the computational budget. In the present work, we critically examine this paper. As far as statistical generalizability is concerned, we find unspoken pitfalls and caveats with this approach. We analytically show that their estimator is biased and uses error-prone assumptions. We find that the estimator favors negative errors and yields poor bootstrapped confidence intervals. We derive an unbiased alternative and bolster our claims with empirical evidence from statistical simulation. Our codebase is at http://github.com/castorini/meanmax., Comment: Accepted to ACL 2020
Published: 2020

30. DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference

Author: Xin, Ji, Tang, Raphael, Lee, Jaejun, Yu, Yaoliang, and Lin, Jimmy
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Large-scale pre-trained language models such as BERT have brought significant improvements to NLP applications. However, they are also notorious for being slow in inference, which makes them difficult to deploy in real-time applications. We propose a simple but effective method, DeeBERT, to accelerate BERT inference. Our approach allows samples to exit earlier without passing through the entire model. Experiments show that DeeBERT is able to save up to ~40% inference time with minimal degradation in model quality. Further analyses show different behaviors in the BERT transformer layers and also reveal their redundancy. Our work provides new ideas to efficiently apply deep transformer-based models to downstream tasks. Code is available at https://github.com/castorini/DeeBERT., Comment: Accepted at ACL 2020
Published: 2020

31. Convex Representation Learning for Generalized Invariance in Semi-Inner-Product Space

Author: Ma, Yingyi, Ganapathiraman, Vignesh, Yu, Yaoliang, and Zhang, Xinhua
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Invariance (defined in a general sense) has been one of the most effective priors for representation learning. Direct factorization of parametric models is feasible only for a small range of invariances, while regularization approaches, despite improved generality, lead to nonconvex optimization. In this work, we develop a convex representation learning algorithm for a variety of generalized invariances that can be modeled as semi-norms. Novel Euclidean embeddings are introduced for kernel representers in a semi-inner-product space, and approximation bounds are established. This allows invariant representations to be learned efficiently and effectively as confirmed in our experiments, along with accurate predictions., Comment: to appear in ICML 2020
Published: 2020

32. A Positivstellensatz for Conditional SAGE Signomials

Author: Wang, Allen Houze, Jaini, Priyank, Yu, Yaoliang, and Poupart, Pascal
Subjects: Mathematics - Optimization and Control, Computer Science - Machine Learning, Mathematics - Algebraic Geometry
Abstract: Recently, the conditional SAGE certificate has been proposed as a sufficient condition for signomial positivity over a convex set. In this article, we show that the conditional SAGE certificate is $\textit{complete}$. That is, for any signomial $f(\mathbf{x}) = \sum_{j=1}^{\ell}c_j \exp(\mathbf{A}_j\mathbf{x})$ defined by rational exponents that is positive over a compact convex set $\mathcal{X}$, there is $p \in \mathbb{Z}_+$ and a specific positive definite function $w(\mathbf{x})$ such that $w(\mathbf{x})^p f(\mathbf{x})$ may be verified by the conditional SAGE certificate. The completeness result is analogous to Positivstellensatz results from algebraic geometry, which guarantees representation of positive polynomials with sum of squares polynomials. The result gives rise to a convergent hierarchy of lower bounds for constrained signomial optimization over an $\textit{arbitrary}$ compact convex set that is computable via the conditional SAGE certificate., Comment: 19 pages, preprint
Published: 2020

33. Optimality and Stability in Non-Convex Smooth Games

Author: Zhang, Guojun, Poupart, Pascal, and Yu, Yaoliang
Subjects: Computer Science - Machine Learning, Mathematics - Optimization and Control, Statistics - Machine Learning
Abstract: Convergence to a saddle point for convex-concave functions has been studied for decades, while recent years have seen a surge of interest in non-convex (zero-sum) smooth games, motivated by their recent wide applications. It remains an intriguing research challenge how local optimal points are defined and which algorithm can converge to such points. An interesting concept is known as the local minimax point, which strongly correlates with the widely-known gradient descent ascent algorithm. This paper aims to provide a comprehensive analysis of local minimax points, such as their relation with other solution concepts and their optimality conditions. We find that local saddle points can be regarded as a special type of local minimax points, called uniformly local minimax points, under mild continuity assumptions. In (non-convex) quadratic games, we show that local minimax points are (in some sense) equivalent to global minimax points. Finally, we study the stability of gradient algorithms near local minimax points. Although gradient algorithms can converge to local/global minimax points in the non-degenerate case, they would often fail in general cases. This implies the necessity of either novel algorithms or concepts beyond saddle points and minimax points in non-convex smooth games., Comment: accepted by JMLR 2022
Published: 2020

34. Unsupervised Multilingual Alignment using Wasserstein Barycenter

Author: Lian, Xin, Jain, Kshitij, Truszkowski, Jakub, Poupart, Pascal, and Yu, Yaoliang
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning, Statistics - Machine Learning, I.2.7
Abstract: We study unsupervised multilingual alignment, the problem of finding word-to-word translations between multiple languages without using any parallel data. One popular strategy is to reduce multilingual alignment to the much simplified bilingual setting, by picking one of the input languages as the pivot language that we transit through. However, it is well-known that transiting through a poorly chosen pivot language (such as English) may severely degrade the translation quality, since the assumed transitive relations among all pairs of languages may not be enforced in the training process. Instead of going through a rather arbitrarily chosen pivot language, we propose to use the Wasserstein barycenter as a more informative "mean" language: it encapsulates information from all languages and minimizes all pairwise transportation costs. We evaluate our method on standard benchmarks and demonstrate state-of-the-art performances., Comment: Code is available at https://github.com/alixxxin/multi-lang
Published: 2020

35. Exploiting Token and Path-based Representations of Code for Identifying Security-Relevant Commits

Author: Ram, Achyudh, Xin, Ji, Nagappan, Meiyappan, Yu, Yaoliang, Lozoya, Rocío Cabrera, Sabetta, Antonino, and Lin, Jimmy
Subjects: Computer Science - Software Engineering, Computer Science - Computation and Language
Abstract: Public vulnerability databases such as CVE and NVD account for only 60% of security vulnerabilities present in open-source projects, and are known to suffer from inconsistent quality. Over the last two years, there has been considerable growth in the number of known vulnerabilities across projects available in various repositories such as NPM and Maven Central. Such an increasing risk calls for a mechanism to infer the presence of security threats in a timely manner. We propose novel hierarchical deep learning models for the identification of security-relevant commits from either the commit diff or the source code for the Java classes. By comparing the performance of our model against code2vec, a state-of-the-art model that learns from path-based representations of code, and a logistic regression baseline, we show that deep learning models show promising results in identifying security-related commits. We also conduct a comparative analysis of how various deep learning models learn across different input representations and the effect of regularization on the generalization of our models.
Published: 2019

36. Convergence of Gradient Methods on Bilinear Zero-Sum Games

Author: Zhang, Guojun and Yu, Yaoliang
Subjects: Computer Science - Machine Learning, Computer Science - Computer Science and Game Theory, Mathematics - Optimization and Control, Statistics - Machine Learning
Abstract: Min-max formulations have attracted great attention in the ML community due to the rise of deep generative models and adversarial methods, while understanding the dynamics of gradient algorithms for solving such formulations has remained a grand challenge. As a first step, we restrict to bilinear zero-sum games and give a systematic analysis of popular gradient updates, for both simultaneous and alternating versions. We provide exact conditions for their convergence and find the optimal parameter setup and convergence rates. In particular, our results offer formal evidence that alternating updates converge "better" than simultaneous ones.
Published: 2019
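
The contrast between simultaneous and alternating updates in record 36 can be seen on the simplest bilinear game. The sketch below is only an illustration under stated assumptions, not the paper's analysis: it uses the scalar game f(x, y) = x*y, plain gradient descent ascent, and a hypothetical step size `eta = 0.1`. In this toy case the simultaneous update multiplies the squared norm by (1 + eta^2) at every step, while the alternating update leaves the quantity x^2 + y^2 - eta*x*y unchanged, so its iterates stay bounded.

```python
import math

def simultaneous_gda(x, y, eta, steps):
    """Both players update from the same (old) iterate."""
    for _ in range(steps):
        x, y = x - eta * y, y + eta * x
    return x, y

def alternating_gda(x, y, eta, steps):
    """The max player reacts to the min player's fresh update."""
    for _ in range(steps):
        x = x - eta * y  # min player descends on f(x, y) = x * y
        y = y + eta * x  # max player ascends using the new x
    return x, y

def norm(point):
    """Euclidean distance from the equilibrium (0, 0)."""
    return math.hypot(*point)

def invariant(point, eta):
    """Quantity preserved by the alternating update on f(x, y) = x * y."""
    x, y = point
    return x * x + y * y - eta * x * y

eta = 0.1            # hypothetical step size
start = (1.0, 1.0)   # hypothetical starting point
sim = simultaneous_gda(*start, eta=eta, steps=200)
alt = alternating_gda(*start, eta=eta, steps=200)

print("start norm:       ", norm(start))
print("simultaneous norm:", norm(sim))  # drifts away from the equilibrium
print("alternating norm: ", norm(alt))  # stays close to the start norm
```

Neither plain variant converges to the equilibrium here; the example only shows that the alternating iterates stay on a bounded orbit while the simultaneous ones spiral outward.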

37. Understanding Adversarial Robustness: The Trade-off between Minimum and Average Margin

Author: Wu, Kaiwen and Yu, Yaoliang
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Deep models, while being extremely versatile and accurate, are vulnerable to adversarial attacks: slight perturbations that are imperceptible to humans can completely flip the prediction of deep models. Many attack and defense mechanisms have been proposed, although a satisfying solution still largely remains elusive. In this work, we give strong evidence that during training, deep models maximize the minimum margin in order to achieve high accuracy, but at the same time decrease the \emph{average} margin hence hurting robustness. Our empirical results highlight an intrinsic trade-off between accuracy and robustness for current deep model training. To further address this issue, we propose a new regularizer to explicitly promote average margin, and we verify through extensive experiments that it does lead to better robustness. Our regularized objective remains Fisher-consistent, hence asymptotically can still recover the Bayes optimal classifier.
Published: 2019

38. Tails of Lipschitz Triangular Flows

Author: Jaini, Priyank, Kobyzev, Ivan, Yu, Yaoliang, and Brubaker, Marcus
Subjects: Mathematics - Statistics Theory, Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: We investigate the ability of popular flow based methods to capture tail-properties of a target density by studying the increasing triangular maps used in these flow methods acting on a tractable source density. We show that the density quantile functions of the source and target density provide a precise characterization of the slope of transformation required to capture tails in a target density. We further show that any Lipschitz-continuous transport map acting on a source density will result in a density with similar tail properties as the source, highlighting the trade-off between a complex source density and a sufficiently expressive transformation to capture desirable properties of a target density. Subsequently, we illustrate that flow models like Real-NVP, MAF, and Glow as implemented originally lack the ability to capture a distribution with non-Gaussian tails. We circumvent this problem by proposing tail-adaptive flows consisting of a source distribution that can be learned simultaneously with the triangular map to capture tail-properties of a target density. We perform several synthetic and real-world experiments to complement our theoretical findings., Comment: Published at the 37th International Conference on Machine Learning (ICML 2020)
Published: 2019

39. Distributional Reinforcement Learning for Efficient Exploration

Author: Mavrin, Borislav, Zhang, Shangtong, Yao, Hengshuai, Kong, Linglong, Wu, Kaiwen, and Yu, Yaoliang
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: In distributional reinforcement learning (RL), the estimated distribution of the value function models both the parametric and intrinsic uncertainties. We propose a novel and efficient exploration method for deep RL that has two components. The first is a decaying schedule to suppress the intrinsic uncertainty. The second is an exploration bonus calculated from the upper quantiles of the learned distribution. In Atari 2600 games, our method outperforms QR-DQN in 12 out of 14 hard games (achieving 483\% average gain across 49 games in cumulative rewards over QR-DQN with a big win in Venture). We also compared our algorithm with QR-DQN in a challenging 3D driving simulator (CARLA). Results show that our algorithm achieves near-optimal safety rewards twice as fast as QR-DQN.
Published: 2019

40. Sum-of-Squares Polynomial Flow

Author: Jaini, Priyank, Selby, Kira A., and Yu, Yaoliang
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Triangular map is a recent construct in probability theory that allows one to transform any source probability density function to any target density function. Based on triangular maps, we propose a general framework for high-dimensional density estimation, by specifying one-dimensional transformations (equivalently conditional densities) and appropriate conditioner networks. This framework (a) reveals the commonalities and differences of existing autoregressive and flow based methods, (b) allows a unified understanding of the limitations and representation power of these recent approaches and, (c) motivates us to uncover a new Sum-of-Squares (SOS) flow that is interpretable, universal, and easy to train. We perform several synthetic experiments on various density geometries to demonstrate the benefits (and short-comings) of such transformations. SOS flows achieve competitive results in simulations and several real-world datasets., Comment: 13 pages, ICML'2019
Published: 2019

41. Robust Multiple Kernel k-means Clustering using Min-Max Optimization

Author: Bang, Seojin, Yu, Yaoliang, and Wu, Wei
Subjects: Computer Science - Machine Learning
Abstract: Multiple kernel learning is a type of multiview learning that combines different data modalities by capturing view-specific patterns using kernels. Although supervised multiple kernel learning has been extensively studied, until recently, only a few unsupervised approaches have been proposed. Meanwhile, adversarial learning has recently received much attention. Many works have been proposed to defend against adversarial examples. However, little is known about the effect of adversarial perturbation in the context of multiview learning, and even less in the unsupervised case. In this study, we show that adversarial features added to a view can make the existing approaches with the min-max formulation in multiple kernel clustering yield unfavorable clusters. To address this problem and inspired by recent works in adversarial learning, we propose a multiple kernel clustering method with the min-max framework that aims to be robust to such adversarial perturbation. We evaluate the robustness of our method on simulation data under different types of adversarial perturbations and show that it outperforms several compared existing methods. In the real data analysis, we demonstrate the utility of our method on a real-world problem., Comment: R package is available at https://github.com/SeojinBang/MKKC
Published: 2018

42. Provably noise-robust, regularised $k$-means clustering

Author: Kushagra, Shrinu, Yu, Yaoliang, and Ben-David, Shai
Subjects: Computer Science - Machine Learning
Abstract: We consider the problem of clustering in the presence of noise. That is, when on top of cluster structure, the data also contains a subset of \emph{unstructured} points. Our goal is to detect the clusters despite the presence of many unstructured points. Any algorithm that achieves this goal is noise-robust. We consider a regularisation method which converts any center-based clustering objective into a noise-robust one. We focus on the $k$-means objective and we prove that the regularised version of $k$-means is NP-Hard even for $k=1$. We consider two algorithms based on the convex (sdp and lp) relaxation of the regularised objective and prove robustness guarantees for both. The sdp and lp relaxation of the standard (non-regularised) $k$-means objective has been previously studied by [ABC+15]. Under the stochastic ball model of the data they show that the sdp-based algorithm recovers the underlying structure as long as the balls are separated by $\delta > 2\sqrt{2} + \epsilon$. We improve upon this result in two ways. First, we show recovery even for $\delta > 2 + \epsilon$. Second, our regularised algorithm recovers the balls even in the presence of noise so long as the number of noisy points is not too large. We complement our theoretical analysis with simulations and analyse the effect of various parameters like regularization constant, noise-level etc. on the performance of our algorithm. In the presence of noise, our algorithm performs better than $k$-means++ on MNIST.
Published: 2017

43. A Unifying Framework for Federated Learning

Author: Malekmohammadi, Saber, Shaloudegi, Kiarash, Hu, Zeou, and Yu, Yaoliang
Published: 2022

44. Convex-constrained Sparse Additive Modeling and Its Extensions

Author: Yin, Junming and Yu, Yaoliang
Subjects: Computer Science - Learning, Statistics - Machine Learning
Abstract: Sparse additive modeling is a class of effective methods for performing high-dimensional nonparametric regression. In this work we show how shape constraints such as convexity/concavity and their extensions, can be integrated into additive models. The proposed sparse difference of convex additive models (SDCAM) can estimate most continuous functions without any a priori smoothness assumption. Motivated by a characterization of difference of convex functions, our method incorporates a natural regularization functional to avoid overfitting and to reduce model complexity. Computationally, we develop an efficient backfitting algorithm with linear per-iteration complexity. Experiments on both synthetic and real data verify that our method is competitive against state-of-the-art sparse additive models, with improved performance in most scenarios., Comment: 17 pages, 2 figures
Published: 2017

45. Distributed Proximal Gradient Algorithm for Partially Asynchronous Computer Clusters

Author: Zhou, Yi, Yu, Yaoliang, Dai, Wei, Liang, Yingbin, and Xing, Eric P.
Subjects: Mathematics - Optimization and Control
Abstract: With ever growing data volume and model size, an error-tolerant, communication efficient, yet versatile distributed algorithm has become vital for the success of many large-scale machine learning applications. In this work we propose m-PAPG, an implementation of the flexible proximal gradient algorithm in model parallel systems equipped with the partially asynchronous communication protocol. The worker machines communicate asynchronously with a controlled staleness bound $s$ and operate at different frequencies. We characterize various convergence properties of m-PAPG: 1) Under a general non-smooth and non-convex setting, we prove that every limit point of the sequence generated by m-PAPG is a critical point of the objective function; 2) Under an error bound condition, we prove that the function value decays linearly for every $s$ steps; 3) Under the Kurdyka-Łojasiewicz inequality, we prove that the sequences generated by m-PAPG converge to the same critical point, provided that a proximal Lipschitz condition is satisfied.
Published: 2017

46. Splitting Algorithms for Federated Learning

Author: Malekmohammadi, Saber, Shaloudegi, Kiarash, Hu, Zeou, Yu, Yaoliang, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Kamp, Michael, editor, Koprinska, Irena, editor, Bibal, Adrien, editor, Bouadi, Tassadit, editor, Frénay, Benoît, editor, Galárraga, Luis, editor, Oramas, José, editor, Adilova, Linara, editor, Krishnamurthy, Yamuna, editor, Kang, Bo, editor, Largeron, Christine, editor, Lijffijt, Jefrey, editor, Viard, Tiphaine, editor, Welke, Pascal, editor, Ruocco, Massimiliano, editor, Aune, Erlend, editor, Gallicchio, Claudio, editor, Schiele, Gregor, editor, Pernkopf, Franz, editor, Blott, Michaela, editor, Fröning, Holger, editor, Schindler, Günther, editor, Guidotti, Riccardo, editor, Monreale, Anna, editor, Rinzivillo, Salvatore, editor, Biecek, Przemyslaw, editor, Ntoutsi, Eirini, editor, Pechenizkiy, Mykola, editor, Rosenhahn, Bodo, editor, Buckley, Christopher, editor, Cialfi, Daniela, editor, Lanillos, Pablo, editor, Ramstead, Maxwell, editor, Verbelen, Tim, editor, Ferreira, Pedro M., editor, Andresini, Giuseppina, editor, Malerba, Donato, editor, Medeiros, Ibéria, editor, Fournier-Viger, Philippe, editor, Nawaz, M. Saqib, editor, Ventura, Sebastian, editor, Sun, Meng, editor, Zhou, Min, editor, Bitetta, Valerio, editor, Bordino, Ilaria, editor, Ferretti, Andrea, editor, Gullo, Francesco, editor, Ponti, Giovanni, editor, Severini, Lorenzo, editor, Ribeiro, Rita, editor, Gama, João, editor, Gavaldà, Ricard, editor, Cooper, Lee, editor, Ghazaleh, Naghmeh, editor, Richiardi, Jonas, editor, Roqueiro, Damian, editor, Saldana Miranda, Diego, editor, Sechidis, Konstantinos, editor, and Graça, Guilherme, editor
Published: 2021

47. Dropout with Expectation-linear Regularization

Author: Ma, Xuezhe, Gao, Yingkai, Hu, Zhiting, Yu, Yaoliang, Deng, Yuntian, and Hovy, Eduard
Subjects: Computer Science - Learning, Statistics - Machine Learning
Abstract: Dropout, a simple and effective way to train deep neural networks, has led to a number of impressive empirical successes and spawned many recent theoretical investigations. However, the gap between dropout's training and inference phases, introduced due to tractability considerations, has largely remained under-appreciated. In this work, we first formulate dropout as a tractable approximation of some latent variable model, leading to a clean view of parameter sharing and enabling further theoretical analysis. Then, we introduce (approximate) expectation-linear dropout neural networks, whose inference gap we are able to formally characterize. Algorithmically, we show that our proposed measure of the inference gap can be used to regularize the standard dropout training objective, resulting in an \emph{explicit} control of the gap. Our method is as simple and efficient as standard dropout. We further prove upper bounds on the loss in accuracy due to expectation-linearization, and describe classes of input distributions that expectation-linearize easily. Experiments on three image classification benchmark datasets demonstrate that reducing the inference gap can indeed improve the performance consistently., Comment: Published as a conference paper at ICLR 2017. Camera-ready Version. 23 pages (paper + appendix)
Published: 2016

48. Additive Approximations in High Dimensional Nonparametric Regression via the SALSA

Author: Kandasamy, Kirthevasan and Yu, Yaoliang
Subjects: Statistics - Machine Learning, Computer Science - Learning
Abstract: High dimensional nonparametric regression is an inherently difficult problem with known lower bounds depending exponentially in dimension. A popular strategy to alleviate this curse of dimensionality has been to use additive models of \emph{first order}, which model the regression function as a sum of independent functions on each dimension. Though useful in controlling the variance of the estimate, such models are often too restrictive in practical settings. Between non-additive models which often have large variance and first order additive models which have large bias, there has been little work to exploit the trade-off in the middle via additive models of intermediate order. In this work, we propose SALSA, which bridges this gap by allowing interactions between variables, but controls model capacity by limiting the order of interactions. SALSA minimises the residual sum of squares with squared RKHS norm penalties. Algorithmically, it can be viewed as Kernel Ridge Regression with an additive kernel. When the regression function is additive, the excess risk is only polynomial in dimension. Using the Girard-Newton formulae, we efficiently sum over a combinatorial number of terms in the additive expansion. Via a comparison on $15$ real datasets, we show that our method is competitive against $21$ other alternatives., Comment: International Conference on Machine Learning (ICML) 2016
Published: 2016

49. Distributed Machine Learning via Sufficient Factor Broadcasting

Author: Xie, Pengtao, Kim, Jin Kyu, Zhou, Yi, Ho, Qirong, Kumar, Abhimanu, Yu, Yaoliang, and Xing, Eric
Subjects: Computer Science - Learning, Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Matrix-parametrized models, including multiclass logistic regression and sparse coding, are used in machine learning (ML) applications ranging from computer vision to computational biology. When these models are applied to large-scale ML problems starting at millions of samples and tens of thousands of classes, their parameter matrix can grow at an unexpected rate, resulting in high parameter synchronization costs that greatly slow down distributed learning. To address this issue, we propose a Sufficient Factor Broadcasting (SFB) computation model for efficient distributed learning of a large family of matrix-parametrized models, which share the following property: the parameter update computed on each data sample is a rank-1 matrix, i.e., the outer product of two "sufficient factors" (SFs). By broadcasting the SFs among worker machines and reconstructing the update matrices locally at each worker, SFB improves communication efficiency (communication costs are linear in the parameter matrix's dimensions, rather than quadratic) without affecting computational correctness. We present a theoretical convergence analysis of SFB, and empirically corroborate its efficiency on four different matrix-parametrized ML models.
Published: 2015

50. Efficient Structured Matrix Rank Minimization

Author: Yu, Adams Wei, Ma, Wanli, Yu, Yaoliang, Carbonell, Jaime G., and Sra, Suvrit
Subjects: Computer Science - Systems and Control, Mathematics - Optimization and Control
Abstract: We study the problem of finding structured low-rank matrices using nuclear norm regularization where the structure is encoded by a linear map. In contrast to most known approaches for linearly structured rank minimization, we do not (a) use the full SVD, nor (b) resort to augmented Lagrangian techniques, nor (c) solve linear systems per iteration. Instead, we formulate the problem differently so that it is amenable to a generalized conditional gradient method, which results in a practical improvement with low per iteration computational cost. Numerical results show that our approach significantly outperforms state-of-the-art competitors in terms of running time, while effectively recovering low rank solutions in stochastic system realization and spectral compressed sensing problems.
Published: 2015
