Author: "Takáč P" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Takáč P"' showing total 423 results

Start Over Author "Takáč P"

423 results on '"Takáč P"'

1. FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training

Author: Zmushko, Philip, Beznosikov, Aleksandr, Takáč, Martin, and Horváth, Samuel
Subjects: Computer Science - Machine Learning
Abstract: With the increase in the number of parameters in large language models, the process of pre-training and fine-tuning increasingly demands larger volumes of GPU memory. A significant portion of this memory is typically consumed by the optimizer state. To overcome this challenge, recent approaches such as low-rank adaptation (LoRA (Hu et al., 2021)), low-rank gradient projection (GaLore (Zhao et al., 2024)), and blockwise optimization (BAdam (Luo et al., 2024)) have been proposed. However, in all these algorithms, the $\textit{effective rank of the weight updates remains low-rank}$, which can lead to a substantial loss of information from the gradient. This loss can be critically important, especially during the pre-training stage. In this paper, we introduce $\texttt{FRUGAL}$ ($\textbf{F}$ull-$\textbf{R}$ank $\textbf{U}$pdates with $\textbf{G}$r$\textbf{A}$dient sp$\textbf{L}$itting), a new memory-efficient optimization framework. $\texttt{FRUGAL}$ leverages gradient splitting to perform low-dimensional updates using advanced algorithms (such as Adam), while updates along the remaining directions are executed via state-free methods like SGD or signSGD (Bernstein et al., 2018). Our framework can be integrated with various low-rank update selection techniques, including GaLore and BAdam. We provide theoretical convergence guarantees for our framework when using SGDM for low-dimensional updates and SGD for state-free updates. Additionally, our method consistently outperforms concurrent approaches across various fixed memory budgets, achieving state-of-the-art results in pre-training and fine-tuning tasks while balancing memory efficiency and performance metrics.
Published: 2024

2. $\psi$DAG: Projected Stochastic Approximation Iteration for DAG Structure Learning

Author: Ziu, Klea, Hanzely, Slavomír, Li, Loka, Zhang, Kun, Takáč, Martin, and Kamzolov, Dmitry
Subjects: Computer Science - Machine Learning, Mathematics - Optimization and Control
Abstract: Learning the structure of Directed Acyclic Graphs (DAGs) presents a significant challenge due to the vast combinatorial search space of possible graphs, which scales exponentially with the number of nodes. Recent advancements have redefined this problem as a continuous optimization task by incorporating differentiable acyclicity constraints. These methods commonly rely on algebraic characterizations of DAGs, such as matrix exponentials, to enable the use of gradient-based optimization techniques. Despite these innovations, existing methods often face optimization difficulties due to the highly non-convex nature of DAG constraints and the per-iteration computational complexity. In this work, we present a novel framework for learning DAGs, employing a Stochastic Approximation approach integrated with Stochastic Gradient Descent (SGD)-based optimization techniques. Our framework introduces new projection methods tailored to efficiently enforce DAG constraints, ensuring that the algorithm converges to a feasible local minimum. With its low iteration complexity, the proposed method is well-suited for handling large-scale problems with improved computational efficiency. We demonstrate the effectiveness and scalability of our framework through comprehensive experimental evaluations, which confirm its superior performance across various settings.
Published: 2024

3. Enhance Hyperbolic Representation Learning via Second-order Pooling

Author: Song, Kun, Solozabal, Ruben, hao, Li, Ren, Lu, Abdar, Moloud, Li, Qing, Karray, Fakhri, and Takac, Martin
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Hyperbolic representation learning is well known for its ability to capture hierarchical information. However, the distance between samples from different levels of hierarchical classes can be required large. We reveal that the hyperbolic discriminant objective forces the backbone to capture this hierarchical information, which may inevitably increase the Lipschitz constant of the backbone. This can hinder the full utilization of the backbone's generalization ability. To address this issue, we introduce second-order pooling into hyperbolic representation learning, as it naturally increases the distance between samples without compromising the generalization ability of the input features. In this way, the Lipschitz constant of the backbone does not necessarily need to be large. However, current off-the-shelf low-dimensional bilinear pooling methods cannot be directly employed in hyperbolic representation learning because they inevitably reduce the distance expansion capability. To solve this problem, we propose a kernel approximation regularization, which enables the low-dimensional bilinear features to approximate the kernel function well in low-dimensional space. Finally, we conduct extensive experiments on graph-structured datasets to demonstrate the effectiveness of the proposed method.
Published: 2024

4. OPTAMI: Global Superlinear Convergence of High-order Methods

Author: Kamzolov, Dmitry, Pasechnyuk, Dmitry, Agafonov, Artem, Gasnikov, Alexander, and Takáč, Martin
Subjects: Mathematics - Optimization and Control
Abstract: Second-order methods for convex optimization outperform first-order methods in terms of theoretical iteration convergence, achieving rates up to $O(k^{-5})$ for highly-smooth functions. However, their practical performance and applications are limited due to their multi-level structure and implementation complexity. In this paper, we present new results on high-order optimization methods, supported by their practical performance. First, we show that the basic high-order methods, such as the Cubic Regularized Newton Method, exhibit global superlinear convergence for $\mu$-strongly star-convex functions, a class that includes $\mu$-strongly convex functions and some non-convex functions. Theoretical convergence results are both inspired and supported by the practical performance of these methods. Secondly, we propose a practical version of the Nesterov Accelerated Tensor method, called NATA. It significantly outperforms the classical variant and other high-order acceleration techniques in practice. The convergence of NATA is also supported by theoretical results. Finally, we introduce an open-source computational library for high-order methods, called OPTAMI. This library includes various methods, acceleration techniques, and subproblem solvers, all implemented as PyTorch optimizers, thereby facilitating the practical application of high-order methods to a wide range of optimization problems. We hope this library will simplify research and practical comparison of methods beyond first-order.
Published: 2024

5. Collaborative and Efficient Personalization with Mixtures of Adaptors

Author: Almansoori, Abdulla Jasem, Horváth, Samuel, and Takáč, Martin
Subjects: Computer Science - Machine Learning
Abstract: Non-iid data is prevalent in real-world federated learning problems. Data heterogeneity can come in different types in terms of distribution shifts. In this work, we are interested in the heterogeneity that comes from concept shifts, i.e., shifts in the prediction across clients. In particular, we consider multi-task learning, where we want the model to adapt to the task of the client. We propose a parameter-efficient framework to tackle this issue, where each client learns to mix between parameter-efficient adaptors according to its task. We use Low-Rank Adaptors (LoRAs) as the backbone and extend its concept to other types of layers. We call our framework Federated Low-Rank Adaptive Learning (FLoRAL). This framework is not an algorithm but rather a model parameterization for a multi-task learning objective, so it can work on top of any algorithm that optimizes this objective, which includes many algorithms from the literature. FLoRAL is memory-efficient, and clients are personalized with small states (e.g., one number per adaptor) as the adaptors themselves are federated. Hence, personalization is--in this sense--federated as well. Even though clients can personalize more freely by training an adaptor locally, we show that collaborative and efficient training of adaptors is possible and performs better. We also show that FLoRAL can outperform an ensemble of full models with optimal cluster assignment, which demonstrates the benefits of federated personalization and the robustness of FLoRAL to overfitting. We show promising experimental results on synthetic datasets, real-world federated multi-task problems such as MNIST, CIFAR-10, and CIFAR-100. We also provide a theoretical analysis of local SGD on a relaxed objective and discuss the effects of aggregation mismatch on convergence., Comment: 36 pages, 10 figures
Published: 2024

6. FedPeWS: Personalized Warmup via Subnetworks for Enhanced Heterogeneous Federated Learning

Author: Tastan, Nurbek, Horvath, Samuel, Takac, Martin, and Nandakumar, Karthik
Subjects: Computer Science - Machine Learning, Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Statistical data heterogeneity is a significant barrier to convergence in federated learning (FL). While prior work has advanced heterogeneous FL through better optimization objectives, these methods fall short when there is extreme data heterogeneity among collaborating participants. We hypothesize that convergence under extreme data heterogeneity is primarily hindered due to the aggregation of conflicting updates from the participants in the initial collaboration rounds. To overcome this problem, we propose a warmup phase where each participant learns a personalized mask and updates only a subnetwork of the full model. This personalized warmup allows the participants to focus initially on learning specific subnetworks tailored to the heterogeneity of their data. After the warmup phase, the participants revert to standard federated optimization, where all parameters are communicated. We empirically demonstrate that the proposed personalized warmup via subnetworks (FedPeWS) approach improves accuracy and convergence speed over standard federated optimization methods.
Published: 2024

7. Almost compact embeddings between Orlicz and Lorentz spaces

Author: Musil, Vít, Pick, Luboš, and Takáč, Jakub
Subjects: Mathematics - Functional Analysis
Abstract: We characterize when an Orlicz space $L^A$ is almost compactly (uniformly absolutely continuously) embedded into a Lorentz space $L^{p,q}$ in terms of a balance condition involving parameters $p,q\in[1,\infty]$, and a Young function $A$. In the course of the proof, we develop a new method based on an inequality of Young type involving the measure of level sets of a given function.
Published: 2024

8. Methods for Convex $(L_0,L_1)$-Smooth Optimization: Clipping, Acceleration, and Adaptivity

Author: Gorbunov, Eduard, Tupitsa, Nazarii, Choudhury, Sayantan, Aliev, Alen, Richtárik, Peter, Horváth, Samuel, and Takáč, Martin
Subjects: Mathematics - Optimization and Control, Computer Science - Machine Learning
Abstract: Due to the non-smoothness of optimization problems in Machine Learning, generalized smoothness assumptions have been gaining a lot of attention in recent years. One of the most popular assumptions of this type is $(L_0,L_1)$-smoothness (Zhang et al., 2020). In this paper, we focus on the class of (strongly) convex $(L_0,L_1)$-smooth functions and derive new convergence guarantees for several existing methods. In particular, we derive improved convergence rates for Gradient Descent with (Smoothed) Gradient Clipping and for Gradient Descent with Polyak Stepsizes. In contrast to the existing results, our rates do not rely on the standard smoothness assumption and do not suffer from the exponential dependency from the initial distance to the solution. We also extend these results to the stochastic case under the over-parameterization assumption, propose a new accelerated method for convex $(L_0,L_1)$-smooth optimization, and derive new convergence rates for Adaptive Gradient Descent (Malitsky and Mishchenko, 2020)., Comment: 51 pages, 1 figure
Published: 2024

9. MirrorCheck: Efficient Adversarial Defense for Vision-Language Models

Author: Fares, Samar, Ziu, Klea, Aremu, Toluwani, Durasov, Nikita, Takáč, Martin, Fua, Pascal, Nandakumar, Karthik, and Laptev, Ivan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Vision-Language Models (VLMs) are becoming increasingly vulnerable to adversarial attacks as various novel attack strategies are being proposed against these models. While existing defenses excel in unimodal contexts, they currently fall short in safeguarding VLMs against adversarial threats. To mitigate this vulnerability, we propose a novel, yet elegantly simple approach for detecting adversarial samples in VLMs. Our method leverages Text-to-Image (T2I) models to generate images based on captions produced by target VLMs. Subsequently, we calculate the similarities of the embeddings of both input and generated images in the feature space to identify adversarial samples. Empirical evaluations conducted on different datasets validate the efficacy of our approach, outperforming baseline methods adapted from image classification domains. Furthermore, we extend our methodology to classification tasks, showcasing its adaptability and model-agnostic nature. Theoretical analyses and empirical findings also show the resilience of our approach against adaptive attacks, positioning it as an excellent defense mechanism for real-world deployment against adversarial threats.
Published: 2024

10. Gradient Clipping Improves AdaGrad when the Noise Is Heavy-Tailed

Author: Chezhegov, Savelii, Klyukin, Yaroslav, Semenov, Andrei, Beznosikov, Aleksandr, Gasnikov, Alexander, Horváth, Samuel, Takáč, Martin, and Gorbunov, Eduard
Subjects: Computer Science - Machine Learning, Mathematics - Optimization and Control
Abstract: Methods with adaptive stepsizes, such as AdaGrad and Adam, are essential for training modern Deep Learning models, especially Large Language Models. Typically, the noise in the stochastic gradients is heavy-tailed for the later ones. Gradient clipping provably helps to achieve good high-probability convergence for such noises. However, despite the similarity between AdaGrad/Adam and Clip-SGD, the high-probability convergence of AdaGrad/Adam has not been studied in this case. In this work, we prove that AdaGrad (and its delayed version) can have provably bad high-probability convergence if the noise is heavy-tailed. To fix this issue, we propose a new version of AdaGrad called Clip-RAdaGradD (Clipped Reweighted AdaGrad with Delay) and prove its high-probability convergence bounds with polylogarithmic dependence on the confidence level for smooth convex/non-convex stochastic optimization with heavy-tailed noise. Our empirical evaluations, including NLP model fine-tuning, highlight the superiority of clipped versions of AdaGrad/Adam in handling the heavy-tailed noise., Comment: 37 pages, 8 figures
Published: 2024

11. Local Methods with Adaptivity via Scaling

Author: Chezhegov, Savelii, Skorik, Sergey, Khachaturov, Nikolas, Shalagin, Danil, Avetisyan, Aram, Takáč, Martin, Kholodov, Yaroslav, and Beznosikov, Aleksandr
Subjects: Computer Science - Machine Learning, Computer Science - Distributed, Parallel, and Cluster Computing, Mathematics - Optimization and Control
Abstract: The rapid development of machine learning and deep learning has introduced increasingly complex optimization challenges that must be addressed. Indeed, training modern, advanced models has become difficult to implement without leveraging multiple computing nodes in a distributed environment. Distributed optimization is also fundamental to emerging fields such as federated learning. Specifically, there is a need to organize the training process to minimize the time lost due to communication. A widely used and extensively researched technique to mitigate the communication bottleneck involves performing local training before communication. This approach is the focus of our paper. Concurrently, adaptive methods that incorporate scaling, notably led by Adam, have gained significant popularity in recent years. Therefore, this paper aims to merge the local training technique with the adaptive approach to develop efficient distributed learning methods. We consider the classical Local SGD method and enhance it with a scaling feature. A crucial aspect is that the scaling is described generically, allowing us to analyze various approaches, including Adam, RMSProp, and OASIS, in a unified manner. In addition to theoretical analysis, we validate the performance of our methods in practice by training a neural network., Comment: 41 pages, 2 algorithms, 6 figures, 1 table
Published: 2024

12. Current issues in aesthetics and beyond: Revisiting lookism

Author: Takáč Peter
Subjects: aesthetic authenticity, discrimination, health, lookism, social identity, unique look, Ethics, BJ1-1725
Abstract: Lookism is a term used to describe discrimination based on the physical appearance of a person. We suppose that the social impact of lookism is a philosophical issue, because, from this perspective, attractive people have an advantage over others. The first line of our argumentation involves the issue of lookism as a global ethical and aesthetical phenomenon. A person’s attractiveness has a significant impact on the social and public status of this individual. The common view in society is that it is good to be more attractive and healthier. This concept generates several ethical questions about human aesthetical identity, health, authenticity, and integrity in society. It seems that this unequal treatment causes discrimination, diminishes self-confidence, and lowers the chance of a job or social enforcement for many human beings. Currently, aesthetic improvements are being made through plastic surgery. There is no place on the human body that we cannot improve with plastic surgery or aesthetic medicine. We should not forget that it may result in the problem of elitism, in dividing people into primary and secondary categories. The second line of our argumentation involves a particular case of lookism: Melanie Gaydos. A woman that is considered to be a model with a unique look.
Published: 2020
Full Text: View/download PDF

13. Near-infrared imaging for automated tsetse pupae sex sorting in support of the sterile insect technique

Author: Argilés-Herrero Rafael, Salvador-Herranz Gustavo, Parker Andrew G., Zacarés Mario, Fall Assane G., Gaye Adji M., Nawaz Arooj, Takáč Peter, Vreysen Marc J.B., and de Beer Chantel J.
Subjects: glossina, sterile insect technique (sit), genetic management of vectors, sex separation, sexual dimorphism, nir spectrum, Infectious and parasitic diseases, RC109-216
Abstract: Tsetse flies are the cyclical vectors of African trypanosomes and one of several methods to manage this vector is the sterile insect technique (SIT). The ability to determine the sex of tsetse pupae with the objective to separate the sexes before adult emergence has been a major goal for decades for tsetse management programmes with an SIT component. Tsetse females develop faster and pharate females inside the pupae melanise 1–2 days before males. This earlier melanisation can be detected by infrared cameras through the pupal shell, and the newly developed Near InfraRed Pupae Sex Sorter (NIRPSS) takes advantage of this. The melanisation process is not homogeneous for all fly organs and the pupa needs to be examined ventrally, dorsally and laterally to ensure accurate classification by an image analysis algorithm. When the pupae are maturing at a constant temperature of 24 °C and sorted at the appropriate age, 24 days post-larviposition for Glossina palpalis gambiensis, the sorting machine can efficiently separate the sexes. The recovered male pupae can then be sterilised for field releases of males, while the rest of the pupae can be used to maintain the laboratory colony. The sorting process with the new NIRPSS had no negative impact on adult emergence and flight ability. A mean male recovery of 62.82 ± 3.61% was enough to provide sterile males to an operational SIT programme, while mean contamination with females (4.69 ± 3.02%) was low enough to have no impact on the maintenance of a laboratory colony.
Published: 2023
Full Text: View/download PDF

14. Newton Method Revisited: Global Convergence Rates up to $\mathcal {O}\left(k^{-3} \right)$ for Stepsize Schedules and Linesearch Procedures

Author: Hanzely, Slavomír, Abdukhakimov, Farshed, and Takáč, Martin
Subjects: Mathematics - Optimization and Control
Abstract: This paper investigates the global convergence of stepsized Newton methods for convex functions with H\"older continuous Hessians or third derivatives. We propose several simple stepsize schedules with fast global convergence guarantees, up to $\mathcal {O}\left(k^{-3} \right)$. For cases with multiple plausible smoothness parameterizations or an unknown smoothness constant, we introduce a stepsize linesearch and a backtracking procedure with provable convergence as if the optimal smoothness parameters were known in advance. Additionally, we present strong convergence guarantees for the practically popular Newton method with exact linesearch., Comment: 11 pages
Published: 2024

15. Self-Guiding Exploration for Combinatorial Problems

Author: Iklassov, Zangir, Du, Yali, Akimov, Farkhad, and Takac, Martin
Subjects: Computer Science - Artificial Intelligence
Abstract: Large Language Models (LLMs) have become pivotal in addressing reasoning tasks across diverse domains, including arithmetic, commonsense, and symbolic reasoning. They utilize prompting techniques such as Exploration-of-Thought, Decomposition, and Refinement to effectively navigate and solve intricate tasks. Despite these advancements, the application of LLMs to Combinatorial Problems (CPs), known for their NP-hardness and critical roles in logistics and resource management remains underexplored. To address this gap, we introduce a novel prompting strategy: Self-Guiding Exploration (SGE), designed to enhance the performance of solving CPs. SGE operates autonomously, generating multiple thought trajectories for each CP task. It then breaks these trajectories down into actionable subtasks, executes them sequentially, and refines the results to ensure optimal outcomes. We present our research as the first to apply LLMs to a broad range of CPs and demonstrate that SGE outperforms existing prompting strategies by over 27.84% in CP optimization performance. Additionally, SGE achieves a 2.46% higher accuracy over the best existing results in other reasoning tasks (arithmetic, commonsense, and symbolic)., Comment: 22 pages
Published: 2024

16. Exploring Jacobian Inexactness in Second-Order Methods for Variational Inequalities: Lower Bounds, Optimal Algorithms and Quasi-Newton Approximations

Author: Agafonov, Artem, Ostroukhov, Petr, Mozhaev, Roman, Yakovlev, Konstantin, Gorbunov, Eduard, Takáč, Martin, Gasnikov, Alexander, and Kamzolov, Dmitry
Subjects: Mathematics - Optimization and Control
Abstract: Variational inequalities represent a broad class of problems, including minimization and min-max problems, commonly found in machine learning. Existing second-order and high-order methods for variational inequalities require precise computation of derivatives, often resulting in prohibitively high iteration costs. In this work, we study the impact of Jacobian inaccuracy on second-order methods. For the smooth and monotone case, we establish a lower bound with explicit dependence on the level of Jacobian inaccuracy and propose an optimal algorithm for this key setting. When derivatives are exact, our method converges at the same rate as exact optimal second-order methods. To reduce the cost of solving the auxiliary problem, which arises in all high-order methods with global convergence, we introduce several Quasi-Newton approximations. Our method with Quasi-Newton updates achieves a global sublinear convergence rate. We extend our approach with a tensor generalization for inexact high-order derivatives and support the theory with experiments.
Published: 2024

17. Enhancing Policy Gradient with the Polyak Step-Size Adaption

Author: Li, Yunxiang, Yuan, Rui, Fan, Chen, Schmidt, Mark, Horváth, Samuel, Gower, Robert M., and Takáč, Martin
Subjects: Computer Science - Machine Learning
Abstract: Policy gradient is a widely utilized and foundational algorithm in the field of reinforcement learning (RL). Renowned for its convergence guarantees and stability compared to other RL algorithms, its practical application is often hindered by sensitivity to hyper-parameters, particularly the step-size. In this paper, we introduce the integration of the Polyak step-size in RL, which automatically adjusts the step-size without prior knowledge. To adapt this method to RL settings, we address several issues, including unknown f* in the Polyak step-size. Additionally, we showcase the performance of the Polyak step-size in RL through experiments, demonstrating faster convergence and the attainment of more stable policies.
Published: 2024

18. Generalized Policy Learning for Smart Grids: FL TRPO Approach

Author: Li, Yunxiang, Cuadrado, Nicolas Mauricio, Horváth, Samuel, and Takáč, Martin
Subjects: Computer Science - Machine Learning
Abstract: The smart grid domain requires bolstering the capabilities of existing energy management systems; Federated Learning (FL) aligns with this goal as it demonstrates a remarkable ability to train models on heterogeneous datasets while maintaining data privacy, making it suitable for smart grid applications, which often involve disparate data distributions and interdependencies among features that hinder the suitability of linear models. This paper introduces a framework that combines FL with a Trust Region Policy Optimization (FL TRPO) aiming to reduce energy-associated emissions and costs. Our approach reveals latent interconnections and employs personalized encoding methods to capture unique insights, understanding the relationships between features and optimal strategies, allowing our model to generalize to previously unseen data. Experimental results validate the robustness of our approach, affirming its proficiency in effectively learning policy models for smart grid challenges., Comment: ICLR 2024 Workshop: Tackling Climate Change with Machine Learning
Published: 2024

19. FRESCO: Federated Reinforcement Energy System for Cooperative Optimization

Author: Cuadrado, Nicolas Mauricio, Gutierrez, Roberto Alejandro, and Takáč, Martin
Subjects: Computer Science - Machine Learning
Abstract: The rise in renewable energy is creating new dynamics in the energy grid that promise to create a cleaner and more participative energy grid, where technology plays a crucial part in making the required flexibility to achieve the vision of the next-generation grid. This work presents FRESCO, a framework that aims to ease the implementation of energy markets using a hierarchical control architecture of reinforcement learning agents trained using federated learning. The core concept we are proving is that having greedy agents subject to changing conditions from a higher level agent creates a cooperative setup that will allow for fulfilling all the individual objectives. This paper presents a general overview of the framework, the current progress, and some insights we obtained from the recent results., Comment: Tiny Paper at ICLR 2023
Published: 2024

20. Remove that Square Root: A New Efficient Scale-Invariant Version of AdaGrad

Author: Choudhury, Sayantan, Tupitsa, Nazarii, Loizou, Nicolas, Horvath, Samuel, Takac, Martin, and Gorbunov, Eduard
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Mathematics - Optimization and Control
Abstract: Adaptive methods are extremely popular in machine learning as they make learning rate tuning less expensive. This paper introduces a novel optimization algorithm named KATE, which presents a scale-invariant adaptation of the well-known AdaGrad algorithm. We prove the scale-invariance of KATE for the case of Generalized Linear Models. Moreover, for general smooth non-convex problems, we establish a convergence rate of $O \left(\frac{\log T}{\sqrt{T}} \right)$ for KATE, matching the best-known ones for AdaGrad and Adam. We also compare KATE to other state-of-the-art adaptive algorithms Adam and AdaGrad in numerical experiments with different problems, including complex machine learning tasks like image classification and text classification on real data. The results indicate that KATE consistently outperforms AdaGrad and matches/surpasses the performance of Adam in all considered scenarios., Comment: 27 pages, 12 figures
Published: 2024

21. Reinforcement Learning for Solving Stochastic Vehicle Routing Problem with Time Windows

Author: Iklassov, Zangir, Sobirov, Ikboljon, Solozabal, Ruben, and Takac, Martin
Subjects: Computer Science - Artificial Intelligence
Abstract: This paper introduces a reinforcement learning approach to optimize the Stochastic Vehicle Routing Problem with Time Windows (SVRP), focusing on reducing travel costs in goods delivery. We develop a novel SVRP formulation that accounts for uncertain travel costs and demands, alongside specific customer time windows. An attention-based neural network trained through reinforcement learning is employed to minimize routing costs. Our approach addresses a gap in SVRP research, which traditionally relies on heuristic methods, by leveraging machine learning. The model outperforms the Ant-Colony Optimization algorithm, achieving a 1.73% reduction in travel costs. It uniquely integrates external information, demonstrating robustness in diverse environments, making it a valuable benchmark for future SVRP studies and industry application.
Published: 2024

22. AdaBatchGrad: Combining Adaptive Batch Size and Adaptive Step Size

Author: Ostroukhov, Petr, Zhumabayeva, Aigerim, Xiang, Chulu, Gasnikov, Alexander, Takáč, Martin, and Kamzolov, Dmitry
Subjects: Computer Science - Machine Learning, Mathematics - Optimization and Control
Abstract: This paper presents a novel adaptation of the Stochastic Gradient Descent (SGD), termed AdaBatchGrad. This modification seamlessly integrates an adaptive step size with an adjustable batch size. An increase in batch size and a decrease in step size are well-known techniques to tighten the area of convergence of SGD and decrease its variance. A range of studies by R. Byrd and J. Nocedal introduced various testing techniques to assess the quality of mini-batch gradient approximations and choose the appropriate batch sizes at every step. Methods that utilized exact tests were observed to converge within $O(LR^2/\varepsilon)$ iterations. Conversely, inexact test implementations sometimes resulted in non-convergence and erratic performance. To address these challenges, AdaBatchGrad incorporates both adaptive batch and step sizes, enhancing the method's robustness and stability. For exact tests, our approach converges in $O(LR^2/\varepsilon)$ iterations, analogous to standard gradient descent. For inexact tests, it achieves convergence in $O(\max\lbrace LR^2/\varepsilon, \sigma^2 R^2/\varepsilon^2 \rbrace )$ iterations. This makes AdaBatchGrad markedly more robust and computationally efficient relative to prevailing methods. To substantiate the efficacy of our method, we experimentally show, how the introduction of adaptive step size and adaptive batch size gradually improves the performance of regular SGD. The results imply that AdaBatchGrad surpasses alternative methods, especially when applied to inexact tests.
Published: 2024

23. Federated Learning Can Find Friends That Are Advantageous

Author: Tupitsa, Nazarii, Horváth, Samuel, Takáč, Martin, and Gorbunov, Eduard
Subjects: Computer Science - Machine Learning, Mathematics - Optimization and Control
Abstract: In Federated Learning (FL), the distributed nature and heterogeneity of client data present both opportunities and challenges. While collaboration among clients can significantly enhance the learning process, not all collaborations are beneficial; some may even be detrimental. In this study, we introduce a novel algorithm that assigns adaptive aggregation weights to clients participating in FL training, identifying those with data distributions most conducive to a specific learning objective. We demonstrate that our aggregation method converges no worse than the method that aggregates only the updates received from clients with the same data distribution. Furthermore, empirical evaluations consistently reveal that collaborations guided by our algorithm outperform traditional FL approaches. This underscores the critical role of judicious client selection and lays the foundation for more streamlined and effective FL implementations in the coming years.
Published: 2024

24. SANIA: Polyak-type Optimization Framework Leads to Scale Invariant Stochastic Algorithms

Author: Abdukhakimov, Farshed, Xiang, Chulu, Kamzolov, Dmitry, Gower, Robert, and Takáč, Martin
Subjects: Computer Science - Machine Learning, Mathematics - Optimization and Control
Abstract: Adaptive optimization methods are widely recognized as among the most popular approaches for training Deep Neural Networks (DNNs). Techniques such as Adam, AdaGrad, and AdaHessian utilize a preconditioner that modifies the search direction by incorporating information about the curvature of the objective function. However, despite their adaptive characteristics, these methods still require manual fine-tuning of the step-size. This, in turn, impacts the time required to solve a particular problem. This paper presents an optimization framework named SANIA to tackle these challenges. Beyond eliminating the need for manual step-size hyperparameter settings, SANIA incorporates techniques to address poorly scaled or ill-conditioned problems. We also explore several preconditioning methods, including Hutchinson's method, which approximates the Hessian diagonal of the loss function. We conclude with an extensive empirical examination of the proposed techniques across classification tasks, covering both convex and non-convex contexts.
Published: 2023

25. A Generalization of the Sugeno integral to aggregate Interval-valued data: an application to Brain Computer Interface and Social Network Analysis

Author: Fumanal-Idocin, Javier, Takac, Zdenko, Horanska, Lubomira, Asmus, Thiago da Cruz, Vidaurre, Carmen, Dimuro, Graçaliz, Fernandez, Javier, and Bustince, Humberto
Subjects: Computer Science - Social and Information Networks
Abstract: Intervals are a popular way to represent the uncertainty related to data, in which we express the vagueness of each observation as the width of the interval. However, when using intervals for this purpose, we need to use the appropriate set of mathematical tools to work with. This can be problematic due to the scarcity and complexity of interval-valued functions in comparison with the numerical ones. In this work, we propose to extend a generalization of the Sugeno integral to work with interval-valued data. Then, we use this integral to aggregate interval-valued data in two different settings: first, we study the use of intervals in a brain-computer interface; secondly, we study how to construct interval-valued relationships in a social network, and how to aggregate their information. Our results show that interval-valued data can effectively model some of the uncertainty and coalitions of the data in both cases. For the case of brain-computer interface, we found that our results surpassed the results of other interval-valued functions.
Published: 2023
Full Text: View/download PDF

26. Efficient Conformal Prediction under Data Heterogeneity

Author: Plassier, Vincent, Kotelevskii, Nikita, Rubashevskii, Aleksandr, Noskov, Fedor, Velikanov, Maksim, Fishkov, Alexander, Horvath, Samuel, Takac, Martin, Moulines, Eric, and Panov, Maxim
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: Conformal Prediction (CP) stands out as a robust framework for uncertainty quantification, which is crucial for ensuring the reliability of predictions. However, common CP methods heavily rely on data exchangeability, a condition often violated in practice. Existing approaches for tackling non-exchangeability lead to methods that are not computable beyond the simplest examples. This work introduces a new efficient approach to CP that produces provably valid confidence sets for fairly general non-exchangeable data distributions. We illustrate the general theory with applications to the challenging setting of federated learning under data heterogeneity between agents. Our method allows constructing provably valid personalized prediction sets for agents in a fully federated way. The effectiveness of the proposed method is demonstrated in a series of experiments on real-world datasets., Comment: 29 pages
Published: 2023

27. Dirichlet-based Uncertainty Quantification for Personalized Federated Learning with Improved Posterior Networks

Author: Kotelevskii, Nikita, Horváth, Samuel, Nandakumar, Karthik, Takáč, Martin, and Panov, Maxim
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: In modern federated learning, one of the main challenges is to account for inherent heterogeneity and the diverse nature of data distributions for different clients. This problem is often addressed by introducing personalization of the models towards the data distribution of the particular client. However, a personalized model might be unreliable when applied to the data that is not typical for this client. Eventually, it may perform worse for these data than the non-personalized global model trained in a federated way on the data from all the clients. This paper presents a new approach to federated learning that allows selecting a model from global and personalized ones that would perform better for a particular input point. It is achieved through a careful modeling of predictive uncertainties that helps to detect local and global in- and out-of-distribution data and use this information to select the model that is confident in a prediction. The comprehensive experimental evaluation on the popular real-world image datasets shows the superior performance of the model in the presence of out-of-distribution data while performing on par with state-of-the-art personalized federated learning algorithms in the standard scenarios.
Published: 2023

28. Reinforcement Learning for Solving Stochastic Vehicle Routing Problem

Author: Iklassov, Zangir, Sobirov, Ikboljon, Solozabal, Ruben, and Takac, Martin
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computational Engineering, Finance, and Science, Computer Science - Machine Learning
Abstract: This study addresses a gap in the utilization of Reinforcement Learning (RL) and Machine Learning (ML) techniques in solving the Stochastic Vehicle Routing Problem (SVRP) that involves the challenging task of optimizing vehicle routes under uncertain conditions. We propose a novel end-to-end framework that comprehensively addresses the key sources of stochasticity in SVRP and utilizes an RL agent with a simple yet effective architecture and a tailored training method. Through comparative analysis, our proposed model demonstrates superior performance compared to a widely adopted state-of-the-art metaheuristic, achieving a significant 3.43% reduction in travel costs. Furthermore, the model exhibits robustness across diverse SVRP settings, highlighting its adaptability and ability to learn optimal routing strategies in varying environments. The publicly available implementation of our framework serves as a valuable resource for future research endeavors aimed at advancing RL-based solutions for SVRP., Comment: 14 pages, accepted to ACML24
Published: 2023

29. Byzantine-Tolerant Methods for Distributed Variational Inequalities

Author: Tupitsa, Nazarii, Almansoori, Abdulla Jasem, Wu, Yanlin, Takáč, Martin, Nandakumar, Karthik, Horváth, Samuel, and Gorbunov, Eduard
Subjects: Computer Science - Machine Learning, Mathematics - Optimization and Control
Abstract: Robustness to Byzantine attacks is a necessity for various distributed training scenarios. When the training reduces to the process of solving a minimization problem, Byzantine robustness is relatively well-understood. However, other problem formulations, such as min-max problems or, more generally, variational inequalities, arise in many modern machine learning and, in particular, distributed learning tasks. These problems significantly differ from the standard minimization ones and, therefore, require separate consideration. Nevertheless, only one work (Adibi et al., 2022) addresses this important question in the context of Byzantine robustness. Our work makes a further step in this direction by providing several (provably) Byzantine-robust methods for distributed variational inequality, thoroughly studying their theoretical convergence, removing the limitations of the previous work, and providing numerical comparisons supporting the theoretical findings., Comment: NeurIPS 2023; 69 pages, 12 figures
Published: 2023

30. Stochastic Gradient Descent with Preconditioned Polyak Step-size

Author: Abdukhakimov, Farshed, Xiang, Chulu, Kamzolov, Dmitry, and Takáč, Martin
Subjects: Computer Science - Machine Learning, Mathematics - Optimization and Control
Abstract: Stochastic Gradient Descent (SGD) is one of the many iterative optimization methods that are widely used in solving machine learning problems. These methods display valuable properties and attract researchers and industrial machine learning engineers with their simplicity. However, one of the weaknesses of this type of methods is the necessity to tune learning rate (step-size) for every loss function and dataset combination to solve an optimization problem and get an efficient performance in a given time budget. Stochastic Gradient Descent with Polyak Step-size (SPS) is a method that offers an update rule that alleviates the need of fine-tuning the learning rate of an optimizer. In this paper, we propose an extension of SPS that employs preconditioning techniques, such as Hutchinson's method, Adam, and AdaGrad, to improve its performance on badly scaled and/or ill-conditioned datasets.
Published: 2023

31. Advancing the lower bounds: An accelerated, stochastic, second-order method with optimal adaptation to inexactness

Author: Agafonov, Artem, Kamzolov, Dmitry, Gasnikov, Alexander, Kavis, Ali, Antonakopoulos, Kimon, Cevher, Volkan, and Takáč, Martin
Subjects: Mathematics - Optimization and Control
Abstract: We present a new accelerated stochastic second-order method that is robust to both gradient and Hessian inexactness, which occurs typically in machine learning. We establish theoretical lower bounds and prove that our algorithm achieves optimal convergence in both gradient and Hessian inexactness in this key setting. We further introduce a tensor generalization for stochastic higher-order derivatives. When the oracles are non-stochastic, the proposed tensor algorithm matches the global convergence of Nesterov Accelerated Tensor method. Both algorithms allow for approximate solutions of their auxiliary subproblems with verifiable conditions on the accuracy of the solution.
Published: 2023

32. Convergence analysis of stochastic gradient descent with adaptive preconditioning for non-convex and convex functions

Author: Pasechnyuk, Dmitrii A., Gasnikov, Alexander, and Takáč, Martin
Subjects: Mathematics - Optimization and Control, 65K10, G.1.6
Abstract: Preconditioning is a crucial operation in gradient-based numerical optimisation. It helps decrease the local condition number of a function by appropriately transforming its gradient. For a convex function, where the gradient can be computed exactly, the optimal linear transformation corresponds to the inverse of the Hessian operator, while the optimal convex transformation is the convex conjugate of the function. Different conditions result in variations of these dependencies. Practical algorithms often employ low-rank or stochastic approximations of the inverse Hessian matrix for preconditioning. However, theoretical guarantees for these algorithms typically lack a justification for the defining property of preconditioning. This paper presents a simple theoretical framework that demonstrates, given a smooth function and an available unbiased stochastic approximation of its gradient, that it is possible to refine the dependency of the convergence rate on the Lipschitz constant of the gradient., Comment: 12 pages, 2 figures
Published: 2023

33. Typical Lipschitz images of rectifiable metric spaces

Author: Bate, David and Takáč, Jakub
Subjects: Mathematics - Metric Geometry, Mathematics - Functional Analysis, 30L99 (Primary), 28A78 (Secondary)
Abstract: This article studies typical 1-Lipschitz images of $n$-rectifiable metric spaces $E$ into $\mathbb{R}^m$ for $m\geq n$. For example, if $E\subset \mathbb{R}^k$, we show that the Jacobian of such a typical 1-Lipschitz map equals 1 $\mathcal{H}^n$-almost everywhere and, if $m>n$, preserves the Hausdorff measure of $E$. In general, we provide sufficient conditions, in terms of the tangent norms of $E$, for when a typical 1-Lipschitz map preserves the Hausdorff measure of $E$, up to some constant multiple. Almost optimal results for strongly $n$-rectifiable metric spaces are obtained. On the other hand, for any norm $|\cdot|$ on $\mathbb{R}^m$, we show that, in the space of 1-Lipschitz functions from $([-1,1]^n,|\cdot|_\infty)$ to $(\mathbb{R}^m,|\cdot|)$, the $\mathcal{H}^n$-measure of a typical image is not bounded below by any $\Delta>0$.
Published: 2023
Full Text: View/download PDF

34. Stochastic Gradient Descent with Preconditioned Polyak Step-Size

Author: Abdukhakimov, F., Xiang, C., Kamzolov, D., and Takáč, M.
Published: 2024
Full Text: View/download PDF

35. Random-reshuffled SARAH does not need full gradient computations

Author: Beznosikov, Aleksandr and Takáč, Martin
Published: 2024
Full Text: View/download PDF

36. Stochastic Gradient Methods with Preconditioned Updates

Author: Sadiev, Abdurakhmon, Beznosikov, Aleksandr, Almansoori, Abdulla Jasem, Kamzolov, Dmitry, Tappenden, Rachael, and Takáč, Martin
Published: 2024
Full Text: View/download PDF

37. Hybrid Methods in Polynomial Optimisation

Author: Aspman, Johannes, Bareilles, Gilles, Kungurtsev, Vyacheslav, Marecek, Jakub, and Takáč, Martin
Subjects: Mathematics - Optimization and Control, 90C22, 90C23
Abstract: The Moment/Sum-of-squares hierarchy provides a way to compute the global minimizers of polynomial optimization problems (POP), at the cost of solving a sequence of increasingly large semidefinite programs (SDPs). We consider large-scale POPs, for which interior-point methods are no longer able to solve the resulting SDPs. We propose an algorithm that combines a first-order method for solving the SDP relaxation, and a second-order method on a non-convex problem obtained from the POP. The switch from the first to the second-order method is based on a quantitative criterion, whose satisfaction ensures that Newton's method converges quadratically from its first iteration. This criterion leverages the point-estimation theory of Smale and the active-set identification. We illustrate the methodology to obtain global minimizers of large-scale optimal power flow problems.
Published: 2023

38. WikiGoldSK: Annotated Dataset, Baselines and Few-Shot Learning Experiments for Slovak Named Entity Recognition

Author: Šuba, Dávid, Šuppa, Marek, Kubík, Jozef, Hamerlik, Endre, and Takáč, Martin
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Named Entity Recognition (NER) is a fundamental NLP tasks with a wide range of practical applications. The performance of state-of-the-art NER methods depends on high quality manually anotated datasets which still do not exist for some languages. In this work we aim to remedy this situation in Slovak by introducing WikiGoldSK, the first sizable human labelled Slovak NER dataset. We benchmark it by evaluating state-of-the-art multilingual Pretrained Language Models and comparing it to the existing silver-standard Slovak NER dataset. We also conduct few-shot experiments and show that training on a sliver-standard dataset yields better results. To enable future work that can be based on Slovak NER, we release the dataset, code, as well as the trained models publicly under permissible licensing terms at https://github.com/NaiveNeuron/WikiGoldSK., Comment: BSNLP 2023 Workshop at EACL 2023
Published: 2023

39. Regularization of the policy updates for stabilizing Mean Field Games

Author: Algumaei, Talal, Solozabal, Ruben, Alami, Reda, Hacid, Hakim, Debbah, Merouane, and Takac, Martin
Subjects: Computer Science - Artificial Intelligence
Abstract: This work studies non-cooperative Multi-Agent Reinforcement Learning (MARL) where multiple agents interact in the same environment and whose goal is to maximize the individual returns. Challenges arise when scaling up the number of agents due to the resultant non-stationarity that the many agents introduce. In order to address this issue, Mean Field Games (MFG) rely on the symmetry and homogeneity assumptions to approximate games with very large populations. Recently, deep Reinforcement Learning has been used to scale MFG to games with larger number of states. Current methods rely on smoothing techniques such as averaging the q-values or the updates on the mean-field distribution. This work presents a different approach to stabilize the learning based on proximal updates on the mean-field policy. We name our algorithm Mean Field Proximal Policy Optimization (MF-PPO), and we empirically show the effectiveness of our method in the OpenSpiel framework.
Published: 2023

40. MAHTM: A Multi-Agent Framework for Hierarchical Transactive Microgrids

Author: Cuadrado, Nicolas, Gutierrez, Roberto, Zhu, Yongli, and Takac, Martin
Subjects: Computer Science - Machine Learning, Computer Science - Multiagent Systems, Electrical Engineering and Systems Science - Systems and Control, I.2.8
Abstract: Integrating variable renewable energy into the grid has posed challenges to system operators in achieving optimal trade-offs among energy availability, cost affordability, and pollution controllability. This paper proposes a multi-agent reinforcement learning framework for managing energy transactions in microgrids. The framework addresses the challenges above: it seeks to optimize the usage of available resources by minimizing the carbon footprint while benefiting all stakeholders. The proposed architecture consists of three layers of agents, each pursuing different objectives. The first layer, comprised of prosumers and consumers, minimizes the total energy cost. The other two layers control the energy price to decrease the carbon impact while balancing the consumption and production of both renewable and conventional energy. This framework also takes into account fluctuations in energy demand and supply., Comment: ICLR 2023 Workshop: Tackling Climate Change with Machine Learning
Published: 2023

41. Preconditioning meets biased compression for efficient distributed optimization

Author: Pirau, Vitali, Beznosikov, Aleksandr, Takáč, Martin, Matyukhin, Vladislav, and Gasnikov, Alexander
Published: 2024
Full Text: View/download PDF

42. Similarity, Compression and Local Steps: Three Pillars of Efficient Communications for Distributed Variational Inequalities

Author: Beznosikov, Aleksandr, Takáč, Martin, and Gasnikov, Alexander
Subjects: Mathematics - Optimization and Control, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Computer Science and Game Theory, Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Variational inequalities are a broad and flexible class of problems that includes minimization, saddle point, and fixed point problems as special cases. Therefore, variational inequalities are used in various applications ranging from equilibrium search to adversarial learning. With the increasing size of data and models, today's instances demand parallel and distributed computing for real-world machine learning problems, most of which can be represented as variational inequalities. Meanwhile, most distributed approaches have a significant bottleneck - the cost of communications. The three main techniques to reduce the total number of communication rounds and the cost of one such round are the similarity of local functions, compression of transmitted information, and local updates. In this paper, we combine all these approaches. Such a triple synergy did not exist before for variational inequalities and saddle problems, nor even for minimization problems. The methods presented in this paper have the best theoretical guarantees of communication complexity and are significantly ahead of other methods for distributed variational inequalities. The theoretical results are confirmed by adversarial learning experiments on synthetic and real datasets., Comment: Appears in: Advances in Neural Information Processing Systems 36 (NeurIPS 2023) (https://proceedings.neurips.cc/paper_files/paper/2023/hash/5b4a459db23e6db9be2a128380953d96-Abstract-Conference.html). 36 pages, 3 algorithms, 1 figure, 1 table
Published: 2023

43. Cubic Regularization is the Key! The First Accelerated Quasi-Newton Method with a Global Convergence Rate of $O(k^{-2})$ for Convex Functions

Author: Kamzolov, Dmitry, Ziu, Klea, Agafonov, Artem, and Takáč, Martin
Subjects: Mathematics - Optimization and Control
Abstract: In this paper, we propose the first Quasi-Newton method with a global convergence rate of $O(k^{-1})$ for general convex functions. Quasi-Newton methods, such as BFGS, SR-1, are well-known for their impressive practical performance. However, they may be slower than gradient descent for general convex functions, with the best theoretical rate of $O(k^{-1/3})$. This gap between impressive practical performance and poor theoretical guarantees was an open question for a long period of time. In this paper, we make a significant step to close this gap. We improve upon the existing rate and propose the Cubic Regularized Quasi-Newton Method with a convergence rate of $O(k^{-1})$. The key to achieving this improvement is to use the Cubic Regularized Newton Method over the Damped Newton Method as an outer method, where the Quasi-Newton update is an inexact Hessian approximation. Using this approach, we propose the first Accelerated Quasi-Newton method with a global convergence rate of $O(k^{-2})$ for general convex functions. In special cases where we can improve the precision of the approximation, we achieve a global convergence rate of $O(k^{-3})$, which is faster than any first-order method. To make these methods practical, we introduce the Adaptive Inexact Cubic Regularized Newton Method and its accelerated version, which provide real-time control of the approximation error. We show that the proposed methods have impressive practical performance and outperform both first and second-order methods.
Published: 2023

44. Learning Confident Classifiers in the Presence of Label Noise

Author: Hashmi, Asma Ahmed, Zhumabayeva, Aigerim, Kotelevskii, Nikita, Agafonov, Artem, Yaqub, Mohammad, Panov, Maxim, and Takáč, Martin
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Human-Computer Interaction, Computer Science - Machine Learning
Abstract: The success of Deep Neural Network (DNN) models significantly depends on the quality of provided annotations. In medical image segmentation, for example, having multiple expert annotations for each data point is common to minimize subjective annotation bias. Then, the goal of estimation is to filter out the label noise and recover the ground-truth masks, which are not explicitly given. This paper proposes a probabilistic model for noisy observations that allows us to build a confident classification and segmentation models. To accomplish it, we explicitly model label noise and introduce a new information-based regularization that pushes the network to recover the ground-truth labels. In addition, for segmentation task we adjust the loss function by prioritizing learning in high-confidence regions where all the annotators agree on labeling. We evaluate the proposed method on a series of classification tasks such as noisy versions of MNIST, CIFAR-10, Fashion-MNIST datasets as well as CIFAR-10N, which is real-world dataset with noisy human annotations. Additionally, for segmentation task, we consider several medical imaging datasets, such as, LIDC and RIGA that reflect real-world inter-variability among multiple annotators. Our experiments show that our algorithm outperforms state-of-the-art solutions for the considered classification and segmentation problems.
Published: 2023

45. Degenerate elliptic equations in ordered Banach spaces and applications

Author: Takáč, P., primary
Published: 2024
Full Text: View/download PDF

46. PaDPaF: Partial Disentanglement with Partially-Federated GANs

Author: Almansoori, Abdulla Jasem, Horváth, Samuel, and Takáč, Martin
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Federated learning has become a popular machine learning paradigm with many potential real-life applications, including recommendation systems, the Internet of Things (IoT), healthcare, and self-driving cars. Though most current applications focus on classification-based tasks, learning personalized generative models remains largely unexplored, and their benefits in the heterogeneous setting still need to be better understood. This work proposes a novel architecture combining global client-agnostic and local client-specific generative models. We show that using standard techniques for training federated models, our proposed model achieves privacy and personalization by implicitly disentangling the globally consistent representation (i.e. content) from the client-dependent variations (i.e. style). Using such decomposition, personalized models can generate locally unseen labels while preserving the given style of the client and can predict the labels for all clients with high accuracy by training a simple linear classifier on the global content features. Furthermore, disentanglement enables other essential applications, such as data anonymization, by sharing only the content. Extensive experimental evaluation corroborates our findings, and we also discuss a theoretical motivation for the proposed approach., Comment: 29 pages, 21 figures. Published at TMLR 04/2024
Published: 2022

47. Optimal Power Flow Pursuit in the Alternating Current Model

Author: Liu, Jie, Bellon, Antonio, Simonetto, Andrea, Takac, Martin, and Marecek, Jakub
Subjects: Mathematics - Optimization and Control, Electrical Engineering and Systems Science - Systems and Control
Abstract: Transmission-constrained problems in power systems can be cast as polynomial optimization problems whose coefficients vary over time. We consider the complications therein and suggest several approaches. On the example of the alternating-current optimal power flows (ACOPFs), we illustrate one of the approaches in detail. For the time-varying ACOPF, we provide an upper bound for the difference between the optimal cost for a relaxation using the most recent data and the current approximate optimal cost generated by our algorithm. This bound is a function of the properties of the instance and the rate of change of the coefficients over time. Moreover, we also bound the number of floating-point operations to perform between two subsequent updates to ensure a bounded error., Comment: A journal version of Liu et al [arXiv:1710.07119, PSCC 2018] taking into account our recent work [arXiv:2104.05445 and arXiv:2210.08387]
Published: 2022

48. Gradient Descent and the Power Method: Exploiting their connection to find the leftmost eigen-pair and escape saddle points

Author: Tappenden, Rachael and Takáč, Martin
Subjects: Mathematics - Optimization and Control, Computer Science - Machine Learning
Abstract: This work shows that applying Gradient Descent (GD) with a fixed step size to minimize a (possibly nonconvex) quadratic function is equivalent to running the Power Method (PM) on the gradients. The connection between GD with a fixed step size and the PM, both with and without fixed momentum, is thus established. Consequently, valuable eigen-information is available via GD. Recent examples show that GD with a fixed step size, applied to locally quadratic nonconvex functions, can take exponential time to escape saddle points (Simon S. Du, Chi Jin, Jason D. Lee, Michael I. Jordan, Aarti Singh, and Barnabas Poczos: "Gradient descent can take exponential time to escape saddle points"; S. Paternain, A. Mokhtari, and A. Ribeiro: "A newton-based method for nonconvex optimization with fast evasion of saddle points"). Here, those examples are revisited and it is shown that eigenvalue information was missing, so that the examples may not provide a complete picture of the potential practical behaviour of GD. Thus, ongoing investigation of the behaviour of GD on nonconvex functions, possibly with an \emph{adaptive} or \emph{variable} step size, is warranted. It is shown that, in the special case of a quadratic in $R^2$, if an eigenvalue is known, then GD with a fixed step size will converge in two iterations, and a complete eigen-decomposition is available. By considering the dynamics of the gradients and iterates, new step size strategies are proposed to improve the practical performance of GD. Several numerical examples are presented, which demonstrate the advantages of exploiting the GD--PM connection.
Published: 2022

49. A Damped Newton Method Achieves Global $O\left(\frac{1}{k^2}\right)$ and Local Quadratic Convergence Rate

Author: Hanzely, Slavomír, Kamzolov, Dmitry, Pasechnyuk, Dmitry, Gasnikov, Alexander, Richtárik, Peter, and Takáč, Martin
Subjects: Mathematics - Optimization and Control
Abstract: In this paper, we present the first stepsize schedule for Newton method resulting in fast global and local convergence guarantees. In particular, a) we prove an $O\left( \frac 1 {k^2} \right)$ global rate, which matches the state-of-the-art global rate of cubically regularized Newton method of Polyak and Nesterov (2006) and of regularized Newton method of Mishchenko (2021) and Doikov and Nesterov (2021), b) we prove a local quadratic rate, which matches the best-known local rate of second-order methods, and c) our stepsize formula is simple, explicit, and does not require solving any subproblem. Our convergence proofs hold under affine-invariance assumptions closely related to the notion of self-concordance. Finally, our method has competitive performance when compared to existing baselines, which share the same fast global convergence guarantees.
Published: 2022

50. Effects of momentum scaling for SGD

Author: Pasechnyuk, Dmitry A., Gasnikov, Alexander, and Takáč, Martin
Subjects: Mathematics - Optimization and Control, 90C25 (Primary), 65K05 (Secondary), G.1.6
Abstract: The paper studies the properties of stochastic gradient methods with preconditioning. We focus on momentum updated preconditioners with momentum coefficient $\beta$. Seeking to explain practical efficiency of scaled methods, we provide convergence analysis in a norm associated with preconditioner, and demonstrate that scaling allows one to get rid of gradients Lipschitz constant in convergence rates. Along the way, we emphasize important role of $\beta$, undeservedly set to constant $0.99...9$ at the arbitrariness of various authors. Finally, we propose the explicit constructive formulas for adaptive $\beta$ and step size values., Comment: 19 pages, 14 figures
Published: 2022

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Region

Database

Publisher

423 results on '"Takáč P"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources