Author: "Chen, Ziyi" / Topic: fos: computer and information sciences and fos: mathematics - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Chen, Ziyi"' showing total 7 results

Start Over Author "Chen, Ziyi" Topic fos: computer and information sciences Topic fos: mathematics

7 results on '"Chen, Ziyi"'

1. Accelerated Proximal Alternating Gradient-Descent-Ascent for Nonconvex Minimax Machine Learning

Author: Chen, Ziyi, Ma, Shaocong, and Zhou, Yi
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Optimization and Control (math.OC), MathematicsofComputing_NUMERICALANALYSIS, FOS: Mathematics, Mathematics - Optimization and Control, Machine Learning (cs.LG)
Abstract: Alternating gradient-descent-ascent (AltGDA) is an optimization algorithm that has been widely used for model training in various machine learning applications, which aims to solve a nonconvex minimax optimization problem. However, the existing studies show that it suffers from a high computation complexity in nonconvex minimax optimization. In this paper, we develop a single-loop and fast AltGDA-type algorithm that leverages proximal gradient updates and momentum acceleration to solve regularized nonconvex minimax optimization problems. By leveraging the momentum acceleration technique, we prove that the algorithm converges to a critical point in nonconvex minimax optimization and achieves a computation complexity in the order of $\mathcal{O}(\kappa^{\frac{11}{6}}\epsilon^{-2})$, where $\epsilon$ is the desired level of accuracy and $\kappa$ is the problem's condition number. {Such a computation complexity improves the state-of-the-art complexities of single-loop GDA and AltGDA algorithms (see the summary of comparison in \Cref{table1})}. We demonstrate the effectiveness of our algorithm via an experiment on adversarial deep learning., Comment: 12 pages, 1 figure. Added acknowledgement of NSF funding. arXiv admin note: text overlap with arXiv:2102.04653
Published: 2022

2. Data Sampling Affects the Complexity of Online SGD over Dependent Data

Author: Ma, Shaocong, Chen, Ziyi, Zhou, Yi, Ji, Kaiyi, and Liang, Yingbin
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Optimization and Control (math.OC), FOS: Mathematics, Mathematics - Optimization and Control, Machine Learning (cs.LG)
Abstract: Conventional machine learning applications typically assume that data samples are independently and identically distributed (i.i.d.). However, practical scenarios often involve a data-generating process that produces highly dependent data samples, which are known to heavily bias the stochastic optimization process and slow down the convergence of learning. In this paper, we conduct a fundamental study on how different stochastic data sampling schemes affect the sample complexity of online stochastic gradient descent (SGD) over highly dependent data. Specifically, with a $\phi$-mixing model of data dependence, we show that online SGD with proper periodic data-subsampling achieves an improved sample complexity over the standard online SGD in the full spectrum of the data dependence level. Interestingly, even subsampling a subset of data samples can accelerate the convergence of online SGD over highly dependent data. Moreover, we show that online SGD with mini-batch sampling can further substantially improve the sample complexity over online SGD with periodic data-subsampling over highly dependent data. Numerical experiments validate our theoretical results.
Published: 2022

3. A Fast and Convergent Proximal Algorithm for Regularized Nonconvex and Nonsmooth Bi-level Optimization

Author: Chen, Ziyi, Kailkhura, Bhavya, and Zhou, Yi
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Optimization and Control (math.OC), MathematicsofComputing_NUMERICALANALYSIS, Mathematics::Optimization and Control, FOS: Mathematics, Mathematics - Optimization and Control, Machine Learning (cs.LG)
Abstract: Many important machine learning applications involve regularized nonconvex bi-level optimization. However, the existing gradient-based bi-level optimization algorithms cannot handle nonconvex or nonsmooth regularizers, and they suffer from a high computation complexity in nonconvex bi-level optimization. In this work, we study a proximal gradient-type algorithm that adopts the approximate implicit differentiation (AID) scheme for nonconvex bi-level optimization with possibly nonconvex and nonsmooth regularizers. In particular, the algorithm applies the Nesterov's momentum to accelerate the computation of the implicit gradient involved in AID. We provide a comprehensive analysis of the global convergence properties of this algorithm through identifying its intrinsic potential function. In particular, we formally establish the convergence of the model parameters to a critical point of the bi-level problem, and obtain an improved computation complexity $\mathcal{O}(\kappa^{3.5}\epsilon^{-2})$ over the state-of-the-art result. Moreover, we analyze the asymptotic convergence rates of this algorithm under a class of local nonconvex geometries characterized by a {\L}ojasiewicz-type gradient inequality. Experiment on hyper-parameter optimization demonstrates the effectiveness of our algorithm., Comment: 20 pages, 1 figure, 1 table
Published: 2022
Full Text: View/download PDF

4. A Cubic Regularization Approach for Finding Local Minimax Points in Nonconvex Minimax Optimization

Author: Chen, Ziyi, Hu, Zhengyang, Li, Qunwei, Wang, Zhe, and Zhou, Yi
Subjects: FOS: Computer and information sciences, Statistics::Theory, Computer Science - Machine Learning, Optimization and Control (math.OC), FOS: Mathematics, Mathematics - Optimization and Control, Machine Learning (cs.LG)
Abstract: Gradient descent-ascent (GDA) is a widely used algorithm for minimax optimization. However, GDA has been proved to converge to stationary points for nonconvex minimax optimization, which are suboptimal compared with local minimax points. In this work, we develop cubic regularization (CR) type algorithms that globally converge to local minimax points in nonconvex-strongly-concave minimax optimization. We first show that local minimax points are equivalent to second-order stationary points of a certain envelope function. Then, inspired by the classic cubic regularization algorithm, we propose an algorithm named Cubic-LocalMinimax for finding local minimax points, and provide a comprehensive convergence analysis by leveraging its intrinsic potential function. Specifically, we establish the global convergence of Cubic-LocalMinimax to a local minimax point at a sublinear convergence rate and characterize its iteration complexity. Also, we propose a GDA-based solver for solving the cubic subproblem involved in Cubic-LocalMinimax up to certain pre-defined accuracy, and analyze the overall gradient and Hessian-vector product computation complexities of such an inexact Cubic-LocalMinimax algorithm. Moreover, we propose a stochastic variant of Cubic-LocalMinimax for large-scale minimax optimization, and characterize its sample complexity under stochastic sub-sampling. Experimental results demonstrate faster convergence of our stochastic Cubic-LocalMinimax than some existing algorithms., 43 pages, 9 figures. Text overlap with arXiv:2102.04653. The original title is "Escaping Saddle Points in Nonconvex Minimax Optimization via Cubic-Regularized Gradient Descent-Ascent"
Published: 2021

5. Multi-Agent Off-Policy TD Learning: Finite-Time Analysis with Near-Optimal Sample Complexity and Communication Complexity

Author: Chen, Ziyi, Zhou, Yi, and Chen, Rongrong
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Optimization and Control (math.OC), FOS: Mathematics, Computer Science - Multiagent Systems, Mathematics - Optimization and Control, Machine Learning (cs.LG), Multiagent Systems (cs.MA)
Abstract: The finite-time convergence of off-policy TD learning has been comprehensively studied recently. However, such a type of convergence has not been well established for off-policy TD learning in the multi-agent setting, which covers broader applications and is fundamentally more challenging. This work develops two decentralized TD with correction (TDC) algorithms for multi-agent off-policy TD learning under Markovian sampling. In particular, our algorithms preserve full privacy of the actions, policies and rewards of the agents, and adopt mini-batch sampling to reduce the sampling variance and communication frequency. Under Markovian sampling and linear function approximation, we proved that the finite-time sample complexity of both algorithms for achieving an $\epsilon$-accurate solution is in the order of $\mathcal{O}(\epsilon^{-1}\ln \epsilon^{-1})$, matching the near-optimal sample complexity of centralized TD(0) and TDC. Importantly, the communication complexity of our algorithms is in the order of $\mathcal{O}(\ln \epsilon^{-1})$, which is significantly lower than the communication complexity $\mathcal{O}(\epsilon^{-1}\ln \epsilon^{-1})$ of the existing decentralized TD(0). Experiments corroborate our theoretical findings., Comment: 34 pages, 3 figures
Published: 2021
Full Text: View/download PDF

6. Proximal Gradient Descent-Ascent: Variable Convergence under K�� Geometry

Author: Chen, Ziyi, Zhou, Yi, Xu, Tengyu, and Liang, Yingbin
Subjects: FOS: Computer and information sciences, Optimization and Control (math.OC), FOS: Mathematics, Machine Learning (cs.LG)
Abstract: The gradient descent-ascent (GDA) algorithm has been widely applied to solve minimax optimization problems. In order to achieve convergent policy parameters for minimax optimization, it is important that GDA generates convergent variable sequences rather than convergent sequences of function values or gradient norms. However, the variable convergence of GDA has been proved only under convexity geometries, and there lacks understanding for general nonconvex minimax optimization. This paper fills such a gap by studying the convergence of a more general proximal-GDA for regularized nonconvex-strongly-concave minimax optimization. Specifically, we show that proximal-GDA admits a novel Lyapunov function, which monotonically decreases in the minimax optimization process and drives the variable sequence to a critical point. By leveraging this Lyapunov function and the K�� geometry that parameterizes the local geometries of general nonconvex functions, we formally establish the variable convergence of proximal-GDA to a critical point $x^*$, i.e., $x_t\to x^*, y_t\to y^*(x^*)$. Furthermore, over the full spectrum of the K��-parameterized geometry, we show that proximal-GDA achieves different types of convergence rates ranging from sublinear convergence up to finite-step convergence, depending on the geometry associated with the K�� parameter. This is the first theoretical result on the variable convergence for nonconvex minimax optimization., To appear in ICLR 2021
Published: 2021
Full Text: View/download PDF

7. Momentum with Variance Reduction for Nonconvex Composition Optimization

Author: Chen, Ziyi and Zhou, Yi
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Optimization and Control (math.OC), FOS: Mathematics, Machine Learning (stat.ML), Mathematics - Optimization and Control, Machine Learning (cs.LG)
Abstract: Composition optimization is widely-applied in nonconvex machine learning. Various advanced stochastic algorithms that adopt momentum and variance reduction techniques have been developed for composition optimization. However, these algorithms do not fully exploit both techniques to accelerate the convergence and are lack of convergence guarantee in nonconvex optimization. This paper complements the existing literature by developing various momentum schemes with SPIDER-based variance reduction for non-convex composition optimization. In particular, our momentum design requires less number of proximal mapping evaluations per-iteration than that required by the existing Katyusha momentum. Furthermore, our algorithm achieves near-optimal sample complexity results in both non-convex finite-sum and online composition optimization and achieves a linear convergence rate under the gradient dominant condition. Numerical experiments demonstrate that our algorithm converges significantly faster than existing algorithms in nonconvex composition optimization., Comment: 36 pages, 1 figure
Published: 2020
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Refine your results

7 results on '"Chen, Ziyi"'

1. Accelerated Proximal Alternating Gradient-Descent-Ascent for Nonconvex Minimax Machine Learning

2. Data Sampling Affects the Complexity of Online SGD over Dependent Data

3. A Fast and Convergent Proximal Algorithm for Regularized Nonconvex and Nonsmooth Bi-level Optimization

4. A Cubic Regularization Approach for Finding Local Minimax Points in Nonconvex Minimax Optimization

5. Multi-Agent Off-Policy TD Learning: Finite-Time Analysis with Near-Optimal Sample Complexity and Communication Complexity

6. Proximal Gradient Descent-Ascent: Variable Convergence under K�� Geometry

7. Momentum with Variance Reduction for Nonconvex Composition Optimization

Catalog

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Database

Publisher

7 results on '"Chen, Ziyi"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources