Author: "Shi, Hao-Jun Michael" - Searchworks@Jio Institute Digital Library Search Results

Your search for "Shi, Hao-Jun Michael" returned 19 results

1. A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale

Author: Shi, Hao-Jun Michael, Lee, Tsung-Hsien, Iwasaki, Shintaro, Gallego-Posada, Jose, Li, Zhijing, Rangadurai, Kaushik, Mudigere, Dheevatsa, and Rabbat, Michael
Subjects: Computer Science - Machine Learning, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Mathematical Software, Mathematics - Optimization and Control
Abstract: Shampoo is an online and stochastic optimization algorithm belonging to the AdaGrad family of methods for training neural networks. It constructs a block-diagonal preconditioner where each block consists of a coarse Kronecker product approximation to full-matrix AdaGrad for each parameter of the neural network. In this work, we provide a complete description of the algorithm as well as the performance optimizations that our implementation leverages to train deep networks at-scale in PyTorch. Our implementation enables fast multi-GPU distributed data-parallel training by distributing the memory and computation associated with blocks of each parameter via PyTorch's DTensor data structure and performing an AllGather primitive on the computed search directions at each iteration. This major performance enhancement enables us to achieve at most a 10% performance reduction in per-step wall-clock time compared against standard diagonal-scaling-based adaptive gradient methods. We validate our implementation by performing an ablation study on training ImageNet ResNet50, demonstrating Shampoo's superiority over standard training recipes with minimal hyperparameter tuning.
Comment: 38 pages, 8 figures, 5 tables
Published: 2023

2. Adaptive Finite-Difference Interval Estimation for Noisy Derivative-Free Optimization

Author: Shi, Hao-Jun Michael, Xie, Yuchen, Xuan, Melody Qiming, and Nocedal, Jorge
Subjects: Mathematics - Optimization and Control
Abstract: A common approach for minimizing a smooth nonlinear function is to employ finite-difference approximations to the gradient. While this can be easily performed when no error is present within the function evaluations, when the function is noisy, the optimal choice requires information about the noise level and higher-order derivatives of the function, which is often unavailable. Given the noise level of the function, we propose a bisection search for finding a finite-difference interval for any finite-difference scheme that balances the truncation error, which arises from the error in the Taylor series approximation, and the measurement error, which results from noise in the function evaluation. Our procedure produces reliable estimates of the finite-difference interval at low cost without explicitly approximating higher-order derivatives. We show its numerical reliability and accuracy on a set of test problems. When combined with L-BFGS, we obtain a robust method for minimizing noisy black-box functions, as illustrated on a subset of unconstrained CUTEst problems with synthetically added noise.
Comment: 39 pages, 20 tables, 6 figures
Published: 2021

3. On the Numerical Performance of Derivative-Free Optimization Methods Based on Finite-Difference Approximations

Author: Shi, Hao-Jun Michael, Xuan, Melody Qiming, Oztoprak, Figen, and Nocedal, Jorge
Subjects: Mathematics - Optimization and Control
Abstract: The goal of this paper is to investigate an approach for derivative-free optimization that has not received sufficient attention in the literature and is yet one of the simplest to implement and parallelize. It consists of computing gradients of a smoothed approximation of the objective function (and constraints), and employing them within established codes. These gradient approximations are calculated by finite differences, with a differencing interval determined by the noise level in the functions and a bound on the second or third derivatives. It is assumed that noise level is known or can be estimated by means of difference tables or sampling. The use of finite differences has been largely dismissed in the derivative-free optimization literature as too expensive in terms of function evaluations and/or as impractical when the objective function contains noise. The test results presented in this paper suggest that such views should be re-examined and that the finite-difference approach has much to be recommended. The tests compared NEWUOA, DFO-LS and COBYLA against the finite-difference approach on three classes of problems: general unconstrained problems, nonlinear least squares, and general nonlinear programs with equality constraints.
Comment: 82 pages, 38 tables, 29 figures
Published: 2021
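
Illustrative note (not part of the catalog record): the abstract above says the differencing interval is determined by the noise level and a bound on the second derivative. A minimal sketch of the classical forward-difference version of that trade-off follows. It assumes a noise bound `noise_level` and a second-derivative bound `second_derivative_bound`, both hypothetical parameter names, and uses the textbook error bound L*h/2 + 2*eps/h, which is minimized at h = 2*sqrt(eps/L). It is not taken from the paper's code.

```python
import math


def forward_difference_interval(noise_level, second_derivative_bound):
    """Interval h minimizing the bound L*h/2 + 2*eps/h (assumed textbook formula)."""
    return 2.0 * math.sqrt(noise_level / second_derivative_bound)


def forward_difference_gradient(f, x, noise_level, second_derivative_bound):
    """Approximate the gradient of f at x with forward differences."""
    h = forward_difference_interval(noise_level, second_derivative_bound)
    fx = f(x)
    grad = []
    for i in range(len(x)):
        shifted = list(x)
        shifted[i] += h  # perturb one coordinate at a time
        grad.append((f(shifted) - fx) / h)
    return grad


def example_objective(x):
    # Hypothetical smooth test function with |f''| <= 2.
    return x[0] ** 2 + 3.0 * x[1]


example_gradient = forward_difference_gradient(
    example_objective, [1.0, 2.0], noise_level=1e-8, second_derivative_bound=2.0
)
```

Each gradient costs one extra function evaluation per coordinate, and those evaluations are independent of one another, which is consistent with the abstract's remark that the approach is simple to parallelize.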

4. A Noise-Tolerant Quasi-Newton Algorithm for Unconstrained Optimization

Author: Shi, Hao-Jun Michael, Xie, Yuchen, Byrd, Richard, and Nocedal, Jorge
Subjects: Mathematics - Optimization and Control
Abstract: This paper describes an extension of the BFGS and L-BFGS methods for the minimization of a nonlinear function subject to errors. This work is motivated by applications that contain computational noise, employ low-precision arithmetic, or are subject to statistical noise. The classical BFGS and L-BFGS methods can fail in such circumstances because the updating procedure can be corrupted and the line search can behave erratically. The proposed method addresses these difficulties and ensures that the BFGS update is stable by employing a lengthening procedure that spaces out the points at which gradient differences are collected. A new line search, designed to tolerate errors, guarantees that the Armijo-Wolfe conditions are satisfied under most reasonable conditions, and works in conjunction with the lengthening procedure. The proposed methods are shown to enjoy convergence guarantees for strongly convex functions. Detailed implementations of the methods are presented, together with encouraging numerical results.
Comment: 27 pages, 13 figures, 2 tables
Published: 2020

5. Compositional Embeddings Using Complementary Partitions for Memory-Efficient Recommendation Systems

Author: Shi, Hao-Jun Michael, Mudigere, Dheevatsa, Naumov, Maxim, and Yang, Jiyan
Subjects: Computer Science - Machine Learning, Computer Science - Information Retrieval, Statistics - Machine Learning
Abstract: Modern deep learning-based recommendation systems exploit hundreds to thousands of different categorical features, each with millions of different categories ranging from clicks to posts. To respect the natural diversity within the categorical data, embeddings map each category to a unique dense representation within an embedded space. Since each categorical feature could take on as many as tens of millions of different possible categories, the embedding tables form the primary memory bottleneck during both training and inference. We propose a novel approach for reducing the embedding size in an end-to-end fashion by exploiting complementary partitions of the category set to produce a unique embedding vector for each category without explicit definition. By storing multiple smaller embedding tables based on each complementary partition and combining embeddings from each table, we define a unique embedding for each category at smaller memory cost. This approach may be interpreted as using a specific fixed codebook to ensure uniqueness of each category's representation. Our experimental results demonstrate the effectiveness of our approach over the hashing trick for reducing the size of the embedding tables in terms of model loss and accuracy, while retaining a similar reduction in the number of parameters.
Comment: 11 pages, 7 figures, 1 table
Published: 2019
Full Text: View/download PDF
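
Illustrative note (not part of the catalog record): the abstract above describes combining several small embedding tables, one per complementary partition, so that every category still gets its own vector. The sketch below assumes one particular pair of partitions, the quotient and the remainder of the category index under integer division, and assumes element-wise multiplication as the combining operation. The class name, parameters, and random initialization are hypothetical and chosen only for illustration.

```python
import random


class QuotientRemainderEmbedding:
    """Two small tables indexed by (category // num_buckets, category % num_buckets)."""

    def __init__(self, num_categories, num_buckets, dim, seed=0):
        rng = random.Random(seed)
        self.num_buckets = num_buckets
        num_quotients = -(-num_categories // num_buckets)  # ceiling division
        self.quotient_table = [
            [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_quotients)
        ]
        self.remainder_table = [
            [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_buckets)
        ]

    def indices(self, category):
        # Each category maps to a distinct (quotient, remainder) pair.
        return divmod(category, self.num_buckets)

    def lookup(self, category):
        q, r = self.indices(category)
        # Combine the two rows element-wise (an assumed choice of operation).
        return [a * b for a, b in zip(self.quotient_table[q], self.remainder_table[r])]

    def num_rows(self):
        return len(self.quotient_table) + len(self.remainder_table)


embedding = QuotientRemainderEmbedding(num_categories=1000, num_buckets=32, dim=4)
vector = embedding.lookup(137)
```

In this configuration the two tables hold 64 rows in total instead of 1000, while no two categories share the same pair of row indices.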

6. Deep Learning Recommendation Model for Personalization and Recommendation Systems

Author: Naumov, Maxim, Mudigere, Dheevatsa, Shi, Hao-Jun Michael, Huang, Jianyu, Sundaraman, Narayanan, Park, Jongsoo, Wang, Xiaodong, Gupta, Udit, Wu, Carole-Jean, Azzolini, Alisson G., Dzhulgakov, Dmytro, Mallevich, Andrey, Cherniavskii, Ilia, Lu, Yinghai, Krishnamoorthi, Raghuraman, Yu, Ansha, Kondratenko, Volodymyr, Pereira, Stephanie, Chen, Xianjie, Chen, Wenlin, Rao, Vijay, Jia, Bill, Xiong, Liang, and Smelyanskiy, Misha
Subjects: Computer Science - Information Retrieval, Computer Science - Machine Learning, 68T05, I.2.6, I.5.0, H.3.3, H.3.4
Abstract: With the advent of deep learning, neural network-based recommendation models have emerged as an important tool for tackling personalization and recommendation tasks. These networks differ significantly from other deep learning networks due to their need to handle categorical features and are not well studied or understood. In this paper, we develop a state-of-the-art deep learning recommendation model (DLRM) and provide its implementation in both PyTorch and Caffe2 frameworks. In addition, we design a specialized parallelization scheme utilizing model parallelism on the embedding tables to mitigate memory constraints while exploiting data parallelism to scale-out compute from the fully-connected layers. We compare DLRM against existing recommendation models and characterize its performance on the Big Basin AI platform, demonstrating its usefulness as a benchmark for future algorithmic experimentation and system co-design.
Comment: 10 pages, 6 figures
Published: 2019

7. A Progressive Batching L-BFGS Method for Machine Learning

Author: Bollapragada, Raghu, Mudigere, Dheevatsa, Nocedal, Jorge, Shi, Hao-Jun Michael, and Tang, Ping Tak Peter
Subjects: Mathematics - Optimization and Control, Computer Science - Learning, Statistics - Machine Learning
Abstract: The standard L-BFGS method relies on gradient approximations that are not dominated by noise, so that search directions are descent directions, the line search is reliable, and quasi-Newton updating yields useful quadratic models of the objective function. All of this appears to call for a full batch approach, but since small batch sizes give rise to faster algorithms with better generalization properties, L-BFGS is currently not considered an algorithm of choice for large-scale machine learning applications. One need not, however, choose between the two extremes represented by the full batch or highly stochastic regimes, and may instead follow a progressive batching approach in which the sample size increases during the course of the optimization. In this paper, we present a new version of the L-BFGS algorithm that combines three basic components - progressive batching, a stochastic line search, and stable quasi-Newton updating - and that performs well on training logistic regression and deep neural networks. We provide supporting convergence theory for the method.
Comment: ICML 2018. 25 pages, 17 figures, 2 tables
Published: 2018

8. A Primer on Coordinate Descent Algorithms

Author: Shi, Hao-Jun Michael, Tu, Shenyinying, Xu, Yangyang, and Yin, Wotao
Subjects: Mathematics - Optimization and Control, Statistics - Machine Learning
Abstract: This monograph presents a class of algorithms called coordinate descent algorithms for mathematicians, statisticians, and engineers outside the field of optimization. This particular class of algorithms has recently gained popularity due to their effectiveness in solving large-scale optimization problems in machine learning, compressed sensing, image processing, and computational statistics. Coordinate descent algorithms solve optimization problems by successively minimizing along each coordinate or coordinate hyperplane, which is ideal for parallelized and distributed computing. Avoiding detailed technicalities and proofs, this monograph gives relevant theory and examples for practitioners to effectively apply coordinate descent to modern problems in data science and engineering.
Published: 2016
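
Illustrative note (not part of the catalog record): the abstract above describes coordinate descent as successively minimizing along each coordinate. A minimal sketch of the cyclic variant follows, applied to a quadratic objective 0.5*x'Ax - b'x with a symmetric positive definite matrix A, where each one-dimensional minimization has a closed form. The function name and the example data are hypothetical.

```python
def cyclic_coordinate_descent(A, b, num_sweeps=50):
    """Minimize 0.5*x'Ax - b'x by exact minimization along one coordinate at a time."""
    n = len(b)
    x = [0.0] * n
    for _ in range(num_sweeps):
        for i in range(n):
            # Setting the i-th partial derivative to zero with the other
            # coordinates held fixed gives this closed-form update.
            off_diagonal = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - off_diagonal) / A[i][i]
    return x


# Small symmetric positive definite example; the exact minimizer is (1/11, 7/11).
solution = cyclic_coordinate_descent([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

For this objective the scheme coincides with the Gauss-Seidel iteration for the linear system Ax = b.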

9. Optimizing quantization for Lasso recovery

Author: Gu, Xiaoyi, Tu, Shenyinying, Shi, Hao-Jun Michael, Case, Mindy, Needell, Deanna, and Plan, Yaniv
Subjects: Computer Science - Information Theory, 94A12, 60D05, 90C25
Abstract: This letter is focused on quantized Compressed Sensing, assuming that Lasso is used for signal estimation. Leveraging recent work, we provide a framework to optimize the quantization function and show that the recovered signal converges to the actual signal at a quadratic rate as a function of the quantization level. We show that when the number of observations is high, this method of quantization gives a significantly better recovery rate than standard Lloyd-Max quantization. We support our theoretical analysis with numerical simulations.
Published: 2016

10. Practical Algorithms for Learning Near-Isometric Linear Embeddings

Author: Luo, Jerry, Shapiro, Kayla, Shi, Hao-Jun Michael, Yang, Qi, and Zhu, Kan
Subjects: Statistics - Machine Learning, Computer Science - Learning, Mathematics - Optimization and Control, 90C90
Abstract: We propose two practical non-convex approaches for learning near-isometric, linear embeddings of finite sets of data points. Given a set of training points $\mathcal{X}$, we consider the secant set $S(\mathcal{X})$ that consists of all pairwise difference vectors of $\mathcal{X}$, normalized to lie on the unit sphere. The problem can be formulated as finding a symmetric and positive semi-definite matrix $\boldsymbol{\Psi}$ that preserves the norms of all the vectors in $S(\mathcal{X})$ up to a distortion parameter $\delta$. Motivated by non-negative matrix factorization, we reformulate our problem into a Frobenius norm minimization problem, which is solved by the Alternating Direction Method of Multipliers (ADMM), and develop an algorithm, FroMax. Another method solves for a projection matrix $\boldsymbol{\Psi}$ by minimizing the restricted isometry property (RIP) directly over the set of symmetric, positive semi-definite matrices. Applying ADMM and a Moreau decomposition on a proximal mapping, we develop another algorithm, NILE-Pro, for dimensionality reduction. FroMax is shown to converge faster for smaller $\delta$ while NILE-Pro converges faster for larger $\delta$. Both non-convex approaches are then empirically demonstrated to be more computationally efficient than prior convex approaches for a number of applications in machine learning and signal processing.
Published: 2016

11. Methods for Quantized Compressed Sensing

Author: Shi, Hao-Jun Michael, Case, Mindy, Gu, Xiaoyi, Tu, Shenyinying, and Needell, Deanna
Subjects: Computer Science - Information Theory, Mathematics - Numerical Analysis, 94A12, 60D05, 90C25
Abstract: In this paper, we compare and catalog the performance of various greedy quantized compressed sensing algorithms that reconstruct sparse signals from quantized compressed measurements. We also introduce two new greedy approaches for reconstruction: Quantized Compressed Sampling Matching Pursuit (QCoSaMP) and Adaptive Outlier Pursuit for Quantized Iterative Hard Thresholding (AOP-QIHT). We compare the performance of greedy quantized compressed sensing algorithms for a given bit-depth, sparsity, and noise level.
Published: 2015

12. A Noise-Tolerant Quasi-Newton Algorithm for Unconstrained Optimization

Author: Shi, Hao-Jun Michael, Xie, Yuchen, Byrd, Richard, and Nocedal, Jorge
Subjects: Optimization and Control (math.OC), FOS: Mathematics, MathematicsofComputing_NUMERICALANALYSIS, Mathematics - Optimization and Control, Software, Theoretical Computer Science
Abstract: This paper describes an extension of the BFGS and L-BFGS methods for the minimization of a nonlinear function subject to errors. This work is motivated by applications that contain computational noise, employ low-precision arithmetic, or are subject to statistical noise. The classical BFGS and L-BFGS methods can fail in such circumstances because the updating procedure can be corrupted and the line search can behave erratically. The proposed method addresses these difficulties and ensures that the BFGS update is stable by employing a lengthening procedure that spaces out the points at which gradient differences are collected. A new line search, designed to tolerate errors, guarantees that the Armijo-Wolfe conditions are satisfied under most reasonable conditions, and works in conjunction with the lengthening procedure. The proposed methods are shown to enjoy convergence guarantees for strongly convex functions. Detailed implementations of the methods are presented, together with encouraging numerical results.
Comment: 27 pages, 13 figures, 2 tables
Published: 2022
Full Text: View/download PDF

13. On the numerical performance of finite-difference-based methods for derivative-free optimization

Author: Shi, Hao-Jun Michael, primary, Qiming Xuan, Melody, additional, Oztoprak, Figen, additional, and Nocedal, Jorge, additional
Published: 2022
Full Text: View/download PDF

14. Adaptive Finite-Difference Interval Estimation for Noisy Derivative-Free Optimization

Author: Shi, Hao-Jun Michael, primary, Xie, Yuchen, additional, Xuan, Melody Qiming, additional, and Nocedal, Jorge, additional
Published: 2022
Full Text: View/download PDF

15. On the numerical performance of finite-difference-based methods for derivative-free optimization.

Author: Shi, Hao-Jun Michael, Qiming Xuan, Melody, Oztoprak, Figen, and Nocedal, Jorge
Subjects: *NONLINEAR equations, *CONSTRAINED optimization, *NOISE
Abstract: The goal of this paper is to investigate an approach for derivative-free optimization that has not received sufficient attention in the literature and is yet one of the simplest to implement and parallelize. In its simplest form, it consists of employing derivative-based methods for unconstrained or constrained optimization and replacing the gradient of the objective (and constraints) by finite-difference approximations. This approach is applicable to problems with or without noise in the functions. The differencing interval is determined by a bound on the second (or third) derivative and by the noise level, which is assumed to be known or to be accessible through difference tables or sampling. The use of finite-difference gradient approximations has been largely dismissed in the derivative-free optimization literature as too expensive in terms of function evaluations or as impractical in the presence of noise. However, the test results presented in this paper suggest that it has much to be recommended. The experiments compare NEWUOA, DFO-LS and COBYLA against finite-difference versions of L-BFGS, LMDER and KNITRO on three classes of problems: general unconstrained problems, nonlinear least squares problems and nonlinear programs with inequality constraints.
Published: 2023
Full Text: View/download PDF

16. Methods for Stochastic, Noisy, and Derivative-Free Optimization

Author: Shi, Hao-Jun Michael
Published: 2022
Full Text: View/download PDF

17. Compositional Embeddings Using Complementary Partitions for Memory-Efficient Recommendation Systems

Author: Shi, Hao-Jun Michael, primary, Mudigere, Dheevatsa, additional, Naumov, Maxim, additional, and Yang, Jiyan, additional
Published: 2020
Full Text: View/download PDF

18. Optimizing Quantization for Lasso Recovery

Author: Gu, Xiaoyi, primary, Tu, Shenyinying, additional, Shi, Hao-Jun Michael, additional, Case, Mindy, additional, Needell, Deanna, additional, and Plan, Yaniv, additional
Published: 2018
Full Text: View/download PDF

19. Methods for quantized compressed sensing

Author: Shi, Hao-Jun Michael, primary, Case, Mindy, additional, Gu, Xiaoyi, additional, Tu, Shenyinying, additional, and Needell, Deanna, additional
Published: 2016
Full Text: View/download PDF
