26 results for "approximation error"
Search Results
2. On Unbalanced Optimal Transport: Gradient Methods, Sparsity and Approximation Error.
- Author
-
Quang Minh Nguyen, Nguyen, Hoang H., Yi Zhou, and Nguyen, Lam M.
- Subjects
- *
APPROXIMATION error , *TRANSPORTATION planning , *DEEP learning , *IMAGE reconstruction algorithms , *EXTRAPOLATION - Abstract
We study the Unbalanced Optimal Transport (UOT) between two measures of possibly different masses with at most n components, where the marginal constraints of standard Optimal Transport (OT) are relaxed via Kullback-Leibler divergence with regularization factor τ. Although only Sinkhorn-based UOT solvers have been analyzed in the literature with the iteration complexity of O(τ log(n)/ε log(log(n)/ε)) and per-iteration cost of O(n²) for achieving the desired error ε, their positively dense output transportation plans strongly hinder their practicality. On the other hand, while being vastly used as heuristics for computing UOT in modern deep learning applications and having shown success in sparse OT problems, gradient methods applied to UOT have not been formally studied. In this paper, we propose a novel algorithm based on the Gradient Extrapolation Method (GEM-UOT) to find an ε-approximate solution to the UOT problem in O(κ log(τn/ε)) iterations with O(n²) per-iteration cost, where κ is the condition number depending only on the two input measures. Our proof technique is based on a novel dual formulation of the squared ℓ2-norm UOT objective, which fills a gap in the sparse UOT literature and also leads to a new characterization of the approximation error between UOT and OT. To this end, we further present a novel approach for OT retrieval from UOT, which is based on GEM-UOT with fine-tuned τ and a post-process projection step. Extensive experiments on synthetic and real datasets validate our theories and demonstrate the favorable performance of our methods in practice. We showcase GEM-UOT on the task of color transfer in terms of both the quality of the transferred image and the sparsity of the transportation plan. [ABSTRACT FROM AUTHOR]
- Published
- 2023
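The Sinkhorn-based baseline that the abstract contrasts with GEM-UOT can be sketched in a few lines: relaxing the marginal constraints through KL penalties with weight τ turns each Sinkhorn scaling step into a damped update with exponent τ/(τ+ε). The cost matrix, measure values, and parameter settings below are illustrative choices, not taken from the paper, and this is the dense baseline, not the proposed gradient method.

```python
import math

def sinkhorn_uot(C, a, b, tau=10.0, eps=0.1, iters=500):
    """Entropic unbalanced OT via Sinkhorn-style scaling.

    KL-relaxed marginals damp each scaling step by the exponent
    tau / (tau + eps); the returned plan P = diag(u) K diag(v) is
    positively dense, which is exactly the practical drawback the
    abstract points out.
    """
    n, m = len(a), len(b)
    K = [[math.exp(-C[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    lam = tau / (tau + eps)
    for _ in range(iters):
        u = [(a[i] / sum(K[i][j] * v[j] for j in range(m))) ** lam for i in range(n)]
        v = [(b[j] / sum(K[i][j] * u[i] for i in range(n))) ** lam for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

C = [[0.0, 1.0], [1.0, 0.0]]            # toy ground cost between two 2-point measures
P = sinkhorn_uot(C, a=[1.0, 1.0], b=[1.0, 1.0])
row_mass = [sum(row) for row in P]      # with large tau, marginals stay near a
```

With large τ the plan's marginals stay close to the inputs; shrinking τ lets mass be created or destroyed.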
3. Over-parameterized Deep Nonparametric Regression for Dependent Data with Its Applications to Reinforcement Learning.
- Author
-
Xingdong Feng, Yuling Jiao, Lican Kang, Baqun Zhang, and Fan Zhou
- Subjects
- *
REINFORCEMENT learning , *DEEP learning , *DEEP reinforcement learning , *APPROXIMATION error - Abstract
In this paper, we provide statistical guarantees for over-parameterized deep nonparametric regression in the presence of dependent data. By decomposing the error, we establish non-asymptotic error bounds for the deep estimator, achieved by effectively balancing the approximation and generalization errors. We derive an approximation result for Hölder functions with constrained weights. Additionally, the generalization error is bounded by the weight norm, allowing for a neural network parameter number that is much larger than the training sample size. Furthermore, we address the curse of dimensionality by assuming that the samples originate from distributions with low intrinsic dimension. Under this assumption, we are able to overcome the challenges posed by high-dimensional spaces. By incorporating an additional error propagation mechanism, we derive oracle inequalities for over-parameterized deep fitted Q-iteration. [ABSTRACT FROM AUTHOR]
- Published
- 2023
4. Randomized Spectral Co-Clustering for Large-Scale Directed Networks.
- Author
-
Xiao Guo, Yixuan Qiu, Hai Zhang, and Xiangyu Chang
- Subjects
- *
APPROXIMATION error , *ERROR rates , *STOCHASTIC models , *DIRECTED graphs , *MULTICASTING (Computer networks) , *ALGORITHMS , *5G networks - Abstract
Directed networks are broadly used to represent asymmetric relationships among units. Co-clustering aims to cluster the senders and receivers of directed networks simultaneously. In particular, the well-known spectral clustering algorithm could be modified as the spectral co-clustering to co-cluster directed networks. However, large-scale networks pose great computational challenges to it. In this paper, we leverage sketching techniques and derive two randomized spectral co-clustering algorithms, one random-projection-based and the other random-sampling-based, to accelerate the co-clustering of large-scale directed networks. We theoretically analyze the resulting algorithms under two generative models - the stochastic co-block model and the degree-corrected stochastic co-block model, and establish their approximation error rates and misclustering error rates, indicating better bounds than the state-of-the-art results of co-clustering literature. Numerically, we design and conduct simulations to support our theoretical results and test the efficiency of the algorithms on real networks with up to millions of nodes. A publicly available R package RandClust is developed for better usability and reproducibility of the proposed methods. [ABSTRACT FROM AUTHOR]
- Published
- 2023
5. Euler-Lagrange Analysis of Generative Adversarial Networks.
- Author
-
Asokan, Siddarth and Seelamantula, Chandra Sekhar
- Subjects
- *
GENERATIVE adversarial networks , *APPROXIMATION error , *DIFFERENTIAL equations , *CALCULUS of variations , *IMAGE registration , *LAGRANGE multiplier , *IMAGE reconstruction algorithms - Abstract
We consider Generative Adversarial Networks (GANs) and address the underlying functional optimization problem ab initio within a variational setting. Strictly speaking, the optimization of the generator and discriminator functions must be carried out in accordance with the Euler-Lagrange conditions, which become particularly relevant in scenarios where the optimization cost involves regularizers comprising the derivatives of these functions. Considering Wasserstein GANs (WGAN) with a gradient-norm penalty, we show that the optimal discriminator is the solution to a Poisson differential equation. In principle, the optimal discriminator can be obtained in closed form without having to train a neural network. We illustrate this by employing a Fourier-series approximation to solve the Poisson differential equation. Experimental results based on synthesized Gaussian data demonstrate superior convergence behavior of the proposed approach in comparison with the baseline WGAN variants that employ weight-clipping, gradient or Lipschitz penalties on the discriminator on low-dimensional data. We also analyze the truncation error of the Fourier-series approximation and the estimation error of the Fourier coefficients in a high-dimensional setting. We demonstrate applications to real-world images considering latent-space prior matching in Wasserstein autoencoders and present performance comparisons on benchmark datasets such as MNIST, SVHN, CelebA, CIFAR-10, and Ukiyo-E. We demonstrate that the proposed approach achieves comparable reconstruction error and Fréchet inception distance with faster convergence and up to two-fold improvement in image sharpness. [ABSTRACT FROM AUTHOR]
- Published
- 2023
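The Fourier-series route to a Poisson equation that the abstract describes can be illustrated in one dimension: each sine mode is an eigenfunction of -d²/dx², so a truncated sine expansion solves -u'' = f directly, and the truncation level controls the truncation error the paper analyzes. The 1-D domain, boundary conditions, and test function below are simplifying assumptions for the sketch, not the paper's setting.

```python
import math

def fourier_poisson_solve(f, n_modes=10, n_grid=200):
    """Solve -u''(x) = f(x) on [0,1] with u(0)=u(1)=0 by a truncated
    Fourier sine series.

    sin(k*pi*x) has eigenvalue (k*pi)^2 under -d2/dx2, so each solution
    coefficient is the sine coefficient f_k divided by (k*pi)^2.
    """
    h = 1.0 / n_grid
    xs = [i * h for i in range(n_grid + 1)]
    coeffs = []
    for k in range(1, n_modes + 1):
        # f_k = 2 * integral_0^1 f(x) sin(k pi x) dx, via the trapezoid rule.
        vals = [f(x) * math.sin(k * math.pi * x) for x in xs]
        fk = 2.0 * h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
        coeffs.append(fk / (k * math.pi) ** 2)
    return lambda x: sum(c * math.sin((k + 1) * math.pi * x)
                         for k, c in enumerate(coeffs))

# For f(x) = sin(pi x) the exact solution is sin(pi x) / pi^2.
u = fourier_poisson_solve(lambda x: math.sin(math.pi * x))
err = abs(u(0.5) - math.sin(math.pi * 0.5) / math.pi ** 2)
```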
6. An Error Analysis of Generative Adversarial Networks for Learning Distributions.
- Author
-
Jian Huang, Yuling Jiao, Zhen Li, Shiao Liu, Yang Wang, and Yunfei Yang
- Subjects
- *
GENERATIVE adversarial networks , *ARTIFICIAL neural networks , *STATISTICAL errors , *APPROXIMATION error , *DISTRIBUTION (Probability theory) - Abstract
This paper studies how well generative adversarial networks (GANs) learn probability distributions from finite samples. Our main results establish the convergence rates of GANs under a collection of integral probability metrics defined through Hölder classes, including the Wasserstein distance as a special case. We also show that GANs are able to adaptively learn data distributions with low-dimensional structures or have Hölder densities, when the network architectures are chosen properly. In particular, for distributions concentrated around a low-dimensional set, we show that the learning rates of GANs do not depend on the high ambient dimension, but on the lower intrinsic dimension. Our analysis is based on a new oracle inequality decomposing the estimation error into the generator and discriminator approximation error and the statistical error, which may be of independent interest. [ABSTRACT FROM AUTHOR]
- Published
- 2022
7. On the Robustness to Misspecification of α-posteriors and Their Variational Approximations.
- Author
-
Medina, Marco Avella, Olea, José Luis Montiel, Rush, Cynthia, and Velez, Amilcar
- Subjects
- *
APPROXIMATION error , *PARAMETRIC modeling , *GAUSSIAN distribution - Abstract
α-posteriors and their variational approximations distort standard posterior inference by downweighting the likelihood and introducing variational approximation errors. We show that such distortions, if tuned appropriately, reduce the Kullback-Leibler (KL) divergence from the true, but perhaps infeasible, posterior distribution when there is potential parametric model misspecification. To make this point, we derive a Bernstein-von Mises theorem showing convergence in total variation distance of α-posteriors and their variational approximations to limiting Gaussian distributions. We use these limiting distributions to evaluate the KL divergence between true and reported posteriors. We show that the KL divergence is minimized by choosing α strictly smaller than one, assuming there is a vanishingly small probability of model misspecification. The optimized value of α becomes smaller as the misspecification becomes more severe. The optimized KL divergence increases logarithmically in the magnitude of misspecification and not linearly as with the usual posterior. Moreover, the optimized variational approximations of α-posteriors can induce additional robustness to model misspecification beyond that obtained by optimally downweighting the likelihood. [ABSTRACT FROM AUTHOR]
- Published
- 2022
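The likelihood-downweighting mechanism behind α-posteriors is easy to see in a Gaussian conjugate model: raising the likelihood to α shrinks the effective sample size to αn, pulling the posterior toward the prior and inflating its variance. The prior hyperparameters below (mu0, tau2) are illustrative assumptions, not values from the paper.

```python
def alpha_posterior(xbar, n, sigma2, mu0=0.0, tau2=1.0, alpha=1.0):
    """Gaussian-conjugate alpha-posterior: likelihood raised to alpha.

    alpha < 1 reduces the effective sample size to alpha * n, which is
    the distortion the abstract shows can improve robustness under
    parametric model misspecification.
    """
    prec = 1.0 / tau2 + alpha * n / sigma2            # posterior precision
    mean = (mu0 / tau2 + alpha * n * xbar / sigma2) / prec
    return mean, 1.0 / prec                           # posterior mean, variance

m1, v1 = alpha_posterior(xbar=2.0, n=50, sigma2=1.0, alpha=1.0)
m8, v8 = alpha_posterior(xbar=2.0, n=50, sigma2=1.0, alpha=0.8)
```

Setting α = 0.8 gives a posterior that is wider and shrunk further toward the prior mean than the standard (α = 1) posterior.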
8. Quantile regression with ReLU Networks: Estimators and minimax rates.
- Author
-
Padilla, Oscar Hernan Madrid, Tansey, Wesley, and Yanzhen Chen
- Subjects
- *
QUANTILE regression , *BESOV spaces , *ERROR functions , *APPROXIMATION error - Abstract
Quantile regression is the task of estimating a specified percentile response, such as the median (50th percentile), from a collection of known covariates. We study quantile regression with rectified linear unit (ReLU) neural networks as the chosen model class. We derive an upper bound on the expected mean squared error of a ReLU network used to estimate any quantile, conditional on a set of covariates. This upper bound only depends on the best possible approximation error, the number of layers in the network, and the number of nodes per layer. We further show upper bounds that are tight for two large classes of functions: compositions of Hölder functions and members of a Besov space. These tight bounds imply ReLU networks with quantile regression achieve minimax rates for broad collections of function types. Unlike existing work, the theoretical results hold under minimal assumptions and apply to general error distributions, including heavy-tailed distributions. Empirical simulations on a suite of synthetic response functions demonstrate that the theoretical results translate to practical implementations of ReLU networks. Overall, the theoretical and empirical results provide insight into the strong performance of ReLU neural networks for quantile regression across a broad range of function classes and error distributions. All code for this paper is publicly available at https://github.com/tansey/quantile-regression. [ABSTRACT FROM AUTHOR]
- Published
- 2022
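The loss that makes any model a quantile estimator is the pinball (quantile) loss. The sketch below isolates that loss with a brute-force grid minimizer over a scalar; in the paper the quantile is instead parameterized by a ReLU network of covariates, so the grid search and the toy data are illustrative stand-ins.

```python
def pinball_loss(y, q, tau):
    """Quantile ('pinball') loss for target percentile tau in (0, 1)."""
    return tau * (y - q) if y >= q else (1.0 - tau) * (q - y)

def empirical_quantile(ys, tau, grid_steps=400):
    """Minimize the average pinball loss over a grid of candidate values.

    Minimizing this loss recovers the tau-quantile of the data; a ReLU
    network would make q a function of covariates instead of a constant.
    """
    lo, hi = min(ys), max(ys)
    best_q, best_loss = lo, float("inf")
    for i in range(grid_steps + 1):
        q = lo + (hi - lo) * i / grid_steps
        loss = sum(pinball_loss(y, q, tau) for y in ys) / len(ys)
        if loss < best_loss:
            best_q, best_loss = q, loss
    return best_q

data = [1.0, 2.0, 3.0, 4.0, 100.0]   # heavy right tail
med = empirical_quantile(data, tau=0.5)
```

Note that the fitted median ignores the extreme value 100.0, illustrating why the paper's guarantees can cover heavy-tailed error distributions.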
9. An improper estimator with optimal excess risk in misspecified density estimation and logistic regression.
- Author
-
Mourtada, Jaouad and Gaïffas, Stéphane
- Subjects
- *
LOGISTIC regression analysis , *MAXIMUM likelihood statistics , *STATISTICAL learning , *APPROXIMATION error , *DENSITY - Abstract
We introduce a procedure for conditional density estimation under logarithmic loss, which we call SMP (Sample Minmax Predictor). This estimator minimizes a new general excess risk bound for statistical learning. On standard examples, this bound scales as d/n, with d the model dimension and n the sample size, and critically remains valid under model misspecification. Being an improper (out-of-model) procedure, SMP improves over within-model estimators such as the maximum likelihood estimator, whose excess risk degrades under misspecification. Compared to approaches reducing to the sequential problem, our bounds remove suboptimal log n factors and can handle unbounded classes. For the Gaussian linear model, the predictions and risk bound of SMP are governed by leverage scores of covariates, nearly matching the optimal risk in the well-specified case without conditions on the noise variance or approximation error of the linear model. For logistic regression, SMP provides a non-Bayesian approach to calibration of probabilistic predictions relying on virtual samples, and can be computed by solving two logistic regressions. It achieves a non-asymptotic excess risk of O((d + B²R²)/n), where R bounds the norm of features and B that of the comparison parameter; by contrast, no within-model estimator can achieve a better rate than min(BR/√n, e^{BR}/n) in general (Hazan et al., 2014). This provides a more practical alternative to Bayesian approaches, which require approximate posterior sampling, thereby partly addressing a question raised by Foster et al. (2018). [ABSTRACT FROM AUTHOR]
- Published
- 2022
10. On Universal Approximation and Error Bounds for Fourier Neural Operators.
- Author
-
Nikola Kovachki, Lanthaler, Samuel, and Mishra, Siddhartha
- Subjects
- *
APPROXIMATION error , *NAVIER-Stokes equations , *FLUID dynamics , *DARCY'S law - Abstract
Fourier neural operators (FNOs) have recently been proposed as an effective framework for learning operators that map between infinite-dimensional spaces. We prove that FNOs are universal, in the sense that they can approximate any continuous operator to desired accuracy. Moreover, we suggest a mechanism by which FNOs can approximate operators associated with PDEs efficiently. Explicit error bounds are derived to show that the size of the FNO, approximating operators associated with a Darcy type elliptic PDE and with the incompressible Navier-Stokes equations of fluid dynamics, only increases sub (log)-linearly in terms of the reciprocal of the error. Thus, FNOs are shown to efficiently approximate operators arising in a large class of PDEs. [ABSTRACT FROM AUTHOR]
- Published
- 2021
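The core building block of an FNO is a spectral convolution: transform to Fourier space, act on a small number of low modes, and transform back. The sketch below does this for a 1-D signal with a naive DFT; in an FNO the complex mode weights are learned, so the weights here are illustrative assumptions, and truncating to a few modes is what keeps the parameter count independent of the grid resolution.

```python
import cmath

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def spectral_conv_1d(x, weights):
    """One Fourier layer on a 1-D signal: transform, scale the lowest
    len(weights) modes (zeroing the rest), transform back, keep the
    real part. In an FNO these complex weights are learned parameters."""
    X = dft(x)
    n, m = len(x), len(weights)
    Y = [weights[k] * X[k] if k < m else 0j for k in range(n)]
    return [y.real for y in idft(Y)]

x = [0.0, 1.0, 2.0, 1.0]
y = spectral_conv_1d(x, [1.0 + 0j, 0.5 + 0j])   # keep and scale 2 of 4 modes
```

With unit weights on every mode the layer reduces to the identity, which is a quick sanity check on the transform pair.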
11. Pathwise Conditioning of Gaussian Processes.
- Author
-
Wilson, James T., Borovitskiy, Viacheslav, Terenin, Alexander, Mostowsky, Peter, and Deisenroth, Marc Peter
- Subjects
- *
MONTE Carlo method , *MARGINAL distributions , *GAUSSIAN processes , *REINFORCEMENT learning , *MARKOV chain Monte Carlo , *APPROXIMATION error , *GLOBAL optimization - Abstract
As Gaussian processes are used to answer increasingly complex questions, analytic solutions become scarcer and scarcer. Monte Carlo methods act as a convenient bridge for connecting intractable mathematical expressions with actionable estimates via sampling. Conventional approaches for simulating Gaussian process posteriors view samples as draws from marginal distributions of process values at finite sets of input locations. This distribution-centric characterization leads to generative strategies that scale cubically in the size of the desired random vector. These methods are prohibitively expensive in cases where we would, ideally, like to draw high-dimensional vectors or even continuous sample paths. In this work, we investigate a different line of reasoning: rather than focusing on distributions, we articulate Gaussian conditionals at the level of random variables. We show how this pathwise interpretation of conditioning gives rise to a general family of approximations that lend themselves to efficiently sampling Gaussian process posteriors. Starting from first principles, we derive these methods and analyze the approximation errors they introduce. We, then, ground these results by exploring the practical implications of pathwise conditioning in various applied settings, such as global optimization and reinforcement learning. [ABSTRACT FROM AUTHOR]
- Published
- 2021
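The pathwise (Matheron-style) view of conditioning can be shown with a single noise-free observation: a prior sample, treated as a function, is updated as f_post(·) = f_prior(·) + k(·, x) k(x, x)⁻¹ (y − f_prior(x)), with no posterior covariance matrix ever formed. The RBF kernel, the single observation, and the stand-in "prior sample" below are simplifying assumptions for the sketch.

```python
import math
import random

def rbf(a, b, ls=1.0):
    return math.exp(-0.5 * ((a - b) / ls) ** 2)

def pathwise_condition(prior_sample, x_obs, y_obs, ls=1.0):
    """Matheron-style pathwise update for one noise-free observation:
    the prior draw is corrected as a function of x, so high-dimensional
    or continuous posterior samples come at the cost of evaluating the
    prior sample, not of cubic covariance factorizations."""
    resid = (y_obs - prior_sample(x_obs)) / rbf(x_obs, x_obs, ls)
    return lambda x: prior_sample(x) + rbf(x, x_obs, ls) * resid

random.seed(0)
w, phase = random.gauss(0, 1), random.uniform(0, 2 * math.pi)
prior = lambda x: math.cos(w * x + phase)     # illustrative stand-in for a prior draw
post = pathwise_condition(prior, x_obs=0.5, y_obs=2.0)
```

By construction the updated path interpolates the observation exactly at x = 0.5, while reverting to the prior sample far from it.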
12. On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift.
- Author
-
Agarwal, Alekh, Kakade, Sham M., Lee, Jason D., and Mahajan, Gaurav
- Subjects
- *
SUPERVISED learning , *REINFORCEMENT learning , *APPROXIMATION error , *MARKOV processes , *LEARNING problems - Abstract
Policy gradient methods are among the most effective methods in challenging reinforcement learning problems with large state and/or action spaces. However, little is known about even their most basic theoretical convergence properties, including: if and how fast they converge to a globally optimal solution or how they cope with approximation error due to using a restricted class of parametric policies. This work provides provable characterizations of the computational, approximation, and sample size properties of policy gradient methods in the context of discounted Markov Decision Processes (MDPs). We focus on both: "tabular" policy parameterizations, where the optimal policy is contained in the class and where we show global convergence to the optimal policy; and parametric policy classes (considering both log-linear and neural policy classes), which may not contain the optimal policy and where we provide agnostic learning results. One central contribution of this work is in providing approximation guarantees that are average case--which avoid explicit worst-case dependencies on the size of state space--by making a formal connection to supervised learning under distribution shift. This characterization shows an important interplay between estimation error, approximation error, and exploration (as characterized through a precisely defined condition number). [ABSTRACT FROM AUTHOR]
- Published
- 2021
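The "tabular" softmax parameterization, where the abstract proves global convergence because the policy class contains the optimum, is simplest on a bandit. Below is a REINFORCE-style sketch on two arms; the rewards, step size, and step count are illustrative choices.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def reinforce_bandit(rewards, steps=2000, lr=0.1, seed=0):
    """Tabular softmax policy gradient (REINFORCE) on a deterministic
    multi-armed bandit: theta_i += lr * r * (1[a=i] - pi_i), the score-
    function gradient of expected reward under the softmax policy."""
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)
    for _ in range(steps):
        p = softmax(theta)
        a = rng.choices(range(len(rewards)), weights=p)[0]
        r = rewards[a]
        for i in range(len(theta)):
            theta[i] += lr * r * ((1.0 if i == a else 0.0) - p[i])
    return softmax(theta)

probs = reinforce_bandit(rewards=[0.2, 1.0])   # arm 1 pays more
```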
13. An algorithmic view of ℓ2 regularization and some path-following algorithms.
- Author
-
Yunzhang Zhu and Renxiong Liu
- Subjects
- *
NEWTON-Raphson method , *APPROXIMATION error , *MATHEMATICAL optimization , *MATHEMATICAL regularization , *ERROR rates , *CONVEX functions - Abstract
We establish an equivalence between the ℓ2-regularized solution path for a convex loss function, and the solution of an ordinary differential equation (ODE). Importantly, this equivalence reveals that the solution path can be viewed as the flow of a hybrid of gradient descent and Newton method applied to the empirical loss, which is similar to a widely used optimization technique called the trust region method. This provides an interesting algorithmic view of ℓ2 regularization, and is in contrast to the conventional view that the ℓ2 regularization solution path is similar to the gradient flow of the empirical loss. New path-following algorithms based on homotopy methods and numerical ODE solvers are proposed to numerically approximate the solution path. In particular, we consider Newton's method and gradient descent, respectively, as the base algorithm for the homotopy method, and establish their approximation error rates over the solution path. Importantly, our theory suggests novel schemes to choose grid points that guarantee an arbitrarily small suboptimality for the solution path. In terms of computational cost, we prove that in order to achieve an ε-suboptimality for the entire solution path, the number of Newton steps required for the Newton method is O(1/√ε), while the number of gradient steps required for the gradient descent method is O((1/ε) ln(1/ε)). Finally, we use ℓ2-regularized logistic regression as an illustrating example to demonstrate the effectiveness of the proposed path-following algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2021
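The path-as-ODE idea is easiest to verify in the one-dimensional ridge case: β(λ) = s/(h+λ) with h = Σx² and s = Σxy satisfies dβ/dλ = −β/(h+λ), so a numerical ODE solver can follow the whole path from one exactly-known starting point. Euler steps and the toy data below are illustrative, simpler choices than the paper's Newton/gradient homotopy schemes.

```python
def ridge_path_1d(xs, ys, lam_start=10.0, lam_end=0.1, n_steps=2000):
    """Follow the 1-D ridge solution path by integrating its ODE.

    beta(lam) = s / (h + lam) obeys d beta / d lam = -beta / (h + lam),
    so forward-Euler steps from the exact solution at lam_start should
    track the closed-form path down to lam_end.
    """
    h = sum(x * x for x in xs)
    s = sum(x * y for x, y in zip(xs, ys))
    lam, beta = lam_start, s / (h + lam_start)   # exact start of the path
    dlam = (lam_end - lam_start) / n_steps
    for _ in range(n_steps):
        beta += dlam * (-beta / (h + lam))
        lam += dlam
    return beta, s / (h + lam_end)               # ODE estimate vs closed form

approx, exact = ridge_path_1d(xs=[1.0, 2.0, 3.0], ys=[1.1, 1.9, 3.2])
```

Shrinking the Euler step (larger n_steps) tightens the agreement, mirroring the grid-point schemes in the abstract.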
14. Risk Bounds for Unsupervised Cross-Domain Mapping with IPMs.
- Author
-
Galanti, Tomer, Benaim, Sagie, and Wolf, Lior
- Subjects
- *
OPTIMAL stopping (Mathematical statistics) , *APPROXIMATION error - Abstract
The recent empirical success of unsupervised cross-domain mapping algorithms, in mapping between two domains that share common characteristics, is not well-supported by theoretical justifications. This lacuna is especially troubling, given the clear ambiguity in such mappings. We work with adversarial training methods based on integral probability metrics (IPMs) and derive a novel risk bound, which upper bounds the risk between the learned mapping h and the target mapping y, by a sum of three terms: (i) the risk between h and the most distant alternative mapping that was learned by the same cross-domain mapping algorithm, (ii) the minimal discrepancy between the target domain and the domain obtained by applying a hypothesis h* on the samples of the source domain, where h* is a hypothesis selectable by the same algorithm, and (iii) an approximation error term that decreases as the capacity of the class of discriminators increases and is empirically shown to be small. The bound is directly related to Occam's razor and encourages the selection of the minimal architecture that supports a small mapping discrepancy. The bound leads to multiple algorithmic consequences, including a method for hyperparameter selection and early stopping in cross-domain mapping. [ABSTRACT FROM AUTHOR]
- Published
- 2021
15. A determinantal point process for column subset selection.
- Author
-
Belhadji, Ayoub, Bardenet, Rémi, and Chainais, Pierre
- Subjects
- *
SUBSET selection , *FEATURE selection , *APPROXIMATION error , *POINT processes , *ALGORITHMS , *PRINCIPAL components analysis - Abstract
Two popular approaches to dimensionality reduction are principal component analysis, which projects onto a small number of well-chosen but non-interpretable directions, and feature selection, which selects a small number of the original features. Feature selection can be abstracted as selecting the subset of columns of a matrix X ∈ ℝ^{N×d} that minimizes the approximation error, i.e., the norm of the residual after projecting X onto the space spanned by the selected columns. Such a combinatorial optimization is usually impractical, and there has been interest in polynomial-cost, random subset selection algorithms that favour small values of this approximation error. We propose sampling from a projection determinantal point process, a repulsive distribution over column indices that favours diversity among the selected columns. We bound the ratio of the expected approximation error over the optimal error of PCA. These bounds improve over the state-of-the-art bounds of volume sampling when some realistic structural assumptions are satisfied for X. Numerical experiments suggest that our bounds are tight, and that our algorithms have comparable performance with the double phase algorithm, often considered the practical state-of-the-art. [ABSTRACT FROM AUTHOR]
- Published
- 2020
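The approximation error being minimized is the residual norm ‖X − P_S X‖_F after projecting X onto the span of the selected columns. The sketch below computes that residual for single-column subsets and picks the best one exhaustively; this brute-force selection is a stand-in to expose the objective, not the paper's DPP sampler, which instead draws diverse columns at random.

```python
def residual_norm(X, cols):
    """Frobenius norm of X minus its projection onto the span of the
    chosen columns (restricted to one column here, so the projection is
    just (c.x / c.c) * c and the linear algebra stays elementary)."""
    assert len(cols) == 1
    c = [row[cols[0]] for row in X]
    cc = sum(v * v for v in c)
    total = 0.0
    for j in range(len(X[0])):
        x = [row[j] for row in X]
        coef = sum(ci * xi for ci, xi in zip(c, x)) / cc
        total += sum((xi - coef * ci) ** 2 for ci, xi in zip(c, x))
    return total ** 0.5

def best_single_column(X):
    """Exhaustive single-column subset selection: a brute-force stand-in
    for the randomized samplers the abstract compares against."""
    return min(range(len(X[0])), key=lambda j: residual_norm(X, [j]))

X = [[1.0, 2.0, 0.1],
     [1.0, 2.1, 0.0],
     [1.0, 1.9, 0.2]]          # columns 0 and 1 are nearly parallel
best = best_single_column(X)
```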
16. Adaptive Approximation and Generalization of Deep Neural Network with Intrinsic Dimensionality.
- Author
-
Ryumei Nakada and Masaaki Imaizumi
- Subjects
- *
GENERALIZATION , *APPROXIMATION error , *ERROR rates , *COMPUTER simulation - Abstract
In this study, we prove that an intrinsic low dimensionality of covariates is the main factor that determines the performance of deep neural networks (DNNs). DNNs generally provide outstanding empirical performance. Hence, numerous studies have actively investigated the theoretical properties of DNNs to understand their underlying mechanisms. In particular, the behavior of DNNs in terms of high-dimensional data is one of the most critical questions. However, this issue has not been sufficiently investigated from the aspect of covariates, although high-dimensional data have practically low intrinsic dimensionality. In this study, we derive bounds for an approximation error and a generalization error regarding DNNs with intrinsically low dimensional covariates. We apply the notion of the Minkowski dimension and develop a novel proof technique. Consequently, we show that convergence rates of the errors by DNNs do not depend on the nominal high dimensionality of data, but on its lower intrinsic dimension. We further prove that the rate is optimal in the minimax sense. We identify an advantage of DNNs by showing that DNNs can handle a broader class of intrinsic low dimensional data than other adaptive estimators. Finally, we conduct a numerical simulation to validate the theoretical results. [ABSTRACT FROM AUTHOR]
- Published
- 2020
17. Scalable Approximate MCMC Algorithms for the Horseshoe Prior.
- Author
-
Johndrow, James, Orenstein, Paulo, and Bhattacharya, Anirban
- Subjects
- *
MARKOV chain Monte Carlo , *SCALABILITY , *BAYESIAN analysis , *APPROXIMATION error , *ALGORITHMS , *MATRIX multiplications - Abstract
The horseshoe prior is frequently employed in Bayesian analysis of high-dimensional models, and has been shown to achieve minimax optimal risk properties when the truth is sparse. While optimization-based algorithms for the extremely popular Lasso and elastic net procedures can scale to dimension in the hundreds of thousands, algorithms for the horseshoe that use Markov chain Monte Carlo (MCMC) for computation are limited to problems an order of magnitude smaller. This is due to high computational cost per step and growth of the variance of time-averaging estimators as a function of dimension. We propose two new MCMC algorithms for computation in these models that have significantly improved performance compared to existing alternatives. One of the algorithms also approximates an expensive matrix product to give orders of magnitude speedup in high-dimensional applications. We prove guarantees for the accuracy of the approximate algorithm, and show that gradually decreasing the approximation error as the chain extends results in an exact algorithm. The scalability of the algorithm is illustrated in simulations with problem size as large as N=5,000 observations and p=50,000 predictors, and an application to a genome-wide association study with N=2,267 and p=98,385. The empirical results also show that the new algorithm yields estimates with lower mean squared error, intervals with better coverage, and elucidates features of the posterior that were often missed by previous algorithms in high dimensions, including bimodality of posterior marginals indicating uncertainty about which covariates belong in the model. [ABSTRACT FROM AUTHOR]
- Published
- 2020
18. Variance-based Regularization with Convex Objectives.
- Author
-
Duchi, John and Hongseok Namkoong
- Subjects
- *
ROBUST optimization , *MATHEMATICAL regularization , *APPROXIMATION error , *PERFORMANCE standards , *KRIGING - Abstract
We develop an approach to risk minimization and stochastic optimization that provides a convex surrogate for variance, allowing near-optimal and computationally efficient trading between approximation and estimation error. Our approach builds off of techniques for distributionally robust optimization and Owen's empirical likelihood, and we provide a number of finite-sample and asymptotic results characterizing the theoretical performance of the estimator. In particular, we show that our procedure comes with certificates of optimality, achieving (in some scenarios) faster rates of convergence than empirical risk minimization by virtue of automatically balancing bias and variance. We give corroborating empirical evidence showing that in practice, the estimator indeed trades between variance and absolute performance on a training sample, improving out-of-sample (test) performance over standard empirical risk minimization for a number of classification problems. [ABSTRACT FROM AUTHOR]
- Published
- 2019
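The quantity the convex surrogate targets is the empirical risk plus a variance penalty of order sqrt(var/n). The tiny sketch below just evaluates that penalized risk; the constant c is an illustrative assumption, and the paper's actual contribution, a convex surrogate for the (non-convex) variance term via distributionally robust optimization, is not reproduced here.

```python
import math

def variance_regularized_risk(losses, c=1.0):
    """Mean loss plus c * sqrt(variance / n): the bias-variance trade-off
    the abstract's estimator balances automatically."""
    n = len(losses)
    mean = sum(losses) / n
    var = sum((l - mean) ** 2 for l in losses) / n
    return mean + c * math.sqrt(var / n)

flat = variance_regularized_risk([0.5] * 100)        # zero variance, no penalty
spread = variance_regularized_risk([0.0, 1.0] * 50)  # same mean, high variance
```

Two loss profiles with identical means are separated by the penalty: the high-variance one is charged extra.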
19. A Bootstrap Method for Error Estimation in Randomized Matrix Multiplication.
- Author
-
Lopes, Miles E., Shusen Wang, and Mahoney, Michael W.
- Subjects
- *
MATRIX multiplications , *NUMERICAL solutions for linear algebra , *EXTRAPOLATION , *APPROXIMATION error - Abstract
In recent years, randomized methods for numerical linear algebra have received growing interest as a general approach to large-scale problems. Typically, the essential ingredient of these methods is some form of randomized dimension reduction, which accelerates computations, but also creates random approximation error. In this way, the dimension reduction step encodes a tradeoff between cost and accuracy. However, the exact numerical relationship between cost and accuracy is typically unknown, and consequently, it may be difficult for the user to precisely know (1) how accurate a given solution is, or (2) how much computation is needed to achieve a given level of accuracy. In the current paper, we study randomized matrix multiplication (sketching) as a prototype setting for addressing these general problems. As a solution, we develop a bootstrap method for directly estimating the accuracy as a function of the reduced dimension (as opposed to deriving worst-case bounds on the accuracy in terms of the reduced dimension). From a computational standpoint, the proposed method does not substantially increase the cost of standard sketching methods, and this is made possible by an "extrapolation" technique. In addition, we provide both theoretical and empirical results to demonstrate the effectiveness of the proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2019
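The bootstrap idea transfers to a minimal prototype: estimate an inner product by uniform sampling (vectors standing in for matrix columns), then resample the sampled terms to estimate how far the sketched estimate typically is from the truth, directly from the sketch itself. Sizes, data, and the mean-absolute-deviation error summary are illustrative choices.

```python
import random

def sketched_dot(a, b, t, rng):
    """Uniform-sampling sketch of a.b: average t sampled products,
    each scaled by n so the estimator is unbiased."""
    n = len(a)
    idx = [rng.randrange(n) for _ in range(t)]
    terms = [n * a[i] * b[i] for i in idx]
    return sum(terms) / t, terms

def bootstrap_error(terms, n_boot, rng):
    """Bootstrap estimate of the sketch's typical error: resample the
    sampled terms with replacement, recompute the estimate, and average
    the absolute deviations from the original estimate."""
    t = len(terms)
    est = sum(terms) / t
    devs = []
    for _ in range(n_boot):
        boot = [terms[rng.randrange(t)] for _ in range(t)]
        devs.append(abs(sum(boot) / t - est))
    return sum(devs) / n_boot

rng = random.Random(1)
a = [float(i % 7) for i in range(1000)]
b = [float((3 * i) % 5) for i in range(1000)]
est, terms = sketched_dot(a, b, t=200, rng=rng)
err_hat = bootstrap_error(terms, n_boot=200, rng=rng)
```

The point of the abstract is that err_hat is computed from the sketch alone, so the user can grow t until the estimated error is acceptable.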
20. Robust Frequent Directions with Application in Online Learning.
- Author
-
Luo Luo, Cheng Chen, Zhihua Zhang, Wu-Jun Li, and Tong Zhang
- Subjects
- *
ONLINE education , *ONLINE algorithms , *APPROXIMATION error , *MACHINE learning , *MATHEMATICAL regularization - Abstract
The frequent directions (FD) technique is a deterministic approach for online sketching that has many applications in machine learning. The conventional FD is a heuristic procedure that often outputs rank-deficient matrices. To overcome the rank deficiency problem, we propose a new sketching strategy called robust frequent directions (RFD) by introducing a regularization term. RFD can be derived from an optimization problem. It updates the sketch matrix and the regularization term adaptively and jointly. RFD reduces the approximation error of FD without increasing the computational cost. We also apply RFD to online learning and propose an effective hyperparameter-free online Newton algorithm. We derive a regret bound for our online Newton algorithm based on RFD, which guarantees the robustness of the algorithm. The experimental studies demonstrate that the proposed method outperforms state-of-the-art second order online learning algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2019
21. A New and Flexible Approach to the Analysis of Paired Comparison Data.
- Author
-
Oliveira, Ivo F. D., Ailon, Nir, and Davidov, Ori
- Subjects
- *
SYMMETRIC functions , *DISTRIBUTION (Probability theory) , *MATHEMATICAL bounds , *COMPUTER programming , *APPROXIMATION error - Abstract
We consider the situation where I items are ranked by paired comparisons. It is usually assumed that the probability that item i is preferred over item j is pij = F(μi-μj) where F is a symmetric distribution function, which we refer to as the comparison function, and μi and μj are the merits or scores of the compared items. This modelling framework, which is ubiquitous in the paired comparison literature, strongly depends on the assumption that the comparison function F is known. In practice, however, this assumption is often unrealistic and may result in poor fit and erroneous inferences. This limitation has motivated us to relax the assumption that F is fully known and simultaneously estimate the merits of the objects and the underlying comparison function. Our formulation yields a flexible semi-definite programming problem that we use as a refinement step for estimating the paired comparison probability matrix. We provide a detailed sensitivity analysis and, as a result, we establish the consistency of the resulting estimators and provide bounds on the estimation and approximation errors. Some statistical properties of the resulting estimators as well as model selection criteria are investigated. Finally, using a large dataset of computer chess matches, we estimate the comparison function and find that the model used by the International Chess Federation does not seem to apply to computer chess. [ABSTRACT FROM AUTHOR]
- Published
- 2018
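The model class in the abstract above, p_ij = F(μi - μj), is easy to make concrete. The sketch below builds the comparison probability matrix for a given merit vector, defaulting F to the logistic CDF (the Bradley-Terry choice); the paper's contribution is precisely that F need not be fixed in advance but can be estimated alongside the merits.

```python
import numpy as np

def comparison_matrix(mu, F=lambda t: 1.0 / (1.0 + np.exp(-t))):
    """Build p_ij = F(mu_i - mu_j) for merit scores mu.

    The default F is the logistic CDF (Bradley-Terry model); any symmetric
    distribution function F (with F(-t) = 1 - F(t)) can be substituted.
    """
    mu = np.asarray(mu, dtype=float)
    # Broadcasting gives the full matrix of pairwise merit differences.
    return F(mu[:, None] - mu[None, :])
```

Because F is symmetric, the matrix satisfies p_ij + p_ji = 1, and a higher-merit item is preferred with probability above one half.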
22. Sketched Ridge Regression: Optimization Perspective, Statistical Perspective, and Model Averaging.
- Author
-
Shusen Wang, Gittens, Alex, and Mahoney, Michael W.
- Subjects
- *
RIDGE regression (Statistics) , *MATHEMATICAL optimization , *APPROXIMATION error , *VARIANCES , *PROBLEM solving - Abstract
We address the statistical and optimization impacts of the classical sketch and Hessian sketch used to approximately solve the Matrix Ridge Regression (MRR) problem. Prior research has quantified the effects of classical sketch on the strictly simpler least squares regression (LSR) problem. We establish that classical sketch has a similar effect upon the optimization properties of MRR as it does on those of LSR: namely, it recovers nearly optimal solutions. By contrast, Hessian sketch does not have this guarantee; instead, the approximation error is governed by a subtle interplay between the "mass" in the responses and the optimal objective value. For both types of approximation, the regularization in the sketched MRR problem results in significantly different statistical properties from those of the sketched LSR problem. In particular, there is a bias-variance trade-off in sketched MRR that is not present in sketched LSR. We provide upper and lower bounds on the bias and variance of sketched MRR; these bounds show that classical sketch significantly increases the variance, while Hessian sketch significantly increases the bias. Empirically, sketched MRR solutions can have risks that are higher by an order-of-magnitude than those of the optimal MRR solutions. We establish theoretically and empirically that model averaging greatly decreases the gap between the risks of the true and sketched solutions to the MRR problem. Thus, in parallel or distributed settings, sketching combined with model averaging is a powerful technique that quickly obtains near-optimal solutions to the MRR problem while greatly mitigating the increased statistical risk incurred by sketching. [ABSTRACT FROM AUTHOR]
- Published
- 2018
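The two sketching schemes contrasted in the abstract above differ only in where the sketching matrix S enters. A minimal sketch with a Gaussian S (illustrative only; the function name and parameters are my own): the classical sketch solves the fully sketched problem, while the Hessian sketch sketches only the Gram matrix and keeps the exact right-hand side.

```python
import numpy as np

def sketched_ridge(A, b, lam, m, mode="classical", seed=0):
    """Toy classical vs. Hessian sketch for ridge regression.

    classical: argmin_x ||S(Ax - b)||^2 + lam * ||x||^2
    hessian:   solve (A^T S^T S A + lam I) x = A^T b
    S is an m x n Gaussian sketching matrix, scaled so E[S^T S] = I.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    S = rng.standard_normal((m, n)) / np.sqrt(m)
    SA = S @ A
    G = SA.T @ SA + lam * np.eye(d)
    rhs = SA.T @ (S @ b) if mode == "classical" else A.T @ b
    return np.linalg.solve(G, rhs)
```

With a sketch size m comfortably above d, both variants come close to the exact ridge solution on well-conditioned problems; the abstract's point is that their error and bias-variance behavior differ sharply in general.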
23. Distributed Semi-supervised Learning with Kernel Ridge Regression.
- Author
-
Xiangyu Chang, Shao-Bo Lin, and Ding-Xuan Zhou
- Subjects
- *
KERNEL functions , *APPROXIMATION error , *REGRESSION analysis , *HILBERT space , *BANACH spaces - Abstract
This paper provides error analysis for distributed semi-supervised learning with kernel ridge regression (DSKRR) based on a divide-and-conquer strategy. DSKRR applies kernel ridge regression (KRR) to data subsets that are distributively stored on multiple servers to produce individual output functions, and then takes a weighted average of the individual output functions as a final estimator. Using a novel error decomposition which divides the generalization error of DSKRR into the approximation error, sample error and distributed error, we find that the sample error and distributed error reflect the power and limitation of DSKRR, compared with KRR processing the whole data. Thus a small distributed error provides a large range of the number of data subsets to guarantee a small generalization error. Our results show that unlabeled data play important roles in reducing the distributed error and enlarging the number of data subsets in DSKRR. Our analysis also applies to the case when the regression function is out of the reproducing kernel Hilbert space. Numerical experiments including toy simulations and a music-prediction task are employed to demonstrate our theoretical statements and show the power of unlabeled data in distributed learning. [ABSTRACT FROM AUTHOR]
- Published
- 2017
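The divide-and-conquer step described in the abstract above is simple to sketch: solve KRR independently on each subset and average the local predictions, weighted by subset size. The code below is a minimal illustration (helper names are my own, and it omits the semi-supervised use of unlabeled data that the paper analyzes).

```python
import numpy as np

def gauss_kernel(X, Z, gamma=1.0):
    """Gaussian (RBF) kernel matrix between row-sets X and Z."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def dskrr_predict(splits, X_test, lam=0.1, gamma=1.0):
    """Divide-and-conquer KRR: fit each (X, y) subset locally, then return
    the size-weighted average of the local predictions on X_test."""
    preds, sizes = [], []
    for X, y in splits:
        n = len(X)
        # Local KRR: (K + lam * n * I) alpha = y.
        alpha = np.linalg.solve(
            gauss_kernel(X, X, gamma) + lam * n * np.eye(n), y)
        preds.append(gauss_kernel(X_test, X, gamma) @ alpha)
        sizes.append(n)
    w = np.asarray(sizes, dtype=float) / sum(sizes)
    return sum(wi * p for wi, p in zip(w, preds))
```

Only the small local kernel systems are ever solved, which is the computational point of the strategy; the paper's analysis bounds how much the averaging loses relative to KRR on the pooled data.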
24. Perturbative Corrections for Approximate Inference in Gaussian Latent Variable Models.
- Author
-
Opper, Manfred, Paquet, Ulrich, and Winther, Ole
- Subjects
- *
GAUSSIAN processes , *MACHINE learning , *APPROXIMATION theory , *PERTURBATION theory , *APPROXIMATION error , *ISING model , *POLYNOMIALS - Abstract
Expectation Propagation (EP) provides a framework for approximate inference. When the model under consideration is over a latent Gaussian field, with the approximation being Gaussian, we show how these approximations can systematically be corrected. A perturbative expansion is made of the exact but intractable correction, and can be applied to the model's partition function and other moments of interest. The correction is expressed over the higher-order cumulants which are neglected by EP's local matching of moments. Through the expansion, we see that EP is correct to first order. By considering higher orders, corrections of increasing polynomial complexity can be applied to the approximation. The second order provides a correction in quadratic time, which we apply to an array of Gaussian process and Ising models. The corrections generalize to arbitrarily complex approximating families, which we illustrate on tree-structured Ising model approximations. Furthermore, they provide a polynomial-time assessment of the approximation error. We also provide both theoretical and practical insights on the exactness of the EP solution. [ABSTRACT FROM AUTHOR]
- Published
- 2013
25. Learning Theory Approach to Minimum Error Entropy Criterion.
- Author
-
Ting Hu, Jun Fan, Qiang Wu, and Ding-Xuan Zhou
- Subjects
- *
MACHINE learning , *ENTROPY (Information theory) , *COMPUTER algorithms , *APPROXIMATION error , *GENERALIZATION , *STOCHASTIC convergence - Abstract
We consider the minimum error entropy (MEE) criterion and an empirical risk minimization learning algorithm when an approximation of Rényi's entropy (of order 2) by Parzen windowing is minimized. This learning algorithm involves a Parzen windowing scaling parameter. We present a learning theory approach for this MEE algorithm in a regression setting when the scaling parameter is large. Consistency and explicit convergence rates are provided in terms of the approximation ability and capacity of the involved hypothesis space. Novel analysis is carried out for the generalization error associated with Rényi's entropy and a Parzen windowing function, to overcome technical difficulties arising from the essential differences between the classical least squares problems and the MEE setting. An involved symmetrized least squares error is introduced and analyzed, which is related to some ranking algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2013
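The empirical objective behind the abstract above can be written down directly: Parzen windowing with a Gaussian window of scale h turns the order-2 Rényi entropy of the errors into a pairwise sum. The sketch below is illustrative (the function name is my own); minimizing this quantity over a hypothesis space is what concentrates the errors e_i = y_i - f(x_i).

```python
import numpy as np

def mee_risk(errors, h=1.0):
    """Parzen-window estimate of the order-2 Renyi entropy of the errors.

    The information potential is the mean Gaussian window over all pairwise
    error differences; the entropy estimate is its negative log. Smaller
    values correspond to more tightly concentrated errors.
    """
    e = np.asarray(errors, dtype=float)
    diff = e[:, None] - e[None, :]
    info_potential = np.exp(-diff ** 2 / (2 * h ** 2)).mean()
    return -np.log(info_potential)
```

Note the O(n^2) pairwise structure, which is one essential difference from classical least squares and is related to the symmetrized least squares error the paper introduces.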