Author: "Andrea Montanari" / Publisher: institute of mathematical statistics - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Andrea Montanari"' showing total 6 results

Start Over Author "Andrea Montanari" Publisher institute of mathematical statistics

6 results on '"Andrea Montanari"'

1. Linearized two-layers neural networks in high dimension

Author: Andrea Montanari, Behrooz Ghorbani, Theodor Misiakiewicz, and Song Mei
Subjects: FOS: Computer and information sciences, Statistics and Probability, Computer Science - Machine Learning, Pure mathematics, Polynomial, Mathematics - Statistics Theory, Statistics Theory (math.ST), 02 engineering and technology, Function (mathematics), 01 natural sciences, Upper and lower bounds, Regularization (mathematics), Square (algebra), Machine Learning (cs.LG), 010104 statistics & probability, Kernel method, Dimension (vector space), FOS: Mathematics, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, 0101 mathematics, Statistics, Probability and Uncertainty, Invariant (mathematics), Mathematics
Abstract: We consider the problem of learning an unknown function $f_{\star}$ on the $d$-dimensional sphere with respect to the square loss, given i.i.d. samples $\{(y_i,{\boldsymbol x}_i)\}_{i\le n}$ where ${\boldsymbol x}_i$ is a feature vector uniformly distributed on the sphere and $y_i=f_{\star}({\boldsymbol x}_i)+\varepsilon_i$. We study two popular classes of models that can be regarded as linearizations of two-layers neural networks around a random initialization: the random features model of Rahimi-Recht (RF); the neural tangent kernel model of Jacot-Gabriel-Hongler (NT). Both these approaches can also be regarded as randomized approximations of kernel ridge regression (with respect to different kernels), and enjoy universal approximation properties when the number of neurons $N$ diverges, for a fixed dimension $d$. We consider two specific regimes: the approximation-limited regime, in which $n=\infty$ while $d$ and $N$ are large but finite; and the sample size-limited regime in which $N=\infty$ while $d$ and $n$ are large but finite. In the first regime we prove that if $d^{\ell + \delta} \le N\le d^{\ell+1-\delta}$ for small $\delta > 0$, then \RF\, effectively fits a degree-$\ell$ polynomial in the raw features, and \NT\, fits a degree-$(\ell+1)$ polynomial. In the second regime, both RF and NT reduce to kernel methods with rotationally invariant kernels. We prove that, if the number of samples is $d^{\ell + \delta} \le n \le d^{\ell +1-\delta}$, then kernel methods can fit at most a a degree-$\ell$ polynomial in the raw features. This lower bound is achieved by kernel ridge regression. Optimal prediction error is achieved for vanishing ridge regularization., Comment: 65 pages; 17 pdf figures
Published: 2021
Full Text: View/download PDF

2. Discussion of: 'Nonparametric regression using deep neural networks with ReLU activation function'

Author: Song Mei, Behrooz Ghorbani, Andrea Montanari, and Theodor Misiakiewicz
Subjects: Statistics and Probability, business.industry, Activation function, Deep neural networks, Artificial intelligence, Statistics, Probability and Uncertainty, business, Mathematics, Nonparametric regression
Published: 2020
Full Text: View/download PDF

3. The landscape of empirical risk for nonconvex losses

Author: Andrea Montanari, Song Mei, and Yu Bai
Subjects: Statistics and Probability, Hessian matrix, Uniform convergence, Population, 02 engineering and technology, uniform convergence, 01 natural sciences, Robust regression, 010104 statistics & probability, symbols.namesake, 0202 electrical engineering, electronic engineering, information engineering, Applied mathematics, 62J02, Empirical risk minimization, 0101 mathematics, education, Empirical process, Mathematics, education.field_of_study, empirical risk minimization, 020206 networking & telecommunications, Function (mathematics), landscape, Stationary point, Nonconvex optimization, symbols, Statistics, Probability and Uncertainty, 62F10, 62H30
Abstract: Most high-dimensional estimation methods propose to minimize a cost function (empirical risk) that is a sum of losses associated to each data point (each example). In this paper, we focus on the case of nonconvex losses. Classical empirical process theory implies uniform convergence of the empirical (or sample) risk to the population risk. While under additional assumptions, uniform convergence implies consistency of the resulting M-estimator, it does not ensure that the latter can be computed efficiently. ¶ In order to capture the complexity of computing M-estimators, we study the landscape of the empirical risk, namely its stationary points and their properties. We establish uniform convergence of the gradient and Hessian of the empirical risk to their population counterparts, as soon as the number of samples becomes larger than the number of unknown parameters (modulo logarithmic factors). Consequently, good properties of the population risk can be carried to the empirical risk, and we are able to establish one-to-one correspondence of their stationary points. We demonstrate that in several problems such as nonconvex binary classification, robust regression and Gaussian mixture model, this result implies a complete characterization of the landscape of the empirical risk, and of the convergence properties of descent algorithms. ¶ We extend our analysis to the very high-dimensional setting in which the number of parameters exceeds the number of samples, and provides a characterization of the empirical risk landscape under a nearly information-theoretically minimal condition. Namely, if the number of samples exceeds the sparsity of the parameters vector (modulo logarithmic factors), then a suitable uniform convergence result holds. We apply this result to nonconvex binary classification and robust regression in very high-dimension.
Published: 2018
Full Text: View/download PDF

4. Online rules for control of false discovery rate and false discovery exceedance

Author: Andrea Montanari and Adel Javanmard
Subjects: FOS: Computer and information sciences, 0301 basic medicine, Statistics and Probability, False discovery rate, Class (set theory), Mathematics - Statistics Theory, Machine Learning (stat.ML), Statistics Theory (math.ST), Scientific field, Statistics - Applications, 01 natural sciences, Machine Learning (cs.LG), online decision making, Methodology (stat.ME), Combinatorics, 010104 statistics & probability, 03 medical and health sciences, Statistics - Machine Learning, false discovery rate (FDR), FOS: Mathematics, Statistical inference, False positive paradox, 62F05, Applications (stat.AP), 62F03, 0101 mathematics, Control (linguistics), Statistics - Methodology, Statistical hypothesis testing, Mathematics, false discovery exceedance (FDX), Computer Science - Learning, Hypothesis testing, 030104 developmental biology, Multiple comparisons problem, 62L99, Statistics, Probability and Uncertainty
Abstract: Multiple hypothesis testing is a core problem in statistical inference and arises in almost every scientific field. Given a set of null hypotheses $\mathcal{H}(n) = (H_1,\dotsc, H_n)$, Benjamini and Hochberg introduced the false discovery rate (FDR), which is the expected proportion of false positives among rejected null hypotheses, and proposed a testing procedure that controls FDR below a pre-assigned significance level. Nowadays FDR is the criterion of choice for large scale multiple hypothesis testing. In this paper we consider the problem of controlling FDR in an "online manner". Concretely, we consider an ordered --possibly infinite-- sequence of null hypotheses $\mathcal{H} = (H_1,H_2,H_3,\dots )$ where, at each step $i$, the statistician must decide whether to reject hypothesis $H_i$ having access only to the previous decisions. This model was introduced by Foster and Stine. We study a class of "generalized alpha-investing" procedures and prove that any rule in this class controls online FDR, provided $p$-values corresponding to true nulls are independent from the other $p$-values. (Earlier work only established mFDR control.) Next, we obtain conditions under which generalized alpha-investing controls FDR in the presence of general $p$-values dependencies. Finally, we develop a modified set of procedures that also allow to control the false discovery exceedance (the tail of the proportion of false discoveries). Numerical simulations and analytical results indicate that online procedures do not incur a large loss in statistical power with respect to offline approaches, such as Benjamini-Hochberg., Comment: 44 pages, 9 figures, to appear in Annals of Statistics
Published: 2018
Full Text: View/download PDF

5. Factor models on locally tree-like graphs

Author: Amir Dembo, Nike Sun, and Andrea Montanari
Subjects: FOS: Computer and information sciences, Statistics and Probability, Pure mathematics, Dense graph, Discrete Mathematics (cs.DM), 05C80, 82B20, Factor models, 01 natural sciences, free energy density, 010104 statistics & probability, symbols.namesake, FOS: Mathematics, Potts model, Uniqueness, 0101 mathematics, Gibbs measure, random graphs, Gibbs measures, belief propagation, Mathematics, Random graph, Probability (math.PR), 010102 general mathematics, Tree (graph theory), independent set, Unimodular matrix, local weak convergence, 60K35, Independent set, Bethe measures, symbols, 82B23, Statistics, Probability and Uncertainty, Mathematics - Probability, Computer Science - Discrete Mathematics
Abstract: We consider homogeneous factor models on uniformly sparse graph sequences converging locally to a (unimodular) random tree $T$, and study the existence of the free energy density $\phi$, the limit of the log-partition function divided by the number of vertices $n$ as $n$ tends to infinity. We provide a new interpolation scheme and use it to prove existence of, and to explicitly compute, the quantity $\phi$ subject to uniqueness of a relevant Gibbs measure for the factor model on $T$. By way of example we compute $\phi$ for the independent set (or hard-core) model at low fugacity, for the ferromagnetic Ising model at all parameter values, and for the ferromagnetic Potts model with both weak enough and strong enough interactions. Even beyond uniqueness regimes our interpolation provides useful explicit bounds on $\phi$. In the regimes in which we establish existence of the limit, we show that it coincides with the Bethe free energy functional evaluated at a suitable fixed point of the belief propagation (Bethe) recursions on $T$. In the special case that $T$ has a Galton-Watson law, this formula coincides with the nonrigorous "Bethe prediction" obtained by statistical physicists using the "replica" or "cavity" methods. Thus our work is a rigorous generalization of these heuristic calculations to the broader class of sparse graph sequences converging locally to trees. We also provide a variational characterization for the Bethe prediction in this general setting, which is of independent interest., Comment: Published in at http://dx.doi.org/10.1214/12-AOP828 the Annals of Probability (http://www.imstat.org/aop/) by the Institute of Mathematical Statistics (http://www.imstat.org)
Published: 2013
Full Text: View/download PDF

6. Finite size scaling for the core of large random hypergraphs

Author: Andrea Montanari, Amir Dembo, Laboratoire de Physique Théorique de l'ENS (LPTENS), Université Pierre et Marie Curie - Paris 6 (UPMC)-Fédération de recherche du Département de physique de l'Ecole Normale Supérieure - ENS Paris (FRDPENS), École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)-Centre National de la Recherche Scientifique (CNRS), Department of Electrical Engineering and Statistics, Stanford University, Laboratoire de Physique Théorique de l'ENS [École Normale Supérieure] (LPTENS), Fédération de recherche du Département de physique de l'Ecole Normale Supérieure - ENS Paris (FRDPENS), École normale supérieure - Paris (ENS-PSL), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Paris (ENS-PSL), and Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)-Université Pierre et Marie Curie - Paris 6 (UPMC)-Centre National de la Recherche Scientifique (CNRS)
Subjects: Statistics and Probability, Hypergraph, 05C80, 68R10, Vertex cover, 0102 computer and information sciences, random hypergraph, low-density parity-check codes, 01 natural sciences, 94A29, Combinatorics, 010104 statistics & probability, Quadratic equation, XOR-SAT, [MATH.MATH-CO]Mathematics [math]/Combinatorics [math.CO], FOS: Mathematics, Mathematics - Combinatorics, finite-size scaling, 0101 mathematics, Scaling, Brownian motion, Mathematics, 05C80, 60J10, 60F17 (Primary) 68R10, 94A29 (Secondary), Probability (math.PR), Satisfiability, Vertex (geometry), [MATH.MATH-PR]Mathematics [math]/Probability [math.PR], 60F17, 010201 computation theory & mathematics, 60J10, Combinatorial optimization, Core, Combinatorics (math.CO), Statistics, Probability and Uncertainty, Mathematics - Probability, random graph
Abstract: The (two) core of a hypergraph is the maximal collection of hyperedges within which no vertex appears only once. It is of importance in tasks such as efficiently solving a large linear system over GF[2], or iterative decoding of low-density parity-check codes used over the binary erasure channel. Similar structures emerge in a variety of NP-hard combinatorial optimization and decision problems, from vertex cover to satisfiability. For a uniformly chosen random hypergraph of $m=n\rho$ vertices and $n$ hyperedges, each consisting of the same fixed number $l\geq3$ of vertices, the size of the core exhibits for large $n$ a first-order phase transition, changing from $o(n)$ for $\rho>\rho _{\mathrm{c}}$ to a positive fraction of $n$ for $\rho0$. Analyzing the corresponding ``leaf removal'' algorithm, we determine the associated finite-size scaling behavior. In particular, if $\rho$ is inside the scaling window (more precisely, $\rho=\rho_{\mathrm{c}}+rn^{-1/2}$), the probability of having a core of size $\Theta(n)$ has a limit strictly between 0 and 1, and a leading correction of order $\Theta(n^{-1/6})$. The correction admits a sharp characterization in terms of the distribution of a Brownian motion with quadratic shift, from which it inherits the scaling with $n$. This behavior is expected to be universal for a wide collection of combinatorial problems., Comment: Published in at http://dx.doi.org/10.1214/07-AAP514 the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)
Published: 2008
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Refine your results

6 results on '"Andrea Montanari"'

1. Linearized two-layers neural networks in high dimension

2. Discussion of: 'Nonparametric regression using deep neural networks with ReLU activation function'

3. The landscape of empirical risk for nonconvex losses

4. Online rules for control of false discovery rate and false discovery exceedance

5. Factor models on locally tree-like graphs

6. Finite size scaling for the core of large random hypergraphs

Catalog

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Journal

Database

6 results on '"Andrea Montanari"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources