Author: "Zdeborová, Lenka" - Searchworks@Jio Institute Digital Library Search Results

1. Bilinear Sequence Regression: A Model for Learning from Long Sequences of High-dimensional Tokens

Author: Erba, Vittorio, Troiani, Emanuele, Biggio, Luca, Maillard, Antoine, and Zdeborová, Lenka
Subjects: Condensed Matter - Disordered Systems and Neural Networks, Computer Science - Machine Learning
Abstract: Current progress in artificial intelligence is centered around so-called large language models that consist of neural networks processing long sequences of high-dimensional vectors called tokens. Statistical physics provides powerful tools to study the functioning of learning with neural networks and has played a recognized role in the development of modern machine learning. The statistical physics approach relies on simplified and analytically tractable models of data. However, simple tractable models for long sequences of high-dimensional tokens are largely underexplored. Inspired by the crucial role models such as the single-layer teacher-student perceptron (aka generalized linear regression) played in the theory of fully connected neural networks, in this paper, we introduce and study the bilinear sequence regression (BSR) as one of the most basic models for sequences of tokens. We note that modern architectures naturally subsume the BSR model due to the skip connections. Building on recent methodological progress, we compute the Bayes-optimal generalization error for the model in the limit of long sequences of high-dimensional tokens, and provide a message-passing algorithm that matches this performance. We quantify the improvement that optimal learning brings with respect to vectorizing the sequence of tokens and learning via simple linear regression. We also unveil surprising properties of the gradient descent algorithms in the BSR model.
Published: 2024

2. Building Conformal Prediction Intervals with Approximate Message Passing

Author: Clarté, Lucas and Zdeborová, Lenka
Subjects: Statistics - Machine Learning, Condensed Matter - Disordered Systems and Neural Networks, Computer Science - Machine Learning
Abstract: Conformal prediction has emerged as a powerful tool for building prediction intervals that are valid in a distribution-free way. However, its evaluation may be computationally costly, especially in the high-dimensional setting where the dimensionality and sample sizes are both large and of comparable magnitudes. To address this challenge in the context of generalized linear regression, we propose a novel algorithm based on Approximate Message Passing (AMP) to accelerate the computation of prediction intervals using full conformal prediction, by approximating the computation of conformity scores. Our work bridges a gap between modern uncertainty quantification techniques and tools for high-dimensional problems involving the AMP algorithm. We evaluate our method on both synthetic and real data, and show that it produces prediction intervals that are close to the baseline methods, while being orders of magnitude faster. Additionally, in the high-dimensional limit and under assumptions on the data distribution, the conformity scores computed by AMP converge to the one computed exactly, which allows theoretical study and benchmarking of conformal methods in high dimensions.
Published: 2024

3. Could ChatGPT get an Engineering Degree? Evaluating Higher Education Vulnerability to AI Assistants

Author: Borges, Beatriz, Foroutan, Negar, Bayazit, Deniz, Sotnikova, Anna, Montariol, Syrielle, Nazaretzky, Tanya, Banaei, Mohammadreza, Sakhaeirad, Alireza, Servant, Philippe, Neshaei, Seyed Parsa, Frej, Jibril, Romanou, Angelika, Weiss, Gail, Mamooler, Sepideh, Chen, Zeming, Fan, Simin, Gao, Silin, Ismayilzada, Mete, Paul, Debjit, Schöpfer, Alexandre, Janchevski, Andrej, Tiede, Anja, Linden, Clarence, Troiani, Emanuele, Salvi, Francesco, Behrens, Freya, Orsi, Giacomo, Piccioli, Giovanni, Sevel, Hadrien, Coulon, Louis, Pineros-Rodriguez, Manuela, Bonnassies, Marin, Hellich, Pierre, van Gerwen, Puck, Gambhir, Sankalp, Pirelli, Solal, Blanchard, Thomas, Callens, Timothée, Aoun, Toni Abi, Alonso, Yannick Calvino, Cho, Yuri, Chiappa, Alberto, Sclocchi, Antonio, Bruno, Étienne, Hofhammer, Florian, Pescia, Gabriel, Rizk, Geovani, Dadi, Leello, Stoffl, Lucas, Ribeiro, Manoel Horta, Bovel, Matthieu, Pan, Yueyang, Radenovic, Aleksandra, Alahi, Alexandre, Mathis, Alexander, Bitbol, Anne-Florence, Faltings, Boi, Hébert, Cécile, Tuia, Devis, Maréchal, François, Candea, George, Carleo, Giuseppe, Chappelier, Jean-Cédric, Flammarion, Nicolas, Fürbringer, Jean-Marie, Pellet, Jean-Philippe, Aberer, Karl, Zdeborová, Lenka, Salathé, Marcel, Jaggi, Martin, Rajman, Martin, Payer, Mathias, Wyart, Matthieu, Gastpar, Michael, Ceriotti, Michele, Svensson, Ola, Lévêque, Olivier, Ienne, Paolo, Guerraoui, Rachid, West, Robert, Kashyap, Sanidhya, Piazza, Valerio, Simanis, Viesturs, Kuncak, Viktor, Cevher, Volkan, Schwaller, Philippe, Friedli, Sacha, Jermann, Patrick, Kaser, Tanja, and Bosselut, Antoine
Subjects: Computer Science - Computers and Society, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: AI assistants are being increasingly used by students enrolled in higher education institutions. While these tools provide opportunities for improved teaching and education, they also pose significant challenges for assessment and learning outcomes. We conceptualize these challenges through the lens of vulnerability, the potential for university assessments and learning outcomes to be impacted by student use of generative AI. We investigate the potential scale of this vulnerability by measuring the degree to which AI assistants can complete assessment questions in standard university-level STEM courses. Specifically, we compile a novel dataset of textual assessment questions from 50 courses at EPFL and evaluate whether two AI assistants, GPT-3.5 and GPT-4, can adequately answer these questions. We use eight prompting strategies to produce responses and find that GPT-4 answers an average of 65.8% of questions correctly, and can even produce the correct answer across at least one prompting strategy for 85.1% of questions. When grouping courses in our dataset by degree program, these systems already pass non-project assessments of large numbers of core courses in various degree programs, posing risks to higher education accreditation that will be amplified as these models improve. Our results call for revising program-level assessment design in higher education in light of advances in generative AI.
Comment: 20 pages, 8 figures
Published: 2024

4. Bayes-optimal learning of an extensive-width neural network from quadratically many samples

Author: Maillard, Antoine, Troiani, Emanuele, Martin, Simon, Krzakala, Florent, and Zdeborová, Lenka
Subjects: Statistics - Machine Learning, Condensed Matter - Disordered Systems and Neural Networks, Computer Science - Information Theory, Computer Science - Machine Learning, Mathematics - Probability
Abstract: We consider the problem of learning a target function corresponding to a single hidden layer neural network, with a quadratic activation function after the first layer, and random weights. We consider the asymptotic limit where the input dimension and the network width are proportionally large. Recent work [Cui & al '23] established that linear regression provides Bayes-optimal test error to learn such a function when the number of available samples is only linear in the dimension. That work stressed the open challenge of theoretically analyzing the optimal test error in the more interesting regime where the number of samples is quadratic in the dimension. In this paper, we solve this challenge for quadratic activations and derive a closed-form expression for the Bayes-optimal test error. We also provide an algorithm, that we call GAMP-RIE, which combines approximate message passing with rotationally invariant matrix denoising, and that asymptotically achieves the optimal performance. Technically, our result is enabled by establishing a link with recent works on optimal denoising of extensive-rank matrices and on the ellipsoid fitting problem. We further show empirically that, in the absence of noise, randomly-initialized gradient descent seems to sample the space of weights, leading to zero training loss, and averaging over initialization leads to a test error equal to the Bayes-optimal one.
Comment: 47 pages
Published: 2024

5. The phase diagram of compressed sensing with $\ell_0$-norm regularization

Author: Barbier, Damien, Lucibello, Carlo, Saglietti, Luca, Krzakala, Florent, and Zdeborová, Lenka
Subjects: Computer Science - Information Theory, Condensed Matter - Disordered Systems and Neural Networks
Abstract: Noiseless compressive sensing is a two-step setting that allows for undersampling a sparse signal and then reconstructing it without loss of information. The LASSO algorithm, based on $\ell_1$ regularization, provides an efficient and robust way to address this problem, but it fails in the regime of very high compression rate. Here we present two algorithms based on $\ell_0$-norm regularization instead that outperform the LASSO in terms of compression rate in the Gaussian design setting for the measurement matrix. These algorithms are based on the Approximate Survey Propagation, an algorithmic family within the Approximate Message Passing class. In the large system limit, they can be rigorously tracked through State Evolution equations and it is possible to exactly predict the range of compression rates for which perfect signal reconstruction is possible. We also provide a statistical physics analysis of the $\ell_0$-norm noiseless compressive sensing model. We show the existence of both a replica symmetric state and a 1-step replica symmetry broken (1RSB) state for sufficiently low $\ell_0$-norm regularization. The recovery limits of our algorithms are linked to the behavior of the 1RSB solution.
Comment: arXiv admin note: substantial text overlap with arXiv:2304.12127
Published: 2024

6. Counting in Small Transformers: The Delicate Interplay between Attention and Feed-Forward Layers

Author: Behrens, Freya, Biggio, Luca, and Zdeborová, Lenka
Subjects: Computer Science - Machine Learning
Abstract: How do different architectural design choices influence the space of solutions that a transformer can implement and learn? How do different components interact with each other to shape the model's hypothesis space? We investigate these questions by characterizing the solutions simple transformer blocks can implement when challenged to solve the histogram task -- counting the occurrences of each item in an input sequence from a fixed vocabulary. Despite its apparent simplicity, this task exhibits a rich phenomenology: our analysis reveals a strong inter-dependence between the model's predictive performance and the vocabulary and embedding sizes, the token-mixing mechanism and the capacity of the feed-forward block. In this work, we characterize two different counting strategies that small transformers can implement theoretically: relation-based and inventory-based counting, the latter being less efficient in computation and memory. The emergence of either strategy is heavily influenced by subtle synergies among hyperparameters and components, and depends on seemingly minor architectural tweaks like the inclusion of softmax in the attention mechanism. By introspecting models trained on the histogram task, we verify the formation of both mechanisms in practice. Our findings highlight that even in simple settings, slight variations in model design can cause significant changes to the solutions a transformer learns.
Comment: 33 pages, Accepted at Mechanistic Interpretability Workshop, ICML 2024
Published: 2024

7. Optimal thresholds and algorithms for a model of multi-modal learning in high dimensions

Author: Keup, Christian and Zdeborová, Lenka
Subjects: Statistics - Machine Learning, Condensed Matter - Disordered Systems and Neural Networks, Computer Science - Machine Learning
Abstract: This work explores multi-modal inference in a high-dimensional simplified model, analytically quantifying the performance gain of multi-modal inference over that of analyzing modalities in isolation. We present the Bayes-optimal performance and weak recovery thresholds in a model where the objective is to recover the latent structures from two noisy data matrices with correlated spikes. The paper derives the approximate message passing (AMP) algorithm for this model and characterizes its performance in the high-dimensional limit via the associated state evolution. The analysis holds for a broad range of priors and noise channels, which can differ across modalities. The linearization of AMP is compared numerically to the widely used partial least squares (PLS) and canonical correlation analysis (CCA) methods, which are both observed to suffer from a sub-optimal recovery threshold.
Published: 2024

8. Counting and Hardness-of-Finding Fixed Points in Cellular Automata on Random Graphs

Author: Koller, Cédric, Behrens, Freya, and Zdeborová, Lenka
Subjects: Condensed Matter - Disordered Systems and Neural Networks, Condensed Matter - Statistical Mechanics
Abstract: We study the fixed points of outer-totalistic cellular automata on sparse random regular graphs. These can be seen as constraint satisfaction problems, where each variable must adhere to the same local constraint, which depends solely on its state and the total number of its neighbors in each possible state. Examples of this setting include classical problems such as independent sets or assortative/disassortative partitions. We analyse the existence and number of fixed points in the large system limit using the cavity method, under both the replica symmetric (RS) and one-step replica symmetry breaking (1RSB) assumptions. This method allows us to characterize the structure of the space of solutions, in particular, if the solutions are clustered and whether the clusters contain frozen variables. This last property is conjectured to be linked to the typical algorithmic hardness of the problem. We bring experimental evidence for this claim by studying the performance of the belief-propagation reinforcement algorithm, a message-passing-based solver for these constraint satisfaction problems.
Comment: 26 pages, 7 figures
Published: 2024

9. Fundamental computational limits of weak learnability in high-dimensional multi-index models

Author: Troiani, Emanuele, Dandi, Yatin, Defilippis, Leonardo, Zdeborová, Lenka, Loureiro, Bruno, and Krzakala, Florent
Subjects: Computer Science - Machine Learning, Condensed Matter - Disordered Systems and Neural Networks, Computer Science - Computational Complexity
Abstract: Multi-index models - functions which only depend on the covariates through a non-linear transformation of their projection on a subspace - are a useful benchmark for investigating feature learning with neural nets. This paper examines the theoretical boundaries of efficient learnability in this hypothesis class, focusing on the minimum sample complexity required for weakly recovering their low-dimensional structure with first-order iterative algorithms, in the high-dimensional regime where the number of samples $n\!=\!\alpha d$ is proportional to the covariate dimension $d$. Our findings unfold in three parts: (i) we identify under which conditions a trivial subspace can be learned with a single step of a first-order algorithm for any $\alpha\!>\!0$; (ii) if the trivial subspace is empty, we provide necessary and sufficient conditions for the existence of an easy subspace, made of directions that can be learned only above a certain sample complexity $\alpha\!>\!\alpha_c$, where $\alpha_{c}$ marks a computational phase transition. In a limited but interesting set of really hard directions -- akin to the parity problem -- $\alpha_c$ is found to diverge. Finally, (iii) we show that interactions between different directions can result in an intricate hierarchical learning phenomenon, where directions can be learned sequentially when coupled to easier ones. We discuss in detail the grand staircase picture associated to these functions (and contrast it with the original staircase one). Our theory builds on the optimality of approximate message-passing among first-order iterative methods, delineating the fundamental learnability limit across a broad spectrum of algorithms, including neural networks trained with gradient descent, which we discuss in this context.
Published: 2024

10. Integer Traffic Assignment Problem: Algorithms and Insights on Random Graphs

Author: Harfouche, Rayan, Piccioli, Giovanni, and Zdeborová, Lenka
Subjects: Condensed Matter - Disordered Systems and Neural Networks, Computer Science - Discrete Mathematics, Mathematics - Optimization and Control, Statistics - Computation
Abstract: Path optimization is a fundamental concern across various real-world scenarios, ranging from traffic congestion issues to efficient data routing over the internet. The Traffic Assignment Problem (TAP) is a classic continuous optimization problem in this field. This study considers the Integer Traffic Assignment Problem (ITAP), a discrete variant of TAP. ITAP involves determining optimal routes for commuters in a city represented by a graph, aiming to minimize congestion while adhering to integer flow constraints on paths. This restriction makes ITAP an NP-hard problem. While conventional TAP prioritizes repulsive interactions to minimize congestion, this work also explores the case of attractive interactions, related to minimizing the number of occupied edges. We present and evaluate multiple algorithms to address ITAP, including a message passing algorithm, a greedy approach, simulated annealing, and relaxation of ITAP to TAP. Inspired by studies of random ensembles in the large-size limit in statistical physics, comparisons between these algorithms are conducted on large sparse random regular graphs with a random set of origin-destination pairs. Our results indicate that while the simplest greedy algorithm performs competitively in the repulsive scenario, in the attractive case the message-passing-based algorithm and simulated annealing demonstrate superiority. We then investigate the relationship between TAP and ITAP in the repulsive case. We find that, as the number of paths increases, the solution of TAP converges toward that of ITAP, and we investigate the speed of this convergence. Depending on the number of paths, our analysis leads us to identify two scaling regimes: in one the average flow per edge is of order one, and in another the number of paths scales quadratically with the size of the graph, in which case the continuous relaxation solves the integer problem closely.
Comment: 37 pages, 15 figures
Published: 2024

11. Quenches in the Sherrington-Kirkpatrick model

Author: Erba, Vittorio, Behrens, Freya, Krzakala, Florent, and Zdeborová, Lenka
Subjects: Condensed Matter - Disordered Systems and Neural Networks
Abstract: The Sherrington-Kirkpatrick (SK) model is a prototype of a complex non-convex energy landscape. Dynamical processes evolving on such landscapes and locally aiming to reach minima are generally poorly understood. Here, we study quenches, i.e. dynamics that locally aim to decrease energy. We analyse the energy at convergence for two distinct algorithmic classes, single-spin flip and synchronous dynamics, focusing on greedy and reluctant strategies. We provide precise numerical analysis of the finite size effects and conclude that, perhaps counter-intuitively, the reluctant algorithm is compatible with converging to the ground state energy density, while the greedy strategy is not. Inspired by the single-spin reluctant and greedy algorithms, we investigate two synchronous time algorithms, the sync-greedy and sync-reluctant algorithms. These synchronous processes can be analysed using dynamical mean field theory (DMFT), and a new backtracking version of DMFT. Notably, this is the first time the backtracking DMFT is applied to study dynamical convergence properties in fully connected disordered models. The analysis suggests that the sync-greedy algorithm can also achieve energies compatible with the ground state, and that it undergoes a dynamical phase transition.
Published: 2024

12. Fundamental limits of Non-Linear Low-Rank Matrix Estimation

Author: Mergny, Pierre, Ko, Justin, Krzakala, Florent, and Zdeborová, Lenka
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: We consider the task of estimating a low-rank matrix from non-linear and noisy observations. We prove a strong universality result showing that Bayes-optimal performances are characterized by an equivalent Gaussian model with an effective prior, whose parameters are entirely determined by an expansion of the non-linear function. In particular, we show that to reconstruct the signal accurately, one requires a signal-to-noise ratio growing as $N^{\frac 12 (1-1/k_F)}$, where $k_F$ is the first non-zero Fisher information coefficient of the function. We provide asymptotic characterization for the minimal achievable mean squared error (MMSE) and an approximate message-passing algorithm that reaches the MMSE under conditions analogous to the linear version of the problem. We also provide asymptotic errors achieved by methods such as principal component analysis combined with Bayesian denoising, and compare them with Bayes-optimal MMSE.
Comment: 42 pages, 2 figures
Published: 2024

13. Analysis of Bootstrap and Subsampling in High-dimensional Regularized Regression

Author: Clarté, Lucas, Vandenbroucque, Adrien, Dalle, Guillaume, Loureiro, Bruno, Krzakala, Florent, and Zdeborová, Lenka
Subjects: Statistics - Machine Learning, Condensed Matter - Disordered Systems and Neural Networks, Computer Science - Machine Learning
Abstract: We investigate popular resampling methods for estimating the uncertainty of statistical models, such as subsampling, bootstrap and the jackknife, and their performance in high-dimensional supervised regression tasks. We provide a tight asymptotic description of the biases and variances estimated by these methods in the context of generalized linear models, such as ridge and logistic regression, taking the limit where the number of samples $n$ and dimension $d$ of the covariates grow at a comparable fixed rate $\alpha\!=\! n/d$. Our findings are three-fold: i) resampling methods are fraught with problems in high dimensions and exhibit the double-descent-like behavior typical of these situations; ii) only when $\alpha$ is large enough do they provide consistent and reliable error estimations (we give convergence rates); iii) in the over-parametrized regime $\alpha\!<\!1$ relevant to modern machine learning practice, their predictions are not consistent, even with optimal regularization.
Published: 2024

14. Asymptotics of feature learning in two-layer networks after one gradient-step

Author: Cui, Hugo, Pesce, Luca, Dandi, Yatin, Krzakala, Florent, Lu, Yue M., Zdeborová, Lenka, and Loureiro, Bruno
Subjects: Statistics - Machine Learning, Condensed Matter - Disordered Systems and Neural Networks, Computer Science - Machine Learning
Abstract: In this manuscript, we investigate the problem of how two-layer neural networks learn features from data, and improve over the kernel regime, after being trained with a single gradient descent step. Leveraging the insight from (Ba et al., 2022), we model the trained network by a spiked Random Features (sRF) model. Further building on recent progress on Gaussian universality (Dandi et al., 2023), we provide an exact asymptotic description of the generalization error of the sRF in the high-dimensional limit where the number of samples, the width, and the input dimension grow at a proportional rate. The resulting characterization for sRFs also captures closely the learning curves of the original network model. This enables us to understand how adapting to the data is crucial for the network to efficiently learn non-linear functions in the direction of the gradient -- where at initialization it can only express linear functions in this regime.
Published: 2024

15. A phase transition between positional and semantic learning in a solvable model of dot-product attention

Author: Cui, Hugo, Behrens, Freya, Krzakala, Florent, and Zdeborová, Lenka
Subjects: Computer Science - Machine Learning
Abstract: Many empirical studies have provided evidence for the emergence of algorithmic mechanisms (abilities) in the learning of language models, that lead to qualitative improvements of the model capabilities. Yet, a theoretical characterization of how such mechanisms emerge remains elusive. In this paper, we take a step in this direction by providing a tight theoretical analysis of the emergence of semantic attention in a solvable model of dot-product attention. More precisely, we consider a non-linear self-attention layer with trainable tied and low-rank query and key matrices. In the asymptotic limit of high-dimensional data and a comparably large number of training samples we provide a tight closed-form characterization of the global minimum of the non-convex empirical loss landscape. We show that this minimum corresponds to either a positional attention mechanism (with tokens attending to each other based on their respective positions) or a semantic attention mechanism (with tokens attending to each other based on their meaning), and evidence an emergent phase transition from the former to the latter with increasing sample complexity. Finally, we compare the dot-product attention layer to a linear positional baseline, and show that it outperforms the latter using the semantic mechanism provided it has access to sufficient data.
Published: 2024

16. The Benefits of Reusing Batches for Gradient Descent in Two-Layer Networks: Breaking the Curse of Information and Leap Exponents

Author: Dandi, Yatin, Troiani, Emanuele, Arnaboldi, Luca, Pesce, Luca, Zdeborová, Lenka, and Krzakala, Florent
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: We investigate the training dynamics of two-layer neural networks when learning multi-index target functions. We focus on multi-pass gradient descent (GD) that reuses the batches multiple times and show that it significantly changes the conclusion about which functions are learnable compared to single-pass gradient descent. In particular, multi-pass GD with finite stepsize is found to overcome the limitations of gradient flow and single-pass GD given by the information exponent (Ben Arous et al., 2021) and leap exponent (Abbe et al., 2023) of the target function. We show that upon re-using batches, the network achieves in just two time steps an overlap with the target subspace even for functions not satisfying the staircase property (Abbe et al., 2021). We characterize the (broad) class of functions efficiently learned in finite time. The proof of our results is based on the analysis of the Dynamical Mean-Field Theory (DMFT). We further provide a closed-form description of the dynamical process of the low-dimensional projections of the weights, and numerical experiments illustrating the theory.
Comment: Accepted at the International Conference on Machine Learning (ICML), 2024
Published: 2024

17. Dynamical Phase Transitions in Graph Cellular Automata

Author: Behrens, Freya, Hudcová, Barbora, and Zdeborová, Lenka
Subjects: Condensed Matter - Disordered Systems and Neural Networks
Abstract: Discrete dynamical systems can exhibit complex behaviour from the iterative application of straightforward local rules. A famous example is cellular automata, whose global dynamics are notoriously challenging to analyze. To address this, we relax the regular connectivity grid of cellular automata to a random graph, which gives the class of graph cellular automata. Using the dynamical cavity method (DCM) and its backtracking version (BDCM), we show that this relaxation allows us to derive asymptotically exact analytical results on the global dynamics of these systems on sparse random graphs. Concretely, we showcase the results on a specific subclass of graph cellular automata with "conforming non-conformist" update rules, which exhibit dynamics akin to opinion formation. Such rules update a node's state according to the majority within its own neighbourhood. In cases where the majority leads only by a small margin over the minority, nodes may exhibit non-conformist behaviour. Instead of following the majority, they either maintain their own state, switch it, or follow the minority. For configurations with different initial biases towards one state we identify sharp dynamical phase transitions in terms of the convergence speed and attractor types. From the perspective of opinion dynamics this answers when consensus will emerge and when two opinions coexist almost indefinitely.
Comment: 15 pages
Published: 2023

18. Spectral Phase Transitions in Non-Linear Wigner Spiked Models

Author: Guionnet, Alice, Ko, Justin, Krzakala, Florent, Mergny, Pierre, and Zdeborová, Lenka
Subjects: Mathematics - Probability, 60B20
Abstract: We study the asymptotic behavior of the spectrum of a random matrix where a non-linearity is applied entry-wise to a Wigner matrix perturbed by a rank-one spike with independent and identically distributed entries. In this setting, we show that when the signal-to-noise ratio scales as $N^{\frac{1}{2} (1-1/k_\star)}$, where $k_\star$ is the first non-zero generalized information coefficient of the function, the non-linear spike model effectively behaves as an equivalent spiked Wigner matrix, where the former spike before the non-linearity is now raised to a power $k_\star$. This allows us to study the phase transition of the leading eigenvalues, generalizing part of the work of Baik, Ben Arous and Péché to these non-linear models.
Comment: 27 pages
Published: 2023

19. Analysis of learning a flow-based generative model from limited sample complexity

Author: Cui, Hugo, Krzakala, Florent, Vanden-Eijnden, Eric, and Zdeborová, Lenka
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: We study the problem of training a flow-based generative model, parametrized by a two-layer autoencoder, to sample from a high-dimensional Gaussian mixture. We provide a sharp end-to-end analysis of the problem. First, we provide a tight closed-form characterization of the learnt velocity field, when parametrized by a shallow denoising auto-encoder trained on a finite number $n$ of samples from the target distribution. Building on this analysis, we provide a sharp description of the corresponding generative flow, which pushes the base Gaussian density forward to an approximation of the target density. In particular, we provide closed-form formulae for the distance between the mean of the generated mixture and the mean of the target mixture, which we show decays as $\Theta_n(\frac{1}{n})$. Finally, this rate is shown to be in fact Bayes-optimal.
Published: 2023

20. On the Atypical Solutions of the Symmetric Binary Perceptron

Author: Barbier, Damien, Alaoui, Ahmed El, Krzakala, Florent, and Zdeborová, Lenka
Subjects: Mathematics - Probability, Condensed Matter - Disordered Systems and Neural Networks
Abstract: We study the random binary symmetric perceptron problem, focusing on the behavior of rare high-margin solutions. While most solutions are isolated, we demonstrate that these rare solutions are part of clusters of extensive entropy, heuristically corresponding to non-trivial fixed points of an approximate message-passing algorithm. We enumerate these clusters via a local entropy, defined as a Franz-Parisi potential, which we rigorously evaluate using the first and second moment methods in the limit of a small constraint density $\alpha$ (corresponding to vanishing margin $\kappa$) under a certain assumption on the concentration of the entropy. This examination unveils several intriguing phenomena: i) We demonstrate that these clusters have an entropic barrier in the sense that the entropy as a function of the distance from the reference high-margin solution is non-monotone when $\kappa \le 1.429 \sqrt{-\alpha/\log{\alpha}}$, while it is monotone otherwise, and that they have an energetic barrier in the sense that there are no solutions at an intermediate distance from the reference solution when $\kappa \le 1.239 \sqrt{-\alpha/ \log{\alpha}}$. The critical scaling of the margin $\kappa$ in $\sqrt{-\alpha/\log\alpha}$ corresponds to the one obtained from the earlier work of Gamarnik et al. (2022) for the overlap-gap property, a phenomenon known to present a barrier to certain efficient algorithms. ii) We establish using the replica method that the complexity (the logarithm of the number of clusters of such solutions) versus entropy (the logarithm of the number of solutions in the clusters) curves are partly non-concave and correspond to very large values of the Parisi parameter, with the equilibrium being reached when the Parisi parameter diverges.
Comment: 26 pages, 6 figures
Published: 2023

21. Sampling with flows, diffusion and autoregressive neural networks: A spin-glass perspective

Author: Ghio, Davide, Dandi, Yatin, Krzakala, Florent, and Zdeborová, Lenka
Subjects: Condensed Matter - Disordered Systems and Neural Networks, Condensed Matter - Statistical Mechanics, Computer Science - Machine Learning
Abstract: Recent years witnessed the development of powerful generative models based on flows, diffusion or autoregressive neural networks, achieving remarkable success in generating data from examples with applications in a broad range of areas. A theoretical analysis of the performance and understanding of the limitations of these methods remain, however, challenging. In this paper, we undertake a step in this direction by analysing the efficiency of sampling by these methods on a class of problems with a known probability distribution and comparing it with the sampling performance of more traditional methods such as Markov chain Monte Carlo and Langevin dynamics. We focus on a class of probability distributions widely studied in the statistical physics of disordered systems that relate to spin glasses, statistical inference and constraint satisfaction problems. We leverage the fact that sampling via flow-based, diffusion-based or autoregressive network methods can be equivalently mapped to the analysis of a Bayes optimal denoising of a modified probability measure. Our findings demonstrate that these methods encounter difficulties in sampling stemming from the presence of a first-order phase transition along the algorithm's denoising path. Our conclusions go both ways: we identify regions of parameters where these methods are unable to sample efficiently, while that is possible using standard Monte Carlo or Langevin approaches. We also identify regions where the opposite happens: standard approaches are inefficient while the discussed generative methods work well.
Comment: 39 pages, 12 figures
Published: 2023

22. Estimating rank-one matrices with mismatched prior and noise: universality and large deviations

Author: Guionnet, Alice, Ko, Justin, Krzakala, Florent, and Zdeborová, Lenka
Subjects: Mathematics - Probability
Abstract: We prove a universality result that reduces the free energy of rank-one matrix estimation problems in the setting of mismatched prior and noise to the computation of the free energy for a modified Sherrington-Kirkpatrick spin glass. Our main result is an almost sure large deviation principle for the overlaps between the truth signal and the estimator for both the Bayes-optimal and mismatched settings. Through the large deviations principle, we recover the limit of the free energy in mismatched inference problems and the universality of the overlaps.
Comment: 54 pages
Published: 2023

23. Gibbs Sampling the Posterior of Neural Networks

Author: Piccioli, Giovanni, Troiani, Emanuele, and Zdeborová, Lenka
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: In this paper, we study sampling from a posterior derived from a neural network. We propose a new probabilistic model consisting of adding noise at every pre- and post-activation in the network, arguing that the resulting posterior can be sampled using an efficient Gibbs sampler. For small models, the Gibbs sampler attains performance similar to that of state-of-the-art Markov chain Monte Carlo (MCMC) methods, such as Hamiltonian Monte Carlo (HMC) or the Metropolis-adjusted Langevin algorithm (MALA), both on real and synthetic data. By framing our analysis in the teacher-student setting, we introduce a thermalization criterion that allows us to detect when an algorithm, when run on data with synthetic labels, fails to sample from the posterior. The criterion is based on the fact that in the teacher-student setting we can initialize an algorithm directly at equilibrium.
Published: 2023

24. High-dimensional Asymptotics of Denoising Autoencoders

Author: Cui, Hugo and Zdeborová, Lenka
Subjects: Computer Science - Machine Learning, Condensed Matter - Disordered Systems and Neural Networks, Statistics - Machine Learning
Abstract: We address the problem of denoising data from a Gaussian mixture using a two-layer non-linear autoencoder with tied weights and a skip connection. We consider the high-dimensional limit where the number of training samples and the input dimension jointly tend to infinity while the number of hidden units remains bounded. We provide closed-form expressions for the denoising mean-squared test error. Building on this result, we quantitatively characterize the advantage of the considered architecture over the autoencoder without the skip connection that relates closely to principal component analysis. We further show that our results accurately capture the learning curves on a range of real data sets.
Published: 2023

25. Maximally-stable Local Optima in Random Graphs and Spin Glasses: Phase Transitions and Universality

Author: Dandi, Yatin, Gamarnik, David, and Zdeborová, Lenka
Subjects: Mathematics - Probability, Mathematical Physics, Mathematics - Combinatorics
Abstract: We provide a unified analysis of stable local optima of Ising spins with Hamiltonians having pair-wise interactions and partitions in random weighted graphs where a large number of vertices possess sufficient single spin-flip stability. For graphs, we consider partitions on random graphs where almost all vertices possess sufficient appropriately defined friendliness/unfriendliness. For spin glasses, we characterize approximate local optima having almost all local magnetic fields of sufficiently large magnitude. For $n$ nodes, as $n \rightarrow \infty$, we prove that the maximum number of vertices possessing such stability undergoes a phase transition from $n-o(n)$ to $n-\Theta(n)$ around a certain value of the stability, proving a conjecture from Behrens et al. [2022]. Through a universality argument, we further prove that such a phase transition occurs around the same value of the stability for different choices of interactions, specifically ferromagnetic and anti-ferromagnetic, for sparse graphs, as $n \rightarrow \infty$ in the large degree limit. Furthermore, we show that after appropriate re-scaling, the same value of the threshold characterises such a phase transition for the case of fully connected spin-glass models. Our results also allow the characterization of possible energy values of maximally stable approximate local optima. Our work extends and proves seminal results in statistical physics related to metastable states, in particular, the work of Bray and Moore [1981].
Published: 2023

26. Backtracking Dynamical Cavity Method

Author: Behrens, Freya, Hudcová, Barbora, and Zdeborová, Lenka
Subjects: Condensed Matter - Disordered Systems and Neural Networks
Abstract: The cavity method is one of the cornerstones of the statistical physics of disordered systems such as spin glasses and other complex systems. It is able to analytically and asymptotically exactly describe the equilibrium properties of a broad range of models. Exact solutions for dynamical, out-of-equilibrium properties of disordered systems are traditionally much harder to obtain. Even very basic questions such as the limiting energy of a fast quench are so far open. The dynamical cavity method partly fills this gap by considering short trajectories and leveraging the static cavity method. However, being limited to a couple of steps forward from the initialization, it typically does not capture dynamical properties related to attractors of the dynamics. We introduce the backtracking dynamical cavity method that, instead of analysing the trajectory forward from initialization, analyses trajectories that are found by tracking them backward from attractors. We illustrate that this rather elementary twist on the dynamical cavity method leads to new insight into some of the very basic questions about the dynamics of complex disordered systems. This method is as versatile as the cavity method itself and we hence anticipate that our paper will open many avenues for future research of dynamical, out-of-equilibrium properties in complex systems.
Comment: 14 pages
Published: 2023

27. Statistical mechanics of the maximum-average submatrix problem

Author: Erba, Vittorio, Krzakala, Florent, Pérez, Rodrigo, and Zdeborová, Lenka
Subjects: Condensed Matter - Disordered Systems and Neural Networks, Computer Science - Information Theory
Abstract: We study the maximum-average submatrix problem, in which given an $N \times N$ matrix $J$ one needs to find the $k \times k$ submatrix with the largest average of entries. We study the problem for random matrices $J$ whose entries are i.i.d. random variables by mapping it to a variant of the Sherrington-Kirkpatrick spin-glass model at fixed magnetization. We characterize analytically the phase diagram of the model as a function of the submatrix average and the size of the submatrix $k$ in the limit $N\to\infty$. We consider submatrices of size $k = m N$ with $0 < m < 1$. We find a rich phase diagram, including dynamical, static one-step replica symmetry breaking and full-step replica symmetry breaking. In the limit of $m \to 0$, we find a simpler phase diagram featuring a frozen 1-RSB phase, where the Gibbs measure is composed of exponentially many pure states each with zero entropy. We discover an interesting phenomenon, reminiscent of the phenomenology of the binary perceptron: there exist efficient algorithms that provably work in the frozen 1-RSB phase.
Published: 2023

28. Expectation consistency for calibration of neural networks

Author: Clarté, Lucas, Loureiro, Bruno, Krzakala, Florent, and Zdeborová, Lenka
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Despite their incredible performance, it is well reported that deep neural networks tend to be overoptimistic about their prediction confidence. Finding effective and efficient calibration methods for neural networks is therefore an important endeavour towards better uncertainty quantification in deep learning. In this manuscript, we introduce a novel calibration technique named expectation consistency (EC), consisting of a post-training rescaling of the last layer weights by enforcing that the average validation confidence coincides with the average proportion of correct labels. First, we show that the EC method achieves similar calibration performance to temperature scaling (TS) across different neural network architectures and data sets, all while requiring similar validation samples and computational resources. However, we argue that EC provides a principled method grounded on a Bayesian optimality principle known as the Nishimori identity. Next, we provide an asymptotic characterization of both TS and EC in a synthetic setting and show that their performance crucially depends on the target function. In particular, we discuss examples where EC significantly outperforms TS.
Published: 2023

29. Universality laws for Gaussian mixtures in generalized linear models

Author: Dandi, Yatin, Stephan, Ludovic, Krzakala, Florent, Loureiro, Bruno, and Zdeborová, Lenka
Subjects: Mathematics - Statistics Theory, Statistics - Machine Learning
Abstract: Let $(x_{i}, y_{i})_{i=1,\dots,n}$ denote independent samples from a general mixture distribution $\sum_{c\in\mathcal{C}}\rho_{c}P_{c}^{x}$, and consider the hypothesis class of generalized linear models $\hat{y} = F(\Theta^{\top}x)$. In this work, we investigate the asymptotic joint statistics of the family of generalized linear estimators $(\Theta_{1}, \dots, \Theta_{M})$ obtained either from (a) minimizing an empirical risk $\hat{R}_{n}(\Theta;X,y)$ or (b) sampling from the associated Gibbs measure $\exp(-\beta n \hat{R}_{n}(\Theta;X,y))$. Our main contribution is to characterize under which conditions the asymptotic joint statistics of this family depend (in a weak sense) only on the means and covariances of the class conditional features distribution $P_{c}^{x}$. In particular, this allows us to prove the universality of different quantities of interest, such as the training and generalization errors, redeeming a recent line of work in high-dimensional statistics working under the Gaussian mixture hypothesis. Finally, we discuss the applications of our results to different machine learning tasks of interest, such as ensembling and uncertainty
Published: 2023

30. Bayes-optimal Learning of Deep Random Networks of Extensive-width

Author: Cui, Hugo, Krzakala, Florent, and Zdeborová, Lenka
Subjects: Statistics - Machine Learning, Condensed Matter - Disordered Systems and Neural Networks, Computer Science - Machine Learning
Abstract: We consider the problem of learning a target function corresponding to a deep, extensive-width, non-linear neural network with random Gaussian weights. We consider the asymptotic limit where the number of samples, the input dimension and the network width are proportionally large. We propose a closed-form expression for the Bayes-optimal test error, for regression and classification tasks. We further compute closed-form expressions for the test errors of ridge regression, kernel and random features regression. We find, in particular, that optimally regularized ridge regression, as well as kernel regression, achieve Bayes-optimal performances, while the logistic loss yields a near-optimal test error for classification. We further show numerically that when the number of samples grows faster than the dimension, ridge and kernel methods become suboptimal, while neural networks achieve test error close to zero from quadratically many samples.
Published: 2023

31. On double-descent in uncertainty quantification in overparametrized models

Author: Clarté, Lucas, Loureiro, Bruno, Krzakala, Florent, and Zdeborová, Lenka
Subjects: Computer Science - Machine Learning
Abstract: Uncertainty quantification is a central challenge in reliable and trustworthy machine learning. Naive measures such as last-layer scores are well-known to yield overconfident estimates in the context of overparametrized neural networks. Several methods, ranging from temperature scaling to different Bayesian treatments of neural networks, have been proposed to mitigate overconfidence, most often supported by the numerical observation that they yield better calibrated uncertainty measures. In this work, we provide a sharp comparison between popular uncertainty measures for binary classification in a mathematically tractable model for overparametrized neural networks: the random features model. We discuss a trade-off between classification accuracy and calibration, unveiling a double descent like behavior in the calibration curve of optimally regularized estimators as a function of overparametrization. This is in contrast with the empirical Bayes method, which we show to be well calibrated in our setting despite the higher generalization error and overparametrization.
Published: 2022

32. Disordered Systems Insights on Computational Hardness

Author: Gamarnik, David, Moore, Cristopher, and Zdeborová, Lenka
Subjects: Condensed Matter - Disordered Systems and Neural Networks, Computer Science - Computational Complexity, Mathematics - Probability, Mathematics - Statistics Theory
Abstract: In this review article, we discuss connections between the physics of disordered systems, phase transitions in inference problems, and computational hardness. We introduce two models representing the behavior of glassy systems, the spiked tensor model and the generalized linear model. We discuss the random (non-planted) versions of these problems as prototypical optimization problems, as well as the planted versions (with a hidden solution) as prototypical problems in statistical inference and learning. Based on ideas from physics, many of these problems have transitions where they are believed to jump from easy (solvable in polynomial time) to hard (requiring exponential time). We discuss several emerging ideas in theoretical computer science and statistics that provide rigorous evidence for hardness by proving that large classes of algorithms fail in the conjectured hard regime. This includes the overlap gap property, a particular mathematization of clustering or dynamical symmetry-breaking, which can be used to show that many algorithms that are local or robust to changes in their input fail. We also discuss the sum-of-squares hierarchy, which places bounds on proofs or algorithms that use low-degree polynomials such as standard spectral methods and semidefinite relaxations, including the Sherrington-Kirkpatrick model. Throughout the manuscript, we present connections to the physics of disordered systems and associated replica symmetry breaking properties.
Comment: 42 pages
Published: 2022

33. Rigorous dynamical mean field theory for stochastic gradient descent methods

Author: Gerbelot, Cédric, Troiani, Emanuele, Mignacco, Francesca, Krzakala, Florent, and Zdeborová, Lenka
Subjects: Mathematical Physics, Computer Science - Information Theory, Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: We prove closed-form equations for the exact high-dimensional asymptotics of a family of first order gradient-based methods, learning an estimator (e.g. M-estimator, shallow neural network, ...) from observations on Gaussian data with empirical risk minimization. This includes widely used algorithms such as stochastic gradient descent (SGD) or Nesterov acceleration. The obtained equations match those resulting from the discretization of dynamical mean-field theory (DMFT) equations from statistical physics when applied to gradient flow. Our proof method allows us to give an explicit description of how memory kernels build up in the effective dynamics, and to include non-separable update functions, allowing datasets with non-identity covariance matrices. Finally, we provide numerical implementations of the equations for SGD with generic extensive batch-size and with constant learning rates.
Comment: 40 pages, 4 figures
Published: 2022

34. Planted matching problems on random hypergraphs

Author: Adomaityte, Urte, Toshniwal, Anshul, Sicuro, Gabriele, and Zdeborová, Lenka
Subjects: Condensed Matter - Disordered Systems and Neural Networks, Computer Science - Discrete Mathematics, Computer Science - Information Theory, Mathematics - Probability, Mathematics - Statistics Theory
Abstract: We consider the problem of inferring a matching hidden in a weighted random $k$-hypergraph. We assume that the hyperedges' weights are random and distributed according to two different densities, conditional on whether or not they belong to the hidden matching. We show that, for $k>2$ and in the large graph size limit, an algorithmic first order transition in the signal strength separates a regime in which a complete recovery of the hidden matching is feasible from a regime in which partial recovery is possible. This is in contrast to the $k=2$ case where the transition is known to be continuous. Finally, we consider the case of graphs presenting a mixture of edges and $3$-hyperedges, interpolating between the $k=2$ and the $k=3$ cases, and we study how the transition changes from continuous to first order by tuning the relative amount of edges and hyperedges.
Comment: 13 pages, 12 figures
Published: 2022

35. The planted XY model: thermodynamics and inference

Author: Chen, Siyu, Huang, Guanhao, Piccioli, Giovanni, and Zdeborová, Lenka
Subjects: Condensed Matter - Disordered Systems and Neural Networks, Condensed Matter - Statistical Mechanics, Computer Science - Information Retrieval, Mathematics - Probability, Statistics - Computation
Abstract: In this paper we study a fully connected planted spin glass named the planted XY model. Motivation for studying this system comes both from the spin glass field and from statistical inference, where it models the angular synchronization problem. We derive the replica symmetric (RS) phase diagram in the temperature, ferromagnetic bias plane using the approximate message passing (AMP) algorithm and its state evolution (SE). While the RS predictions are exact on the Nishimori line (i.e. when the temperature is matched to the ferromagnetic bias), they become inaccurate when the parameters are mismatched, giving rise to a spin glass phase where AMP is not able to converge. To overcome the defects of the RS approximation we carry out a one-step replica symmetry breaking (1RSB) analysis based on the approximate survey propagation (ASP) algorithm. Exploiting the state evolution of ASP, we count the number of metastable states in the measure, derive the 1RSB free entropy and find the behavior of the Parisi parameter throughout the spin glass phase.
Comment: 29 pages, 8 figures
Published: 2022

36. Low-rank Matrix Estimation with Inhomogeneous Noise

Author: Guionnet, Alice, Ko, Justin, Krzakala, Florent, and Zdeborová, Lenka
Subjects: Mathematics - Probability, Condensed Matter - Disordered Systems and Neural Networks, Mathematics - Statistics Theory
Abstract: We study low-rank matrix estimation for a generic inhomogeneous output channel through which the matrix is observed. This generalizes the commonly considered spiked matrix model with homogeneous noise to include for instance the dense degree-corrected stochastic block model. We adapt techniques used to study multispecies spin glasses to derive and rigorously prove an expression for the free energy of the problem in the large size limit, providing a framework to study the signal detection thresholds. We discuss an application of this framework to the degree-corrected stochastic block models.
Comment: 6 figures
Published: 2022

37. Subspace clustering in high-dimensions: Phase transitions & Statistical-to-Computational gap

Author: Pesce, Luca, Loureiro, Bruno, Krzakala, Florent, and Zdeborová, Lenka
Subjects: Statistics - Machine Learning, Condensed Matter - Disordered Systems and Neural Networks, Computer Science - Machine Learning, Mathematics - Probability, Mathematics - Statistics Theory
Abstract: A simple model to study subspace clustering is the high-dimensional $k$-Gaussian mixture model where the cluster means are sparse vectors. Here we provide an exact asymptotic characterization of the statistically optimal reconstruction error in this model in the high-dimensional regime with extensive sparsity, i.e. when the fraction of non-zero components of the cluster means $\rho$, as well as the ratio $\alpha$ between the number of samples and the dimension are fixed, while the dimension diverges. We identify the information-theoretic threshold below which obtaining a positive correlation with the true cluster means is statistically impossible. Additionally, we investigate the performance of the approximate message passing (AMP) algorithm analyzed via its state evolution, which is conjectured to be optimal among polynomial algorithms for this task. We identify in particular the existence of a statistical-to-computational gap between the algorithm, which requires a signal-to-noise ratio $\lambda_{\text{alg}} \ge k / \sqrt{\alpha} $ to perform better than random, and the information-theoretic threshold at $\lambda_{\text{it}} \approx \sqrt{-k \rho \log{\rho}} / \sqrt{\alpha}$. Finally, we discuss the case of sub-extensive sparsity $\rho$ by comparing the performance of the AMP with other sparsity-enhancing algorithms, such as sparse-PCA and diagonal thresholding.
Comment: NeurIPS camera-ready version
Published: 2022

38. Multi-layer State Evolution Under Random Convolutional Design

Author: Daniels, Max, Gerbelot, Cédric, Krzakala, Florent, and Zdeborová, Lenka
Subjects: Computer Science - Information Theory
Abstract: Signal recovery under generative neural network priors has emerged as a promising direction in statistical inference and computational imaging. Theoretical analysis of reconstruction algorithms under generative priors is, however, challenging. For generative priors with fully connected layers and Gaussian i.i.d. weights, this was achieved by the multi-layer approximate message passing (ML-AMP) algorithm via a rigorous state evolution. However, practical generative priors are typically convolutional, allowing for computational benefits and inductive biases, and so the Gaussian i.i.d. weight assumption is very limiting. In this paper, we overcome this limitation and establish the state evolution of ML-AMP for random convolutional layers. We prove in particular that random convolutional layers belong to the same universality class as Gaussian matrices. Our proof technique is of independent interest as it establishes a mapping between convolutional matrices and spatially coupled sensing matrices used in coding theory.
Comment: Accepted to NeurIPS 2022
Published: 2022

39. Gaussian Universality of Perceptrons with Random Labels

Author: Gerace, Federica, Krzakala, Florent, Loureiro, Bruno, Stephan, Ludovic, and Zdeborová, Lenka
Subjects: Statistics - Machine Learning, Condensed Matter - Disordered Systems and Neural Networks, Computer Science - Machine Learning, Mathematics - Probability, Mathematics - Statistics Theory
Abstract: While classical in many theoretical settings - and in particular in statistical physics-inspired works - the assumption of Gaussian i.i.d. input data is often perceived as a strong limitation in the context of statistics and machine learning. In this study, we redeem this line of work in the case of generalized linear classification, a.k.a. the perceptron model, with random labels. We argue that there is a large universality class of high-dimensional input data for which we obtain the same minimum training loss as for Gaussian data with corresponding data covariance. In the limit of vanishing regularization, we further demonstrate that the training loss is independent of the data covariance. On the theoretical side, we prove this universality for an arbitrary mixture of homogeneous Gaussian clouds. Empirically, we show that the universality holds also for a broad range of real datasets.
Published: 2022

40. Learning curves for the multi-class teacher-student perceptron

Author: Cornacchia, Elisabetta, Mignacco, Francesca, Veiga, Rodrigo, Gerbelot, Cédric, Loureiro, Bruno, and Zdeborová, Lenka
Subjects: Statistics - Machine Learning, Condensed Matter - Disordered Systems and Neural Networks, Computer Science - Machine Learning
Abstract: One of the most classical results in high-dimensional learning theory provides a closed-form expression for the generalisation error of binary classification with the single-layer teacher-student perceptron on i.i.d. Gaussian inputs. Both Bayes-optimal estimation and empirical risk minimisation (ERM) were extensively analysed for this setting. At the same time, a considerable part of modern machine learning practice concerns multi-class classification. Yet, an analogous analysis for the corresponding multi-class teacher-student perceptron was missing. In this manuscript we fill this gap by deriving and evaluating asymptotic expressions for both the Bayes-optimal and ERM generalisation errors in the high-dimensional regime. For Gaussian teacher weights, we investigate the performance of ERM with both cross-entropy and square losses, and explore the role of ridge regularisation in approaching Bayes-optimality. In particular, we observe that regularised cross-entropy minimisation yields close-to-optimal accuracy. Instead, for a binary teacher we show that a first-order phase transition arises in the Bayes-optimal performance.
Comment: 14 pages + appendix
Published: 2022

41. Optimal denoising of rotationally invariant rectangular matrices

Author: Troiani, Emanuele, Erba, Vittorio, Krzakala, Florent, Maillard, Antoine, and Zdeborová, Lenka
Subjects: Condensed Matter - Disordered Systems and Neural Networks, Condensed Matter - Statistical Mechanics, Computer Science - Information Theory, Mathematics - Probability
Abstract: In this manuscript we consider denoising of large rectangular matrices: given a noisy observation of a signal matrix, what is the best way of recovering the signal matrix itself? For Gaussian noise and rotationally-invariant signal priors, we completely characterize the optimal denoiser and its performance in the high-dimensional limit, in which the size of the signal matrix goes to infinity with fixed aspect ratio, and under the Bayes optimal setting, that is when the statistician knows how the signal and the observations were generated. Our results generalise previous works that considered only symmetric matrices to the more general case of non-symmetric and rectangular ones. We explore analytically and numerically a particular choice of factorized signal prior that models cross-covariance matrices and the matrix factorization problem. As a byproduct of our analysis, we provide an explicit asymptotic evaluation of the rectangular Harish-Chandra-Itzykson-Zuber integral in a special case.
Published: 2022

42. (Dis)assortative Partitions on Random Regular Graphs

Author: Behrens, Freya, Arpino, Gabriel, Kivva, Yaroslav, and Zdeborová, Lenka
Subjects: Condensed Matter - Disordered Systems and Neural Networks, Condensed Matter - Statistical Mechanics, Computer Science - Discrete Mathematics, Mathematics - Probability
Abstract: We study the problem of assortative and disassortative partitions on random $d$-regular graphs. Nodes in the graph are partitioned into two non-empty groups. In the assortative partition every node requires at least $H$ of its neighbors to be in its own group. In the disassortative partition every node requires fewer than $H$ neighbors to be in its own group. Using the cavity method based on analysis of the Belief Propagation algorithm we establish for which combinations of parameters $(d,H)$ these partitions exist with high probability and for which they do not. For $H>\lceil \frac{d}{2} \rceil $ we establish that the structure of solutions to the assortative partition problems corresponds to the so-called frozen-1RSB. This entails a conjecture of algorithmic hardness of finding these partitions efficiently. For $H \le \lceil \frac{d}{2} \rceil $ we argue that the assortative partition problem is algorithmically easy on average for all $d$. Further we provide arguments about asymptotic equivalence between the assortative partition problem and the disassortative one, going through a close relation to the problem of single-spin-flip-stable states in spin glasses. In the context of spin glasses, our results on algorithmic hardness imply a conjecture that gapped single spin flip stable states are hard to find, which may be a universal reason behind the observation that physical dynamics in glassy systems display convergence to marginal stability.
Comment: 21 pages; Corrected usage of the word "planted" in Section 4
Published: 2022

43. Theoretical characterization of uncertainty in high-dimensional linear classification

Author: Clarté, Lucas, Loureiro, Bruno, Krzakala, Florent, and Zdeborová, Lenka
Subjects: Computer Science - Machine Learning, Condensed Matter - Disordered Systems and Neural Networks, Statistics - Machine Learning
Abstract: Being able to reliably assess not only the accuracy but also the uncertainty of models' predictions is an important endeavour in modern machine learning. Even if the model generating the data and labels is known, computing the intrinsic uncertainty after learning the model from a limited number of samples amounts to sampling the corresponding posterior probability measure. Such sampling is computationally challenging in high-dimensional problems and theoretical results on heuristic uncertainty estimators in high-dimensions are thus scarce. In this manuscript, we characterise uncertainty for learning from a limited number of samples of high-dimensional Gaussian input data and labels generated by the probit model. In this setting, the Bayesian uncertainty (i.e. the posterior marginals) can be asymptotically obtained by the approximate message passing algorithm, bypassing the canonical but costly Monte Carlo sampling of the posterior. We then provide a closed-form formula for the joint statistics between the logistic classifier, the uncertainty of the statistically optimal Bayesian classifier and the ground-truth probit uncertainty. The formula allows us to investigate calibration of the logistic classifier learning from a limited number of samples. We discuss how over-confidence can be mitigated by appropriately regularising.
Published: 2022
Full Text: View/download PDF

44. Phase diagram of Stochastic Gradient Descent in high-dimensional two-layer neural networks

Author: Veiga, Rodrigo, Stephan, Ludovic, Loureiro, Bruno, Krzakala, Florent, and Zdeborová, Lenka
Subjects: Statistics - Machine Learning, Condensed Matter - Disordered Systems and Neural Networks, Computer Science - Machine Learning
Abstract: Despite the non-convex optimization landscape, over-parametrized shallow networks are able to achieve global convergence under gradient descent. The picture can be radically different for narrow networks, which tend to get stuck in badly-generalizing local minima. Here we investigate the cross-over between these two regimes in the high-dimensional setting, and in particular investigate the connection between the so-called mean-field/hydrodynamic regime and the seminal approach of Saad & Solla. Focusing on the case of Gaussian data, we study the interplay between the learning rate, the time scale, and the number of hidden units in the high-dimensional dynamics of stochastic gradient descent (SGD). Our work builds on a deterministic description of SGD in high-dimensions from statistical physics, which we extend and for which we provide rigorous convergence rates., Comment: 20 pages
Published: 2022

45. Error Scaling Laws for Kernel Classification under Source and Capacity Conditions

Author: Cui, Hugo, Loureiro, Bruno, Krzakala, Florent, and Zdeborová, Lenka
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: We consider the problem of kernel classification. While worst-case bounds on the decay rate of the prediction error with the number of samples are known for some classifiers, they often fail to accurately describe the learning curves of real data sets. In this work, we consider the important class of data sets satisfying the standard source and capacity conditions, comprising a number of real data sets as we show numerically. Under the Gaussian design, we derive the decay rates for the misclassification (prediction) error as a function of the source and capacity coefficients. We do so for two standard kernel classification settings, namely margin-maximizing Support Vector Machines (SVM) and ridge classification, and contrast the two methods. We find that our rates tightly describe the learning curves for this class of data sets, and are also observed on real data. Our results can also be seen as an explicit prediction of the exponents of a scaling law for kernel classification that is accurate on some real datasets.
Published: 2022
Full Text: View/download PDF

46. Aligning random graphs with a sub-tree similarity message-passing algorithm

Author: Piccioli, Giovanni, Semerjian, Guilhem, Sicuro, Gabriele, and Zdeborová, Lenka
Subjects: Computer Science - Information Theory, Condensed Matter - Disordered Systems and Neural Networks, Computer Science - Discrete Mathematics, Mathematics - Probability
Abstract: The problem of aligning Erdős-Rényi random graphs is a noisy, average-case version of the graph isomorphism problem, in which a pair of correlated random graphs is observed through a random permutation of their vertices. We study a polynomial time message-passing algorithm devised to solve the inference problem of partially recovering the hidden permutation, in the sparse regime with constant average degrees. We perform extensive numerical simulations to determine the range of parameters in which this algorithm achieves partial recovery. We also introduce a generalized ensemble of correlated random graphs with prescribed degree distributions, and extend the algorithm to this case., Comment: 36 pages, 14 figures, submitted to Journal of Statistical Mechanics: Theory and Experiment. Corrected typos. Modified Figure 1 for clarity. Added references' titles in bibliography. Added definition of "quasi-aligned". Added clarifications about the significance of Nishimori experiments
Published: 2021
Full Text: View/download PDF

47. Perturbative construction of mean-field equations in extensive-rank matrix factorization and denoising

Author: Maillard, Antoine, Krzakala, Florent, Mézard, Marc, and Zdeborová, Lenka
Subjects: Condensed Matter - Disordered Systems and Neural Networks, Condensed Matter - Statistical Mechanics, Computer Science - Information Theory, Mathematics - Probability
Abstract: Factorization of matrices where the rank of the two factors diverges linearly with their sizes has many applications in diverse areas such as unsupervised representation learning, dictionary learning or sparse coding. We consider a setting where the two factors are generated from known component-wise independent prior distributions, and the statistician observes a (possibly noisy) component-wise function of their matrix product. In the limit where the dimensions of the matrices tend to infinity, but their ratios remain fixed, we expect to be able to derive closed form expressions for the optimal mean squared error on the estimation of the two factors. However, this remains a very involved mathematical and algorithmic problem. A related, but simpler, problem is extensive-rank matrix denoising, where one aims to reconstruct a matrix with extensive but usually small rank from noisy measurements. In this paper, we approach both these problems using high-temperature expansions at fixed order parameters. This allows us to clarify how previous attempts at solving these problems failed at finding an asymptotically exact solution. We provide a systematic way to derive the corrections to these existing approximations, taking into account the structure of correlations particular to the problem. Finally, we illustrate our approach in detail on the case of extensive-rank matrix denoising. We compare our results with known optimal rotationally-invariant estimators, and show how exact asymptotic calculations of the minimal error can be performed using extensive-rank matrix integrals., Comment: 30 pages (main text), 25 pages of references and appendices. v2: Adding clarifications and a new result to derive the optimal denoising estimator from the asymptotic free energy. v3: corrections to match the published version
Published: 2021
Full Text: View/download PDF

48. Large Deviations of Semi-supervised Learning in the Stochastic Block Model

Author: Cui, Hugo, Saglietti, Luca, and Zdeborová, Lenka
Subjects: Condensed Matter - Disordered Systems and Neural Networks
Abstract: In community detection on graphs, the semi-supervised learning problem entails inferring the ground-truth membership of each node in a graph, given the connectivity structure and a limited number of revealed node labels. Different subsets of revealed labels can in principle lead to higher or lower information gains and induce different reconstruction accuracies. In the framework of the dense stochastic block model, we employ statistical physics methods to derive a large deviation analysis for this problem, in the high-dimensional limit. This analysis allows the characterization of the fluctuations around the typical behaviour, capturing the effect of correlated label choices and yielding an estimate of their informativeness and their rareness among subsets of the same size. We find theoretical evidence of a non-monotonic relationship between reconstruction accuracy and the free energy associated to the posterior measure of the inference problem. We further discuss possible implications for active learning applications in community detection.
Published: 2021
Full Text: View/download PDF

49. Probing transfer learning with a model of synthetic correlated datasets

Author: Gerace, Federica, Saglietti, Luca, Mannelli, Stefano Sarao, Saxe, Andrew, and Zdeborová, Lenka
Subjects: Computer Science - Machine Learning, Condensed Matter - Disordered Systems and Neural Networks
Abstract: Transfer learning can significantly improve the sample efficiency of neural networks, by exploiting the relatedness between a data-scarce target task and a data-abundant source task. Despite years of successful applications, transfer learning practice often relies on ad-hoc solutions, while theoretical understanding of these procedures is still limited. In the present work, we re-think a solvable model of synthetic data as a framework for modeling correlation between data-sets. This setup allows for an analytic characterization of the generalization performance obtained when transferring the learned feature map from the source to the target task. Focusing on the problem of training two-layer networks in a binary classification setting, we show that our model can capture a range of salient features of transfer learning with real data. Moreover, by exploiting parametric control over the correlation between the two data-sets, we systematically investigate under which conditions the transfer of features is beneficial for generalization.
Published: 2021
Full Text: View/download PDF

50. Learning Gaussian Mixtures with Generalised Linear Models: Precise Asymptotics in High-dimensions

Author: Loureiro, Bruno, Sicuro, Gabriele, Gerbelot, Cédric, Pacco, Alessandro, Krzakala, Florent, and Zdeborová, Lenka
Subjects: Statistics - Machine Learning, Condensed Matter - Disordered Systems and Neural Networks, Computer Science - Machine Learning
Abstract: Generalised linear models for multi-class classification problems are one of the fundamental building blocks of modern machine learning tasks. In this manuscript, we characterise the learning of a mixture of $K$ Gaussians with generic means and covariances via empirical risk minimisation (ERM) with any convex loss and regularisation. In particular, we prove exact asymptotics characterising the ERM estimator in high-dimensions, extending several previous results about Gaussian mixture classification in the literature. We exemplify our result in two tasks of interest in statistical learning: a) classification for a mixture with sparse means, where we study the efficiency of $\ell_1$ penalty with respect to $\ell_2$; b) max-margin multi-class classification, where we characterise the phase transition on the existence of the multi-class logistic maximum likelihood estimator for $K>2$. Finally, we discuss how our theory can be applied beyond the scope of synthetic data, showing that in different cases Gaussian mixtures capture closely the learning curve of classification tasks in real data sets., Comment: 12 pages + 34 pages of Appendix, 10 figures
Published: 2021

592 results on '"Zdeborová, Lenka"'
