142 results for "Andrea Montanari"
Search Results
2. A Combined Use of Custom-Made Partial Pelvic Replacement and Proximal Femur Megaprosthesis in the Treatment of Severe Bone Loss after Multiple Total Hip Arthroplasty Revisions
- Author
-
Michele Fiore, Azzurra Paolucci, Renato Zunarelli, Marta Bortoli, Andrea Montanari, Andrea Pace, Lorenzo Di Prinzio, Stefania Claudia Parisi, Roberto De Cristofaro, Massimiliano De Paolis, and Andrea Sambri
- Subjects
severe bone loss, revision hip arthroplasty, custom-made, megaprosthesis, silver, Medicine
- Abstract
Hip arthroplasty failures (either septic or aseptic) often require multiple revisions, leading to severe bone defects. The most common reconstruction methods do not allow the management of severe defects. For this reason, in recent years, techniques borrowed from surgical oncology have been applied in the field of revision surgery to deal with both acetabular and femoral bone losses. In this article, two cases of severe bone deficiency following multiple hip arthroplasty revisions that were treated with a custom-made hip prosthesis combined with a proximal femur megaprosthesis are presented. Both implants were silver coated. A review of the literature was conducted to analyze similar cases treated with either a custom-made prosthesis or a proximal femur megaprosthesis. At the 2-year follow-up, all prostheses were in situ without clinical or radiographic signs of implant loosening. No postoperative complications occurred. At the last follow-up, both patients had resumed their daily life activities, with MSTS scores of 23 and 21, respectively. The combined approach of a proximal femur megaprosthesis with a custom-made partial pelvic replacement is a solution that allows severe bone deficiency cases to be tackled with good functional results. Additionally, silver coating may help prevent recurrence of infection.
- Published
- 2023
- Full Text
- View/download PDF
3. Phosphaturic Mesenchymal Tumors with or without Phosphate Metabolism Derangements
- Author
-
Andrea Montanari, Maria Giulia Pirini, Ludovica Lotrecchiano, Lorenzo Di Prinzio, and Guido Zavatta
- Subjects
phosphaturic tumor, tumor-induced osteomalacia, FGF-23, phosphatonin, surgery, orthopedics, Neoplasms. Tumors. Oncology. Including cancer and carcinogens, RC254-282
- Abstract
Phosphaturic mesenchymal tumors (PMT) are rare neoplasms, which can give rise to a multifaceted syndrome, otherwise called tumor-induced osteomalacia (TIO). Localizing these tumors is crucial to obtain a cure for the phosphate metabolism derangement, which is often the main cause leading the patient to seek medical help because of disabling physical and neuromuscular symptoms. A proportion of these tumors is completely silent and may grow unnoticed, unless they become large enough to produce pain or discomfort. FGF-23 can be produced by several benign or malignant PMTs. The phosphate metabolism, radiology and histology of these rare tumors must be collectively assessed by a multidisciplinary team aimed at curing the disease locally and improving patients’ quality of life. This narrative review, authored by multiple specialists of a tertiary care hospital center, describes the endocrine, radiological and histological features of these tumors, and presents surgical and interventional strategies to manage PMTs.
- Published
- 2023
- Full Text
- View/download PDF
4. Urban environment influences on stress, autonomic reactivity and circadian rhythm: protocol for an ambulatory study of mental health and sleep
- Author
-
Andrea Montanari, Limin Wang, Amit Birenboim, and Basile Chaix
- Subjects
environmental stress, circadian rhythm, sleep, mental health, wearable sensors, Public aspects of medicine, RA1-1270
- Abstract
Introduction: Converging evidence suggests that urban living is associated with an increased likelihood of developing mental health and sleep problems. Although these aspects have been investigated in separate streams of research, stress, autonomic reactivity and circadian misalignment can be hypothesized to play a prominent role in the causal pathways underlying the complex relationship between the urban environment and these two health dimensions. This study aims at quantifying the momentary impact of environmental stressors on autonomic reactivity and circadian rhythm, and thereby on mood and anxiety symptoms and sleep quality, in the context of everyday urban living. Method: The present article reports the protocol for a feasibility study that aims at assessing the daily environmental and mobility exposures of 40 participants from the urban area of Jerusalem over 7 days. Every participant will carry a set of wearable sensors while being tracked through space and time with GPS receivers. Skin conductance and heart rate variability will be tracked to monitor participants' stress responses and autonomic reactivity, whereas the electroencephalographic signal will be used for sleep quality tracking. Light exposure, actigraphy and skin temperature will be used for ambulatory circadian monitoring. Geographically explicit ecological momentary assessment (GEMA) will be used to assess participants' perception of the environment, mood and anxiety symptoms, sleep quality and vitality. For each outcome variable (sleep quality and mental health), hierarchical mixed models including random effects at the individual level will be used. In a separate analysis, to control for potential unobserved individual-level confounders, a fixed effect at the individual level will be specified for case-crossover analyses (comparing each participant to oneself). Conclusion: Recent developments in wearable sensing methods, as employed in our study or with even more advanced methods reviewed in the Discussion, make it possible to gather information on the functioning of neuro-endocrine and circadian systems in a real-world context, as a way to investigate the complex interactions between environmental exposures, behavior and health. Our work aims to provide evidence on the health effects of urban stressors and circadian disruptors to inspire potential interventions, municipal policies and urban planning schemes aimed at addressing those factors.
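The protocol's modelling plan (hierarchical mixed models with individual-level random effects) can be illustrated with a minimal sketch, assuming a long-format table of momentary assessments; all column names and effect sizes below are hypothetical placeholders, not the study's variables:

```python
# Hedged sketch of a momentary-outcome mixed model with a random
# intercept per participant. Columns (participant, noise, green,
# anxiety) are hypothetical placeholders, not the study's variables.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_subj, n_obs = 40, 50                       # 40 participants, repeated EMA
df = pd.DataFrame({
    "participant": np.repeat(np.arange(n_subj), n_obs),
    "noise": rng.normal(size=n_subj * n_obs),            # momentary exposures
    "green": rng.normal(size=n_subj * n_obs),
})
subj_effect = rng.normal(scale=0.5, size=n_subj)         # random intercepts
df["anxiety"] = (0.3 * df["noise"] - 0.2 * df["green"]
                 + subj_effect[df["participant"]] + rng.normal(size=len(df)))

# Hierarchical mixed model: random intercept at the individual level.
fit = smf.mixedlm("anxiety ~ noise + green", df, groups=df["participant"]).fit()
print(fit.summary())
```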
- Published
- 2024
- Full Text
- View/download PDF
5. Micro urban spaces and mental well-being: Measuring the exposure to urban landscapes along daily mobility paths and their effects on momentary depressive symptomatology among older population
- Author
-
Giovanna Fancello, Julie Vallée, Cédric Sueur, Frank J. van Lenthe, Yan Kestens, Andrea Montanari, and Basile Chaix
- Subjects
Daily mobility, Mental health, Depression, GPS, Ecological momentary assessment, Urban environment, Environmental sciences, GE1-350
- Abstract
The urban environment plays an important role in residents' mental health. Researchers have mainly focused on residential neighbourhoods as the exposure context, leaving aside the effects of non-residential environments. In order to consider the daily experience of urban spaces, a people-based approach focused on mobility paths is needed. Applying this approach, (1) this study investigated whether individuals’ momentary mental well-being is related to the exposure to micro-urban spaces along the daily mobility paths within the two previous hours; (2) it explored whether these associations differ when environmental exposures are defined considering all location points or only outdoor location points; and (3) it examined the associations between the types of activity and mobility and momentary depressive symptomatology. Using a geographically-explicit ecological momentary assessment (GEMA) approach, momentary depressive symptomatology of 216 older adults living in the Ile-de-France region was assessed using smartphone surveys, while participants were tracked with a GPS receiver and an accelerometer for seven days. Exposure to multiple elements of the streetscape was computed within a 25 m street-network buffer around each GPS point over the two hours prior to the questionnaire. Mobility and activity type were documented from a GPS-based mobility survey. We estimated Bayesian generalized mixed effect models with random effects at the individual and day levels and took time autocorrelation into account. We also estimated fixed effects. Better momentary mental well-being was observed when residents performed leisure activities or were involved in active mobility, and when they were exposed to walkable areas (pedestrian dedicated paths, open spaces, parks and green areas), water elements, and commerce, leisure and cultural attractors over the previous two hours. These relationships were stronger when exposures were defined based only on outdoor location points rather than all location points, and when we considered within-individual differences compared to between-individual differences.
- Published
- 2023
- Full Text
- View/download PDF
6. Does Surgical Approach Influence Complication Rate of Hip Hemiarthroplasty for Femoral Neck Fractures? A Literature Review and Meta-Analysis
- Author
-
Matteo Filippini, Marta Bortoli, Andrea Montanari, Andrea Pace, Lorenzo Di Prinzio, Gianluca Lonardo, Stefania Claudia Parisi, Valentina Persiani, Roberto De Cristofaro, Andrea Sambri, Massimiliano De Paolis, and Michele Fiore
- Subjects
hip hemiarthroplasty, femoral neck fracture, postero-lateral approach, lateral approach, antero-lateral approach, anterior approach, Medicine (General), R5-920
- Abstract
Background: Femoral neck fractures are an epidemiologically significant issue with major effects on patients and health care systems, as they account for a large percentage of bone injuries in the elderly. Hip hemiarthroplasty is a common surgical procedure in the treatment of displaced femoral neck fractures. Several surgical approaches may be used to access the hip joint in femoral neck fractures, each with its own benefits and potential drawbacks, but none of them has consistently been found to be superior to the others. This article aims to systematically review and compare the different approaches in terms of the complication rate at the last follow-up. Methods: An in-depth search of the PubMed, Scopus and Web of Science databases, together with a cross-referencing search, was carried out for articles comparing different approaches in hemiarthroplasty and reporting detailed data. Results: A total of 97,576 hips were included: 1030 treated with a direct anterior approach, 4131 with an anterolateral approach, 59,110 with a direct lateral approach, and 33,007 with a posterolateral approach. Comparing the different approaches, significant differences were found in both the overall complication rate and the rate of revision surgery performed (p < 0.05). In particular, the posterolateral approach showed a significantly higher complication rate than the lateral approach (8.4% vs. 3.2%, p < 0.001). Furthermore, the dislocation rate in the posterolateral group was significantly higher than in the other three groups considered (p < 0.026). However, the posterolateral group showed less blood loss than the anterolateral group (p < 0.001), a lower intraoperative fracture rate than the direct anterior group (p < 0.035), and a shorter mean operative time than the direct lateral group (p < 0.018). Conclusions: The posterolateral approach showed a higher complication rate than the direct lateral approach and a higher prosthetic dislocation rate than the other three types of surgical approaches. On the other hand, patients treated with the posterolateral approach showed better outcomes on other parameters considered, such as mean operative time, mean blood loss and intraoperative fracture rate. Knowledge of the limitations of each approach and of the most common associated complications can guide the choice of surgical technique based on the patient’s individual risk.
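As a quick plausibility check on the headline comparison, the 8.4% vs. 3.2% complication rates can be back-calculated into counts and tested with a standard two-proportion z-test; this is an illustrative reconstruction from the reported figures, not the meta-analysis itself:

```python
# Illustrative two-proportion z-test on the posterolateral (8.4% of
# 33,007) vs. direct lateral (3.2% of 59,110) comparison. Counts are
# back-calculated from the reported rates, so this is a sanity check,
# not the review's actual pooled analysis.
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

events = np.array([round(0.084 * 33007), round(0.032 * 59110)])
totals = np.array([33007, 59110])
stat, pval = proportions_ztest(events, totals)
print(f"z = {stat:.1f}, p = {pval:.2e}")  # strongly significant, as reported
```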
- Published
- 2023
- Full Text
- View/download PDF
7. The Industrial Revolution. Landscapes of a divided nation. Un progetto didattico di ricerca-azione in rete per la scuola secondaria di secondo grado
- Author
-
Andrea Montanari
- Subjects
industrial revolution, landscapes of division, new technologies, WebQuest, cooperative learning, inquiry-oriented activity, learning ecology, Education (General), L7-991
- Abstract
Among the activities a teacher may adopt in a syllabus, the WebQuest is one of the most effective in creating a good cooperative learning environment based on the constructivist approach. The WebQuest combines inquiry-based authentic material and performance-based tasks that require the use of Internet resources. This article presents a long-term WebQuest project on English culture for the secondary school. After an introduction to the WebQuest learning environment, the article describes step by step the making of the project on the English Industrial Revolution: from the outline and the forming of student groups to its final realization and evaluation. A link to view online the five PowerPoint presentations created by the students is included.
- Published
- 2013
- Full Text
- View/download PDF
8. Prigionieri dimenticati. Italiani nei lager della Grande guerra
- Author
-
Andrea Montanari
- Subjects
Great War, Cellelager, prisoners, music, theater, History of Italy, DG11-999
- Abstract
The article covers the series of events "Prigionieri dimenticati. Italiani nei lager della Grande guerra" ("Forgotten prisoners: Italians in the camps of the Great War"), staged in Bibbiano (Reggio Emilia) from 15 to 29 September 2013. Imprisonment in the camp at Celle (Germany) is at the center of: an exhibition composed of explanatory panels and objects from the camps and the trenches; a theatrical show entitled "Sandrone soldato", written at Celle itself; and a concert of original music composed and sung in the camps and trenches. The Great War, then, recounted in an entirely original way.
- Published
- 2013
- Full Text
- View/download PDF
9. Stabl: sparse and reliable biomarker discovery in predictive modeling of high-dimensional omic data
- Author
-
Julien Hedou, Ivana Maric, Grégoire Bellan, Jakob Einhaus, Dyani Gaudilliere, Francois-Xavier Ladant, Franck Verdonk, Ina Stelzer, Dorien Feyaerts, Amy Tsai, Edward Ganio, Maximilian Sabayev, Joshua Gillard, Adam Bonham, Masaki Sato, Maïgane Diop, Martin Angst, David Stevenson, Nima Aghaeepour, Andrea Montanari, and Brice Gaudilliere
- Abstract
High-content omic technologies coupled with sparsity-promoting regularization methods (SRM) have transformed the biomarker discovery process. However, the translation of computational results into a clinical use-case scenario remains challenging. A rate-limiting step is the rigorous selection of reliable biomarker candidates among a host of biological features included in multivariate models. We propose Stabl, a machine learning framework that unifies the biomarker discovery process with multivariate predictive modeling of clinical outcomes by selecting a sparse and reliable set of biomarkers. Evaluation of Stabl on synthetic datasets and four independent clinical studies demonstrates improved biomarker sparsity and reliability compared to commonly used SRMs at similar predictive performance. Stabl readily extends to double- and triple-omics integration tasks and identifies a sparser and more reliable set of biomarkers than those selected by state-of-the-art early- and late-fusion SRMs, thereby facilitating the biological interpretation and clinical translation of complex multi-omic predictive models. The complete package for Stabl is available online at https://github.com/gregbellan/Stabl.
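A minimal sketch of the core idea described above: stability selection over subsamples combined with injected decoy features that set a data-driven reliability threshold. Names and parameter choices are ours; the released package linked in the abstract is the reference implementation:

```python
# Hedged sketch of stability selection with injected decoy features,
# in the spirit of Stabl's reliability thresholding. This is NOT the
# package's API; see https://github.com/gregbellan/Stabl for that.
import numpy as np
from sklearn.linear_model import Lasso

def selection_frequencies(X, y, alpha=0.1, n_boot=100, rng=None):
    rng = np.random.default_rng(rng)
    n, p = X.shape
    freq = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.choice(n, size=n // 2, replace=False)  # subsample half
        coef = Lasso(alpha=alpha, max_iter=5000).fit(X[idx], y[idx]).coef_
        freq += coef != 0
    return freq / n_boot

rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.normal(size=(n, p))
y = X[:, :3] @ np.array([2.0, -1.5, 1.0]) + rng.normal(size=n)

# Append column-permuted copies of X as "knockoff-like" decoy features.
decoys = rng.permuted(X, axis=0)
freq = selection_frequencies(np.hstack([X, decoys]), y, rng=1)

# Reliability threshold: exceed the most frequently selected decoy.
threshold = freq[p:].max()
selected = np.where(freq[:p] > threshold)[0]
print("selected features:", selected)
```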
- Published
- 2023
- Full Text
- View/download PDF
10. The Generalization Error of Random Features Regression: Precise Asymptotics and the Double Descent Curve
- Author
-
Andrea Montanari and Song Mei
- Subjects
Double descent, Applied Mathematics, General Mathematics, Generalization error, Regression, Mathematics
- Published
- 2021
- Full Text
- View/download PDF
11. «Guardo gli asini che volano nel ciel». Il viaggio di Stanlio e Ollio in Italia
- Author
-
Andrea Montanari
- Abstract
Starting from the 2018 film Stan & Ollie, the article recounts the 1950 Italian tour of the world's most famous comedy duo. Between 1947 and 1954, when their success in America was beginning to wane, Laurel and Hardy undertook a European tour through France, England, Belgium and Denmark, greeted everywhere in triumph. There, indeed, their success seemed never to have declined; wherever they went, the public's response was always the same: a warm and enthusiastic welcome.
- Published
- 2022
- Full Text
- View/download PDF
12. The Landscape of the Spiked Tensor Model
- Author
-
Song Mei, Andrea Montanari, Gérard Ben Arous, and Mihai Nica
- Subjects
FOS: Computer and information sciences, Unit sphere, Applied Mathematics, General Mathematics, Probability (math.PR), Mathematics - Statistics Theory, Machine Learning (stat.ML), Statistics Theory (math.ST), Expected value, Combinatorics, Statistics - Machine Learning, Homogeneous polynomial, FOS: Mathematics, Tensor, Mathematics - Probability, Mathematics
- Abstract
We consider the problem of estimating a large rank-one tensor ${\boldsymbol u}^{\otimes k}\in({\mathbb R}^{n})^{\otimes k}$, $k\ge 3$, in Gaussian noise. Earlier work characterized a critical signal-to-noise ratio $\lambda_{\text{Bayes}}= O(1)$ above which an ideal estimator achieves strictly positive correlation with the unknown vector of interest. Remarkably, no polynomial-time algorithm is known that achieves this goal unless $\lambda\ge C n^{(k-2)/4}$, and even powerful semidefinite programming relaxations appear to fail for $1\ll \lambda\ll n^{(k-2)/4}$. In order to elucidate this behavior, we consider the maximum likelihood estimator, which requires maximizing a degree-$k$ homogeneous polynomial over the unit sphere in $n$ dimensions. We compute the expected number of critical points and local maxima of this objective function, show that it is exponential in the dimension $n$, and give exact formulas for the exponential growth rate. We show that (for $\lambda$ larger than a constant) critical points are either very close to the unknown vector ${\boldsymbol u}$, or are confined in a band of width $\Theta(\lambda^{-1/(k-1)})$ around the maximum circle that is orthogonal to ${\boldsymbol u}$. For local maxima, this band shrinks to be of size $\Theta(\lambda^{-1/(k-2)})$. These 'uninformative' local maxima are likely to cause the failure of optimization algorithms.
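A minimal numerical sketch of this landscape, under our own choice of normalization: tensor power iteration attempts to climb the degree-3 ML objective, and succeeds here only because the signal-to-noise ratio is set comfortably above the algorithmic threshold. This is an illustration, not code from the paper:

```python
# Hedged sketch: rank-one spiked 3-tensor and tensor power iteration.
# Normalization and lambda are our illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
n, lam = 30, 8.0
u = rng.normal(size=n); u /= np.linalg.norm(u)

W = rng.normal(size=(n, n, n)) / np.sqrt(n)   # noise tensor
T = lam * np.einsum("i,j,k->ijk", u, u, u) + W

v = rng.normal(size=n); v /= np.linalg.norm(v)
for _ in range(200):                          # v <- T(., v, v), normalized
    v = np.einsum("ijk,j,k->i", T, v, v)
    v /= np.linalg.norm(v)

print("overlap |<u, v>| =", abs(u @ v))
```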
- Published
- 2019
- Full Text
- View/download PDF
13. Nonnegative Matrix Factorization Via Archetypal Analysis
- Author
-
Hamid Javadi and Andrea Montanari
- Subjects
Statistics and Probability, Dimensionality reduction, Non-negative matrix factorization, Matrix decomposition, Combinatorics, Archetypal analysis, Statistics, Probability and Uncertainty, Mathematics
- Abstract
Given a collection of data points, nonnegative matrix factorization (NMF) suggests expressing them as convex combinations of a small set of “archetypes” with nonnegative entries. This decomposition...
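A hedged sketch of the decomposition the abstract describes: approximate each data point as a convex combination of a few archetypes, themselves constrained to be convex combinations of data points. The alternating projected-gradient update and step size are our own illustrative choices, not the paper's algorithm:

```python
# Hedged sketch of archetypal decomposition (Cutler-Breiman style):
# X ~ W @ (B @ X), with rows of W and B on the probability simplex.
import numpy as np

def project_simplex(v):
    """Euclidean projection of each row of v onto the probability simplex."""
    u = -np.sort(-v, axis=1)                  # sort rows in decreasing order
    css = np.cumsum(u, axis=1) - 1.0
    rho = np.sum(u > css / np.arange(1, v.shape[1] + 1), axis=1)
    theta = css[np.arange(len(v)), rho - 1] / rho
    return np.maximum(v - theta[:, None], 0.0)

def archetypal_nmf(X, r, n_iter=500, lr=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    W = project_simplex(rng.random((n, r)))   # weights: rows on simplex
    B = project_simplex(rng.random((r, n)))   # archetypes H = B @ X
    for _ in range(n_iter):
        H = B @ X
        R = W @ H - X                         # residual
        W = project_simplex(W - lr * R @ H.T)
        B = project_simplex(B - lr * (W.T @ R) @ X.T)
    return W, B @ X

X = np.abs(np.random.default_rng(1).normal(size=(100, 20)))
W, H = archetypal_nmf(X, r=4)
print("relative error:", np.linalg.norm(W @ H - X) / np.linalg.norm(X))
```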
- Published
- 2019
- Full Text
- View/download PDF
14. Linearized two-layers neural networks in high dimension
- Author
-
Andrea Montanari, Behrooz Ghorbani, Theodor Misiakiewicz, and Song Mei
- Subjects
FOS: Computer and information sciences, Statistics and Probability, Computer Science - Machine Learning, Pure mathematics, Polynomial, Mathematics - Statistics Theory, Statistics Theory (math.ST), Upper and lower bounds, Regularization (mathematics), Kernel method, Machine Learning (cs.LG), FOS: Mathematics, Statistics, Probability and Uncertainty, Mathematics
- Abstract
We consider the problem of learning an unknown function $f_{\star}$ on the $d$-dimensional sphere with respect to the square loss, given i.i.d. samples $\{(y_i,{\boldsymbol x}_i)\}_{i\le n}$ where ${\boldsymbol x}_i$ is a feature vector uniformly distributed on the sphere and $y_i=f_{\star}({\boldsymbol x}_i)+\varepsilon_i$. We study two popular classes of models that can be regarded as linearizations of two-layers neural networks around a random initialization: the random features model of Rahimi-Recht (RF); the neural tangent kernel model of Jacot-Gabriel-Hongler (NT). Both these approaches can also be regarded as randomized approximations of kernel ridge regression (with respect to different kernels), and enjoy universal approximation properties when the number of neurons $N$ diverges, for a fixed dimension $d$. We consider two specific regimes: the approximation-limited regime, in which $n=\infty$ while $d$ and $N$ are large but finite; and the sample size-limited regime in which $N=\infty$ while $d$ and $n$ are large but finite. In the first regime we prove that if $d^{\ell + \delta} \le N\le d^{\ell+1-\delta}$ for small $\delta > 0$, then RF effectively fits a degree-$\ell$ polynomial in the raw features, and NT fits a degree-$(\ell+1)$ polynomial. In the second regime, both RF and NT reduce to kernel methods with rotationally invariant kernels. We prove that, if the number of samples is $d^{\ell + \delta} \le n \le d^{\ell +1-\delta}$, then kernel methods can fit at most a degree-$\ell$ polynomial in the raw features. This lower bound is achieved by kernel ridge regression. Optimal prediction error is achieved for vanishing ridge regularization.
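A minimal sketch of the RF model analysed above, under illustrative choices of target function, sample size and ridge level: the first layer is random and frozen, and only the second-layer weights are fit by ridge regression:

```python
# Hedged sketch of random features (RF) regression: ridge regression
# on features relu(<w_j, x>) with a random, frozen first layer.
import numpy as np

rng = np.random.default_rng(0)
n, d, N, ridge = 500, 20, 300, 1e-2

X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)    # covariates on the sphere
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=n)

W = rng.normal(size=(N, d))                      # random first layer, frozen
Z = np.maximum(X @ W.T, 0.0)                     # RF features

# Ridge solve for the second-layer weights a:
a = np.linalg.solve(Z.T @ Z + ridge * np.eye(N), Z.T @ y)

X_test = rng.normal(size=(200, d))
X_test /= np.linalg.norm(X_test, axis=1, keepdims=True)
pred = np.maximum(X_test @ W.T, 0.0) @ a
print("test MSE:", np.mean((pred - np.sin(3 * X_test[:, 0])) ** 2))
```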
- Published
- 2021
- Full Text
- View/download PDF
15. Estimation of low-rank matrices via approximate message passing
- Author
-
Andrea Montanari and Ramji Venkataramanan
- Subjects
FOS: Computer and information sciences, Statistics and Probability, Rank (linear algebra), Machine Learning (stat.ML), Mathematics - Statistics Theory, Statistics Theory (math.ST), Statistics - Machine Learning, FOS: Mathematics, approximate message passing, Eigenvalues and eigenvectors, Low-rank matrix estimation, spectral initialization, Estimator, Gaussian noise, Random matrix, Algorithm, Statistics, Probability and Uncertainty, 62E20, 62F15, 62H99
- Abstract
Consider the problem of estimating a low-rank matrix when its entries are perturbed by Gaussian noise. If the empirical distribution of the entries of the spikes is known, optimal estimators that exploit this knowledge can substantially outperform simple spectral approaches. Recent work characterizes the asymptotic accuracy of Bayes-optimal estimators in the high-dimensional limit. In this paper we present a practical algorithm that can achieve Bayes-optimal accuracy above the spectral threshold. A bold conjecture from statistical physics posits that no polynomial-time algorithm achieves optimal error below the same threshold (unless the best estimator is trivial). Our approach uses Approximate Message Passing (AMP) in conjunction with a spectral initialization. AMP algorithms have proved successful in a variety of statistical estimation tasks, and are amenable to exact asymptotic analysis via state evolution. Unfortunately, state evolution is uninformative when the algorithm is initialized near an unstable fixed point, as often happens in low-rank matrix estimation. We develop a new analysis of AMP that allows for spectral initializations. Our main theorem is general and applies beyond matrix estimation. However, we use it to derive detailed predictions for the problem of estimating a rank-one matrix in noise. Special cases of this problem are closely related, via universality arguments, to the network community detection problem for two asymmetric communities. For general rank-one models, we show that AMP can be used to construct confidence intervals and control the false discovery rate. We provide illustrations of the general methodology by considering the cases of sparse low-rank matrices and of block-constant low-rank matrices with symmetric blocks (we refer to the latter as the 'Gaussian Block Model').
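A minimal sketch of AMP with spectral initialization for a symmetric rank-one spike with ±1 entries; the tanh denoiser matches that prior, but the scalings and parameter values are our own illustrative choices, not the paper's exact setup:

```python
# Hedged sketch of rank-one AMP with spectral initialization for
# Y = (lam/n) v v^T + W/sqrt(n), v in {+1,-1}^n. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n, lam = 2000, 1.5
v = rng.choice([-1.0, 1.0], size=n)

G = rng.normal(size=(n, n))
W = (G + G.T) / np.sqrt(2)                   # GOE-like symmetric noise
Y = (lam / n) * np.outer(v, v) + W / np.sqrt(n)

# Spectral initialization: leading eigenvector of Y, rescaled.
eigvals, eigvecs = np.linalg.eigh(Y)
x = np.sqrt(n) * eigvecs[:, -1]
x_old = np.zeros(n)

for t in range(20):
    f = np.tanh(x)                           # posterior-mean denoiser
    b = np.mean(1.0 - f**2)                  # Onsager correction term
    x, x_old = Y @ f - b * np.tanh(x_old), x

print("overlap:", abs(np.mean(np.tanh(x) * v)))
```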
- Published
- 2021
16. An Information-Theoretic View of Stochastic Localization
- Author
-
Ahmed El Alaoui and Andrea Montanari
- Subjects
FOS: Computer and information sciences, Computer Science - Information Theory, Information Theory (cs.IT), Probability (math.PR), FOS: Mathematics, Library and Information Sciences, Mathematics - Probability, Computer Science Applications, Information Systems
- Abstract
Given a probability measure $\mu$ over ${\mathbb R}^n$, it is often useful to approximate it by a convex combination of a small number of probability measures, such that each component is close to a product measure. Recently, Ronen Eldan used a stochastic localization argument to prove a general decomposition result of this type. In Eldan's theorem, the 'number of components' is characterized by the entropy of the mixture, and 'closeness to product' is characterized by the covariance matrix of each component. We present an elementary proof of Eldan's theorem which makes use of an information-theoretic (or estimation-theoretic) interpretation. The proof is analogous to that of an earlier decomposition result known as the 'pinning lemma.'
- Published
- 2021
- Full Text
- View/download PDF
17. Vascular proximity increases the risk of local recurrence in soft-tissue sarcomas of the thigh—a retrospective MRI study
- Author
-
Andrea Sambri, Emilia Caldari, Andrea Montanari, Michele Fiore, Luca Cevolani, Federico Ponti, Valerio D’Agostino, Giuseppe Bianchi, Marco Miceli, Paolo Spinnato, Massimiliano De Paolis, and Davide Maria Donati
- Subjects
Cancer Research, Soft tissue sarcoma, Oncology, by-pass, Recurrence, Vascular proximity, Neoplasms. Tumors. Oncology. Including cancer and carcinogens, MRI, RC254-282, Article
- Abstract
Simple Summary: Proximity to major vessels increases the risk of local recurrence in soft tissue sarcomas of the thigh. When major vessels were observed to be surrounded by the tumor on preoperative MRI, vascular resection and by-pass reconstruction offered better local control. Abstract: The aim of this study was to establish the prognostic effects of the proximity of the tumor to the main vessels in patients affected by soft tissue sarcomas (STS) of the thigh. A total of 529 adult patients with deeply seated STS of the thigh and popliteal fossa were included. Vascular proximity was defined on MRI: type 1, > 5 mm; type 2, ≤ 5 mm and > 0 mm; type 3, close to the tumor; type 4, enclosed by the tumor. Tumors with type 1–2 proximity to major vessels had a lower local recurrence (LR) rate than type 3–4 tumors (p < 0.001). In type 4, vascular by-pass reduced the LR risk. On multivariate analysis, infiltrative histotypes, high FNCLCC grade, radiotherapy administration, and type 3–4 proximity to major vessels were found to be independent prognostic factors for LR. We observed an increased risk of recurrence, but not of worse survival, when the tumor was close to the major vessels. When major vessels were found to be surrounded by the tumor on preoperative MRI, vascular resection and by-pass reconstruction offered better local control.
- Published
- 2021
18. Deep learning: a statistical viewpoint
- Author
-
Alexander Rakhlin, Andrea Montanari, and Peter L. Bartlett
- Subjects
FOS: Computer and information sciences, Numerical Analysis, Computer Science - Machine Learning, Optimization problem, Artificial neural network, General Mathematics, Deep learning, Linear model, Mathematics - Statistics Theory, Machine Learning (stat.ML), Statistics Theory (math.ST), Overfitting, Machine learning, Regularization (mathematics), Machine Learning (cs.LG), Statistics - Machine Learning, Norm (mathematics), Statistical learning theory, FOS: Mathematics, Artificial intelligence
- Abstract
The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting, that is, accurate predictions despite overfitting training data. In this article, we survey recent progress in statistical learning theory that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behaviour of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favourable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.
- Published
- 2021
- Full Text
- View/download PDF
19. Discussion of: 'Nonparametric regression using deep neural networks with ReLU activation function'
- Author
-
Song Mei, Behrooz Ghorbani, Andrea Montanari, and Theodor Misiakiewicz
- Subjects
Statistics and Probability, Activation function, Deep neural networks, Artificial intelligence, Statistics, Probability and Uncertainty, Mathematics, Nonparametric regression
- Published
- 2020
- Full Text
- View/download PDF
20. The Interpolation Phase Transition in Neural Networks: Memorization and Generalization under Lazy Training
- Author
-
Andrea Montanari and Yiqiao Zhong
- Subjects
Statistics and Probability, FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, I.2.6, FOS: Mathematics, Machine Learning (stat.ML), Mathematics - Statistics Theory, 62J07, 62H12, Statistics Theory (math.ST), Statistics, Probability and Uncertainty, Machine Learning (cs.LG)
- Abstract
Modern neural networks are often operated in a strongly overparametrized regime: they comprise so many parameters that they can interpolate the training set, even if actual labels are replaced by purely random ones. Despite this, they achieve good prediction error on unseen data: interpolating the training set does not lead to a large generalization error. Further, overparametrization appears to be beneficial in that it simplifies the optimization landscape. Here we study these phenomena in the context of two-layers neural networks in the neural tangent (NT) regime. We consider a simple data model, with isotropic covariate vectors in $d$ dimensions, and $N$ hidden neurons. We assume that both the sample size $n$ and the dimension $d$ are large, and they are polynomially related. Our first main result is a characterization of the eigenstructure of the empirical NT kernel in the overparametrized regime $Nd\gg n$. This characterization implies as a corollary that the minimum eigenvalue of the empirical NT kernel is bounded away from zero as soon as $Nd\gg n$, and therefore the network can exactly interpolate arbitrary labels in the same regime. Our second main result is a characterization of the generalization error of NT ridge regression including, as a special case, min-$\ell_2$ norm interpolation. We prove that, as soon as $Nd\gg n$, the test error is well approximated by the one of kernel ridge regression with respect to the infinite-width kernel. The latter is in turn well approximated by the error of polynomial ridge regression, whereby the regularization parameter is increased by a 'self-induced' term related to the high-degree components of the activation function. The polynomial degree depends on the sample size and the dimension (in particular on $\log n/\log d$).
- Published
- 2020
21. Fundamental Limits of Weak Recovery with Applications to Phase Retrieval
- Author
-
Marco Mondelli and Andrea Montanari
- Subjects
FOS: Computer and information sciences, Computer Science - Information Theory, Information Theory (cs.IT), Applied Mathematics, Gaussian, Estimator, Machine Learning (stat.ML), Free probability, Upper and lower bounds, Combinatorics, Computational Mathematics, Computational Theory and Mathematics, Statistics - Machine Learning, Phase retrieval, Spectral method, Random matrix, Analysis, Eigenvalues and eigenvectors, Mathematics
- Abstract
In phase retrieval we want to recover an unknown signal $\boldsymbol x\in\mathbb C^d$ from $n$ quadratic measurements of the form $y_i = |\langle{\boldsymbol a}_i,{\boldsymbol x}\rangle|^2+w_i$, where $\boldsymbol a_i\in \mathbb C^d$ are known sensing vectors and $w_i$ is measurement noise. We ask the following weak recovery question: what is the minimum number of measurements $n$ needed to produce an estimator $\hat{\boldsymbol x}(\boldsymbol y)$ that is positively correlated with the signal $\boldsymbol x$? We consider the case of Gaussian vectors $\boldsymbol a_i$. We prove that, in the high-dimensional limit, a sharp phase transition takes place, and we locate the threshold in the regime of vanishingly small noise. For $n\le d-o(d)$ no estimator can do significantly better than random guessing and achieve a strictly positive correlation. For $n\ge d+o(d)$ a simple spectral estimator achieves a positive correlation. Surprisingly, numerical simulations with the same spectral estimator demonstrate promising performance with realistic sensing matrices. Spectral methods are used to initialize non-convex optimization algorithms in phase retrieval, and our approach can boost the performance in this setting as well. Our impossibility result is based on classical information-theory arguments. The spectral algorithm computes the leading eigenvector of a weighted empirical covariance matrix. We obtain a sharp characterization of the spectral properties of this random matrix using tools from free probability, generalizing a recent result by Lu and Li. Both the upper and lower bound generalize beyond phase retrieval to measurements $y_i$ produced according to a generalized linear model. As a byproduct of our analysis, we compare the threshold of the proposed spectral method with that of a message passing algorithm.
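A minimal real-valued sketch of the spectral estimator described above: the leading eigenvector of a weighted empirical covariance matrix. The trimming function T below is a simple illustrative choice, not necessarily the paper's optimal preprocessing:

```python
# Hedged sketch of the spectral method for weak recovery in phase
# retrieval: leading eigenvector of (1/n) sum_i T(y_i) a_i a_i^T.
import numpy as np

rng = np.random.default_rng(0)
d, n = 100, 600
x = rng.normal(size=d); x /= np.linalg.norm(x)

A = rng.normal(size=(n, d))                   # Gaussian sensing vectors
y = (A @ x) ** 2 + 0.05 * rng.normal(size=n)  # noisy quadratic measurements

T = np.clip(y, 0.0, 3.0)                      # bounded preprocessing of y_i
D = (A * T[:, None]).T @ A / n                # weighted empirical covariance

w, V = np.linalg.eigh(D)
xhat = V[:, -1]                               # leading eigenvector
print("correlation |<x, xhat>| =", abs(x @ xhat))
```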
- Published
- 2018
- Full Text
- View/download PDF
22. Spectral Algorithms for Tensor Completion
- Author
-
Andrea Montanari and Nike Sun
- Subjects
FOS: Computer and information sciences, Rank (linear algebra), General Mathematics, Mathematics - Statistics Theory, Machine Learning (stat.ML), Statistics Theory (math.ST), Statistics - Machine Learning, Rank condition, Computer Science - Data Structures and Algorithms, FOS: Mathematics, Data Structures and Algorithms (cs.DS), Tensor, Semidefinite programming, Applied Mathematics, Spectral method, Algorithm, Mathematics
- Abstract
In the tensor completion problem, one seeks to estimate a low-rank tensor based on a random sample of revealed entries. In terms of the required sample size, earlier work revealed a large gap between estimation with unbounded computational resources (using, for instance, tensor nuclear norm minimization) and polynomial-time algorithms. Among the latter, the best statistical guarantees have been proved, for third-order tensors, using the sixth level of the sum-of-squares (SOS) semidefinite programming hierarchy (Barak and Moitra, 2014). However, the SOS approach does not scale well to large problem instances. By contrast, spectral methods --- based on unfolding or matricizing the tensor --- are attractive for their low complexity, but have been believed to require a much larger sample size. This paper presents two main contributions. First, we propose a new unfolding-based method, which outperforms naive ones for symmetric $k$-th order tensors of rank $r$. For this result we make a study of singular space estimation for partially revealed matrices of large aspect ratio, which may be of independent interest. For third-order tensors, our algorithm matches the SOS method in terms of sample size (requiring about $rd^{3/2}$ revealed entries), subject to a worse rank condition ($r\ll d^{3/4}$ rather than $r\ll d^{3/2}$). We complement this result with a different spectral algorithm for third-order tensors in the overcomplete ($r\ge d$) regime. Under a random model, this second approach succeeds in estimating tensors of rank $d\le r \ll d^{3/2}$ from about $rd^{3/2}$ revealed entries.
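A minimal sketch of the unfolding idea discussed above, in the noiseless rank-one case with uniformly revealed entries; the dimensions and revealing probability are illustrative:

```python
# Hedged sketch of unfolding-based tensor completion: matricize a
# third-order tensor and read the signal off the top singular vector
# of the d x d^2 unfolding. Illustrative noiseless rank-one case.
import numpy as np

rng = np.random.default_rng(0)
d, p = 60, 0.2                               # dimension, revealing probability
u = rng.normal(size=d); u /= np.linalg.norm(u)
T = 5.0 * np.einsum("i,j,k->ijk", u, u, u)   # rank-one signal tensor

mask = rng.random((d, d, d)) < p
T_obs = np.where(mask, T, 0.0) / p           # rescaled zero-filled observations

M = T_obs.reshape(d, d * d)                  # mode-1 unfolding
U, s, Vt = np.linalg.svd(M, full_matrices=False)
uhat = U[:, 0]                               # top left singular vector
print("overlap |<u, uhat>| =", abs(u @ uhat))
```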
- Published
- 2018
- Full Text
- View/download PDF
23. Surprises in High-Dimensional Ridgeless Least Squares Interpolation
- Author
-
Trevor Hastie, Andrea Montanari, Saharon Rosset, and Ryan J. Tibshirani
- Subjects
Statistics and Probability, FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, FOS: Mathematics, Mathematics - Statistics Theory, Machine Learning (stat.ML), Statistics Theory (math.ST), Statistics, Probability and Uncertainty, Article, Machine Learning (cs.LG)
- Abstract
Interpolators, estimators that achieve zero training error, have attracted growing attention in machine learning, mainly because state-of-the-art neural networks appear to be models of this type. In this paper, we study minimum $\ell_2$ norm ("ridgeless") interpolation in high-dimensional least squares regression. We consider two different models for the feature distribution: a linear model, where the feature vectors $x_i \in {\mathbb R}^p$ are obtained by applying a linear transform to a vector of i.i.d. entries, $x_i = \Sigma^{1/2} z_i$ (with $z_i \in {\mathbb R}^p$); and a nonlinear model, where the feature vectors are obtained by passing the input through a random one-layer neural network, $x_i = \varphi(W z_i)$ (with $z_i \in {\mathbb R}^d$, $W \in {\mathbb R}^{p \times d}$ a matrix of i.i.d. entries, and $\varphi$ an activation function acting componentwise on $W z_i$). We recover, in a precise quantitative way, several phenomena that have been observed in large-scale neural networks and kernel machines, including the "double descent" behavior of the prediction risk and the potential benefits of overparametrization.
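A minimal sketch of the phenomenon under an illustrative misspecified linear model: the pseudoinverse returns ordinary least squares for p < n and the minimum-norm interpolator for p >= n, and the test risk typically spikes near p = n before descending again:

```python
# Hedged sketch of min-l2-norm ("ridgeless") interpolation and the
# double descent curve; model and parameter choices are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, n_test, p_max = 100, 500, 400
z = rng.normal(size=(n + n_test, p_max))
beta = rng.normal(size=p_max); beta /= np.linalg.norm(beta)
y_all = z @ beta + 0.5 * rng.normal(size=n + n_test)

for p in [20, 80, 100, 120, 200, 400]:       # sweep model size through p = n
    X, y = z[:n, :p], y_all[:n]
    bhat = np.linalg.pinv(X) @ y             # OLS for p < n, min-norm for p >= n
    pred = z[n:, :p] @ bhat
    risk = np.mean((pred - y_all[n:]) ** 2)
    print(f"p = {p:3d}  test risk = {risk:.3f}")
```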
- Published
- 2019
- Full Text
- View/download PDF
24. The threshold for SDP-refutation of random regular NAE-3SAT
- Author
-
Yash Deshpande, Andrea Montanari, Ryan O'Donnell, Tselil Schramm, and Subhabrata Sen
- Published
- 2019
- Full Text
- View/download PDF
25. On the computational tractability of statistical estimation on amenable graphs
- Author
-
Ahmed El Alaoui and Andrea Montanari
- Subjects
Statistics and Probability, Vertex (graph theory), FOS: Computer and information sciences, Computer Science - Information Theory, Information Theory (cs.IT), Mathematics - Statistics Theory, Statistics Theory (math.ST), Computer Science - Data Structures and Algorithms, FOS: Mathematics, Data Structures and Algorithms (cs.DS), Local algorithm, Discrete mathematics, Random graph, Probability (math.PR), Mathematics - Probability, Statistics, Probability and Uncertainty, Analysis, Mathematics
- Abstract
We consider the problem of estimating a vector of discrete variables $(\theta_1,\cdots,\theta_n)$, based on noisy observations $Y_{uv}$ of the pairs $(\theta_u,\theta_v)$ on the edges of a graph $G=([n],E)$. This setting comprises a broad family of statistical estimation problems, including group synchronization on graphs, community detection, and low-rank matrix estimation. A large body of theoretical work has established sharp thresholds for weak and exact recovery, and sharp characterizations of the optimal reconstruction accuracy in such models, focusing however on the special case of Erdős–Rényi-type random graphs. The single most important finding of this line of work is the ubiquity of an information-computation gap. Namely, for many models of interest, a large gap is found between the optimal accuracy achievable by any statistical method and the optimal accuracy achieved by known polynomial-time algorithms. Moreover, this gap is generally believed to be robust to small amounts of additional side information revealed about the $\theta_i$'s. How does the structure of the graph $G$ affect this picture? Is the information-computation gap a general phenomenon, or does it only apply to specific families of graphs? We prove that the picture is dramatically different for graph sequences converging to amenable graphs (including, for instance, $d$-dimensional grids). We consider a model in which an arbitrarily small fraction of the vertex labels is revealed, and show that a linear-time local algorithm can achieve reconstruction accuracy that is arbitrarily close to the information-theoretic optimum. We contrast this to the case of random graphs. Indeed, focusing on group synchronization on random regular graphs, we prove that the information-computation gap still persists even when a small amount of side information is revealed.
- Published
- 2019
- Full Text
- View/download PDF
26. The distribution of the Lasso: Uniform control over sparse balls and adaptive parameter tuning
- Author
-
Léo Miolane and Andrea Montanari
- Subjects
Statistics and Probability, Gaussian, Estimator, Mathematics - Statistics Theory, Regularization (mathematics), Stability (probability), Empirical distribution function, Lasso (statistics), Gaussian noise, Applied mathematics, Statistics, Probability and Uncertainty, Mathematics
- Abstract
The Lasso is a popular regression method for high-dimensional problems in which the number of parameters $\theta_1,\dots,\theta_N$ is larger than the number $n$ of samples: $N>n$. A useful heuristic relates the statistical properties of the Lasso estimator to those of a simple soft-thresholding denoiser, in a denoising problem in which the parameters $(\theta_i)_{i\le N}$ are observed in Gaussian noise with a carefully tuned variance. Earlier work confirmed this picture in the limit $n,N\to\infty$, pointwise in the parameters $\theta$ and in the value of the regularization parameter. Here, we consider a standard random design model and prove exponential concentration of its empirical distribution around the prediction provided by the Gaussian denoising model. Crucially, our results are uniform with respect to $\theta$ belonging to $\ell_q$ balls, $q\in [0,1]$, and with respect to the regularization parameter. This allows us to derive sharp results on the performance of various data-driven procedures for tuning the regularization. Our proofs make use of Gaussian comparison inequalities, and in particular of a version of Gordon's minimax theorem developed by Thrampoulidis, Oymak, and Hassibi, which controls the optimum value of the Lasso optimization problem. Crucially, we prove a stability property of the minimizer in Wasserstein distance that allows us to characterize properties of the minimizer itself.
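For reference, the soft-thresholding denoiser the heuristic refers to, as a small self-contained sketch:

```python
# The coordinate-wise soft-thresholding denoiser: for an observation
# x = theta + tau * z, the Lasso behaves like eta(x; b) at a suitably
# calibrated threshold level b.
import numpy as np

def soft_threshold(x, b):
    """eta(x; b) = sign(x) * (|x| - b)_+, applied entrywise."""
    return np.sign(x) * np.maximum(np.abs(x) - b, 0.0)

x = np.array([-2.0, -0.3, 0.1, 1.5])
print(soft_threshold(x, 0.5))   # [-1.5, -0.,  0.,  1.]
```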
- Published
- 2018
27. The landscape of empirical risk for nonconvex losses
- Author
-
Andrea Montanari, Song Mei, and Yu Bai
- Subjects
Statistics and Probability, Hessian matrix, Uniform convergence, Robust regression, Applied mathematics, Empirical risk minimization, Empirical process, Landscape, Stationary point, Nonconvex optimization, Statistics, Probability and Uncertainty, 62J02, 62F10, 62H30, Mathematics
- Abstract
Most high-dimensional estimation methods propose to minimize a cost function (empirical risk) that is a sum of losses associated to each data point (each example). In this paper, we focus on the case of nonconvex losses. Classical empirical process theory implies uniform convergence of the empirical (or sample) risk to the population risk. While under additional assumptions, uniform convergence implies consistency of the resulting M-estimator, it does not ensure that the latter can be computed efficiently.
In order to capture the complexity of computing M-estimators, we study the landscape of the empirical risk, namely its stationary points and their properties. We establish uniform convergence of the gradient and Hessian of the empirical risk to their population counterparts, as soon as the number of samples becomes larger than the number of unknown parameters (modulo logarithmic factors). Consequently, good properties of the population risk can be carried over to the empirical risk, and we are able to establish a one-to-one correspondence of their stationary points. We demonstrate that in several problems, such as nonconvex binary classification, robust regression and the Gaussian mixture model, this result implies a complete characterization of the landscape of the empirical risk, and of the convergence properties of descent algorithms.
We extend our analysis to the very high-dimensional setting in which the number of parameters exceeds the number of samples, and provide a characterization of the empirical risk landscape under a nearly information-theoretically minimal condition. Namely, if the number of samples exceeds the sparsity of the parameters vector (modulo logarithmic factors), then a suitable uniform convergence result holds. We apply this result to nonconvex binary classification and robust regression in very high dimension.
- Published
- 2018
- Full Text
- View/download PDF
28. Debiasing the lasso: Optimal sample size for Gaussian designs
- Author
-
Adel Javanmard and Andrea Montanari
- Subjects
Statistics and Probability, Gaussian, Inverse, Lasso (statistics), hypothesis testing, confidence intervals, Estimator, Covariance, Minimax, high-dimensional regression, sample size, bias and variance, Lasso, Statistics, Probability and Uncertainty, 62J07, 62J05, 62F12, Mathematics
- Abstract
Performing statistical inference in high-dimensional models is challenging because of the lack of precise information on the distribution of high-dimensional regularized estimators.
Here, we consider linear regression in the high-dimensional regime $p \gg n$ and the Lasso estimator: we would like to perform inference on the parameter vector $\theta^{*}\in\mathbb{R}^{p}$. Important progress has been achieved in computing confidence intervals and $p$-values for single coordinates $\theta^{*}_{i}$, $i\in\{1,\dots,p\}$. A key role in these new inferential methods is played by a certain debiased estimator $\widehat{\theta}^{\mathrm{d}}$. Earlier work establishes that, under suitable assumptions on the design matrix, the coordinates of $\widehat{\theta}^{\mathrm{d}}$ are asymptotically Gaussian provided the true parameter vector $\theta^{*}$ is $s_{0}$-sparse with $s_{0}=o(\sqrt{n}/\log p)$.
The condition $s_{0}=o(\sqrt{n}/\log p)$ is considerably stronger than the one for consistent estimation, namely $s_{0}=o(n/\log p)$. In this paper, we consider Gaussian designs with known or unknown population covariance. When the covariance is known, we prove that the debiased estimator is asymptotically Gaussian under the nearly optimal condition $s_{0}=o(n/(\log p)^{2})$.
The same conclusion holds if the population covariance is unknown but can be estimated sufficiently well. For intermediate regimes, we describe the trade-off between sparsity in the coefficients $\theta^{*}$ and sparsity in the inverse covariance of the design. We further discuss several applications of our results beyond high-dimensional inference. In particular, we propose a thresholded Lasso estimator that is minimax optimal up to a factor $1+o_{n}(1)$ for i.i.d. Gaussian designs.
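A minimal sketch of the debiasing step for the simplest setting above (known covariance, here Sigma = I so M = I); the coverage check is approximate and all parameter choices are illustrative:

```python
# Hedged sketch of the debiased Lasso with known identity covariance:
# theta_d = theta_lasso + M X^T (y - X theta_lasso) / n, with M = I.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s0 = 300, 600, 10
theta = np.zeros(p); theta[:s0] = 1.0
X = rng.normal(size=(n, p))                  # Sigma = I, so M = I
y = X @ theta + rng.normal(size=n)

lam = 2 * np.sqrt(np.log(p) / n)             # illustrative tuning
hat = Lasso(alpha=lam, max_iter=20000).fit(X, y).coef_
debiased = hat + X.T @ (y - X @ hat) / n     # debiasing correction

# Rough per-coordinate 95% intervals with noise level sigma ~ 1;
# the true asymptotic width is slightly larger, so this is approximate.
half = 1.96 / np.sqrt(n)
print("empirical coverage:", np.mean(np.abs(debiased - theta) <= half))
```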
- Published
- 2018
29. Group synchronization on grids
- Author
-
Emmanuel Abbe, Nikhil Srivastava, Laurent Massoulié, Andrea Montanari, and Allan Sly
- Subjects
FOS: Computer and information sciences, Computer Science - Information Theory, Information Theory (cs.IT), Mathematics - Statistics Theory, Statistics Theory (math.ST), Synchronization (computer science), Weak recovery, FOS: Mathematics, Structure from motion, Discrete mathematics, Group synchronization, Community detection, Grid, Noise, Compact group, Graphs, Mathematics
- Abstract
Group synchronization requires estimating unknown elements $({\theta}_v)_{v\in V}$ of a compact group ${\mathfrak G}$ associated to the vertices of a graph $G=(V,E)$, using noisy observations of the group differences associated to the edges. This model is relevant to a variety of applications, ranging from structure from motion in computer vision to graph localization and positioning, to certain families of community detection problems. We focus on the case in which the graph $G$ is the $d$-dimensional grid. Since the unknowns ${\theta}_v$ are only determined up to a global action of the group, we consider the following weak recovery question. Can we determine the group difference ${\theta}_u^{-1}{\theta}_v$ between far apart vertices $u, v$ better than by random guessing? We prove that weak recovery is possible (provided the noise is small enough) for $d\ge 3$ and, for certain finite groups, for $d\ge 2$. Vice versa, for some continuous groups, we prove that weak recovery is impossible for $d=2$. Finally, for strong enough noise, weak recovery is always impossible.
- Published
- 2018
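A concrete toy instance of the model in the entry above, assuming the simplest group ${\mathfrak G}=\mathbb{Z}_2$: spins on the 3-dimensional grid, with each edge reporting the spin product flipped with probability $p$. The majority-vote iteration below is a naive baseline estimator (a minimal sketch, all parameters illustrative), not the paper's reconstruction scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

# Z_2 synchronization on the 3-dimensional grid: unknown spins theta_v in
# {+1,-1}; each edge reports theta_u * theta_v, flipped with probability p.
L, p = 8, 0.1
shape = (L, L, L)
theta = rng.choice([-1, 1], size=shape)

edges = []  # (u, v, noisy observation of theta_u * theta_v)
for u in np.ndindex(shape):
    for axis in range(3):
        v = list(u)
        v[axis] += 1
        if v[axis] < L:
            v = tuple(v)
            flip = -1 if rng.random() < p else 1
            edges.append((u, v, theta[u] * theta[v] * flip))

# Naive estimator: iterate s_v <- sign(sum over neighbours of Y_uv * s_u),
# a majority-vote / power-iteration dynamic on the observation graph.
s = rng.choice([-1, 1], size=shape)
for _ in range(50):
    field = np.zeros(shape)
    for u, v, y in edges:
        field[u] += y * s[v]
        field[v] += y * s[u]
    s = np.where(field >= 0, 1, -1)

# The global sign is unidentifiable, so check correlation up to a flip;
# a value well above 0 indicates weak recovery.
print(f"|correlation with truth| = {abs(np.mean(s * theta)):.2f}")
```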
30. Online rules for control of false discovery rate and false discovery exceedance
- Author
-
Andrea Montanari and Adel Javanmard
- Subjects
FOS: Computer and information sciences ,0301 basic medicine ,Statistics and Probability ,False discovery rate ,Class (set theory) ,Mathematics - Statistics Theory ,Machine Learning (stat.ML) ,Statistics Theory (math.ST) ,Scientific field ,Statistics - Applications ,01 natural sciences ,Machine Learning (cs.LG) ,online decision making ,Methodology (stat.ME) ,Combinatorics ,010104 statistics & probability ,03 medical and health sciences ,Statistics - Machine Learning ,false discovery rate (FDR) ,FOS: Mathematics ,Statistical inference ,False positive paradox ,62F05 ,Applications (stat.AP) ,62F03 ,0101 mathematics ,Control (linguistics) ,Statistics - Methodology ,Statistical hypothesis testing ,Mathematics ,false discovery exceedance (FDX) ,Computer Science - Learning ,Hypothesis testing ,030104 developmental biology ,Multiple comparisons problem ,62L99 ,Statistics, Probability and Uncertainty - Abstract
Multiple hypothesis testing is a core problem in statistical inference and arises in almost every scientific field. Given a set of null hypotheses $\mathcal{H}(n) = (H_1,\dotsc, H_n)$, Benjamini and Hochberg introduced the false discovery rate (FDR), which is the expected proportion of false positives among rejected null hypotheses, and proposed a testing procedure that controls FDR below a pre-assigned significance level. Nowadays FDR is the criterion of choice for large-scale multiple hypothesis testing. In this paper we consider the problem of controlling FDR in an "online manner". Concretely, we consider an ordered --possibly infinite-- sequence of null hypotheses $\mathcal{H} = (H_1,H_2,H_3,\dots )$ where, at each step $i$, the statistician must decide whether to reject hypothesis $H_i$ having access only to the previous decisions. This model was introduced by Foster and Stine. We study a class of "generalized alpha-investing" procedures and prove that any rule in this class controls online FDR, provided $p$-values corresponding to true nulls are independent of the other $p$-values. (Earlier work only established mFDR control.) Next, we obtain conditions under which generalized alpha-investing controls FDR in the presence of general $p$-value dependencies. Finally, we develop a modified set of procedures that also allow control of the false discovery exceedance (the tail of the proportion of false discoveries). Numerical simulations and analytical results indicate that online procedures do not incur a large loss in statistical power with respect to offline approaches, such as Benjamini-Hochberg., Comment: 44 pages, 9 figures, to appear in Annals of Statistics
- Published
- 2018
- Full Text
- View/download PDF
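To make the protocol concrete, here is a minimal alpha-investing-style sketch: spend part of a wealth budget on each test and earn a payout back on each discovery. The spending schedule, cap, and payout below are illustrative choices, not the tuned generalized alpha-investing rules analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def alpha_investing(pvals, w0=0.05, payout=0.05):
    """Minimal alpha-investing sketch: bet a geometrically decaying fraction
    of the current wealth on each test; earn `payout` back on discoveries.
    (Illustrative schedule, not the paper's tuned rules.)"""
    wealth, last_discovery, rejections = w0, 0, []
    for t, p in enumerate(pvals, start=1):
        alpha_t = max(0.0, min(0.1, wealth * 0.5 ** (t - last_discovery)))
        if p <= alpha_t:                 # discovery: wealth is earned back
            rejections.append(t)
            wealth += payout
            last_discovery = t
        else:                            # pay for the failed test
            wealth -= alpha_t / (1.0 - alpha_t)
    return rejections

# A stream where 10% of hypotheses are non-null (their p-values are tiny).
n = 10_000
is_signal = rng.random(n) < 0.10
pvals = np.where(is_signal, rng.uniform(0, 1e-4, n), rng.uniform(0, 1, n))

rej = alpha_investing(pvals)
false = sum(1 for t in rej if not is_signal[t - 1])
print(f"{len(rej)} rejections, empirical FDP = {false / max(len(rej), 1):.3f}")
```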
31. A Mean Field View of the Landscape of Two-Layers Neural Networks
- Author
-
Song Mei, Phan-Minh Nguyen, and Andrea Montanari
- Subjects
Computer Science::Machine Learning ,FOS: Computer and information sciences ,Mathematical optimization ,Computer Science - Machine Learning ,Generalization ,Computer science ,FOS: Physical sciences ,Mathematics - Statistics Theory ,Machine Learning (stat.ML) ,02 engineering and technology ,Statistics Theory (math.ST) ,01 natural sciences ,Machine Learning (cs.LG) ,Statistics::Machine Learning ,gradient flow ,Local optimum ,Simple (abstract algebra) ,Statistics - Machine Learning ,stochastic gradient descent ,0103 physical sciences ,Convergence (routing) ,0202 electrical engineering, electronic engineering, information engineering ,partial differential equations ,FOS: Mathematics ,010306 general physics ,Condensed Matter - Statistical Mechanics ,Multidisciplinary ,Partial differential equation ,Artificial neural network ,Statistical Mechanics (cond-mat.stat-mech) ,Statistics ,neural networks ,Maxima and minima ,Wasserstein space ,Stochastic gradient descent ,PNAS Plus ,Physical Sciences ,020201 artificial intelligence & image processing - Abstract
Multi-layer neural networks are among the most powerful models in machine learning, yet the fundamental reasons for this success defy mathematical understanding. Learning a neural network requires optimizing a non-convex high-dimensional objective (risk function), a problem which is usually attacked using stochastic gradient descent (SGD). Does SGD converge to a global optimum of the risk or only to a local optimum? In the first case, does this happen because local minima are absent, or because SGD somehow avoids them? In the second, why do local minima reached by SGD have good generalization properties? In this paper we consider a simple case, namely two-layers neural networks, and prove that (in a suitable scaling limit) SGD dynamics is captured by a certain non-linear partial differential equation (PDE) that we call distributional dynamics (DD). We then consider several specific examples, and show how DD can be used to prove convergence of SGD to networks with nearly ideal generalization error. This description makes it possible to 'average out' some of the complexities of the landscape of neural networks, and can be used to prove a general convergence result for noisy SGD., Comment: 103 pages
- Published
- 2018
- Full Text
- View/download PDF
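A minimal sketch of the object being analyzed, assuming the $1/N$ mean-field normalization: $N$ "particle" neurons trained by SGD on a toy target, with the per-neuron learning rate absorbing the $O(1/N)$ gradient scale. All parameters are illustrative; the risk depends on the neurons only through their empirical distribution, which is the quantity whose evolution the paper's PDE describes.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two-layer net in the mean-field scaling f(x) = (1/N) sum_i a_i relu(w_i.x).
d, N, lr, steps = 4, 500, 0.1, 20_000

W = rng.normal(size=(N, d))          # one "particle" (a_i, w_i) per neuron
a = rng.normal(size=N)

for step in range(steps + 1):
    if step % 5_000 == 0:
        Xt = rng.normal(size=(2_000, d))
        pred = np.maximum(Xt @ W.T, 0.0) @ a / N
        truth = np.maximum(Xt[:, 0] + Xt[:, 1], 0.0)
        print(f"step {step:6d}: test risk {np.mean((pred - truth) ** 2):.4f}")
    x = rng.normal(size=d)
    y = max(x[0] + x[1], 0.0)        # toy target to learn
    pre = W @ x
    act = np.maximum(pre, 0.0)
    err = a @ act / N - y
    # Plain SGD; the O(1/N) gradients are rescaled by N (mean-field time).
    a -= lr * err * act
    W -= lr * (err * a * (pre > 0.0))[:, None] * x
```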
32. Optimization of the Sherrington-Kirkpatrick Hamiltonian
- Author
-
Andrea Montanari
- Subjects
Independent and identically distributed random variables ,Optimization problem ,Spin glass ,General Computer Science ,General Mathematics ,Gaussian ,Diagonal ,FOS: Physical sciences ,01 natural sciences ,Condensed Matter::Disordered Systems and Neural Networks ,010104 statistics & probability ,Matrix (mathematics) ,symbols.namesake ,Variational principle ,0103 physical sciences ,FOS: Mathematics ,Applied mathematics ,0101 mathematics ,010306 general physics ,Mathematics - Optimization and Control ,Time complexity ,Condensed Matter - Statistical Mechanics ,Mathematics ,Mathematical physics ,Physics ,Statistical Mechanics (cond-mat.stat-mech) ,010102 general mathematics ,Probability (math.PR) ,Approximation algorithm ,Physics::Classical Physics ,Quadratic form ,Optimization and Control (math.OC) ,Physics::Space Physics ,symbols ,Random matrix ,Mathematics - Probability ,Hamiltonian (control theory) - Abstract
Let ${\boldsymbol A}\in{\mathbb R}^{n\times n}$ be a symmetric random matrix with independent and identically distributed Gaussian entries above the diagonal. We consider the problem of maximizing $\langle{\boldsymbol \sigma},{\boldsymbol A}{\boldsymbol \sigma}\rangle$ over binary vectors ${\boldsymbol \sigma}\in\{+1,-1\}^n$. In the language of statistical physics, this amounts to finding the ground state of the Sherrington-Kirkpatrick model of spin glasses. The asymptotic value of this optimization problem was characterized by Parisi via a celebrated variational principle, subsequently proved by Talagrand. We give an algorithm that, for any $\varepsilon>0$, outputs ${\boldsymbol \sigma}_*\in\{-1,+1\}^n$ such that $\langle{\boldsymbol \sigma}_*,{\boldsymbol A}{\boldsymbol \sigma}_*\rangle$ is at least $(1-\varepsilon)$ of the optimum value, with probability converging to one as $n\to\infty$. The algorithm's time complexity is $C(\varepsilon)\, n^2$. It is a message-passing algorithm, but the specific structure of its update rules is new. As a side result, we prove that, at (low) non-zero temperature, the algorithm constructs approximate solutions of the Thouless-Anderson-Palmer equations., Comment: 27 pages
- Published
- 2018
- Full Text
- View/download PDF
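For orientation, here is the optimization problem itself together with a naive spectral baseline; this is not the paper's message-passing algorithm. Sign-rounding the top eigenvector reaches about $(4/\pi)\,n^{3/2} \approx 1.273\,n^{3/2}$ asymptotically, while the Parisi optimum is $\approx 1.526\,n^{3/2}$, the gap the paper's algorithm closes. Sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

n = 1500
# Symmetric matrix with iid N(0,1) entries above the diagonal.
G = rng.normal(size=(n, n))
A = np.triu(G, 1)
A = A + A.T

# Naive spectral baseline: round the top eigenvector to signs.
vals, vecs = np.linalg.eigh(A)
sigma = np.sign(vecs[:, -1])
sigma[sigma == 0] = 1

energy = sigma @ A @ sigma / n ** 1.5
print(f"spectral baseline: <s, A s> / n^(3/2) = {energy:.3f}")
print("asymptotics: spectral ~ 4/pi ~ 1.273, Parisi optimum ~ 1.526")
```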
33. TAP free energy, spin glasses, and variational inference
- Author
-
Song Mei, Zhou Fan, and Andrea Montanari
- Subjects
Statistics and Probability ,Spin glass ,Bayesian inference ,FOS: Physical sciences ,Mathematics - Statistics Theory ,Statistics Theory (math.ST) ,Expected value ,01 natural sciences ,Condensed Matter::Disordered Systems and Neural Networks ,010104 statistics & probability ,symbols.namesake ,FOS: Mathematics ,Statistical physics ,Limit (mathematics) ,0101 mathematics ,Gibbs measure ,Mathematical Physics ,Mathematics ,010102 general mathematics ,Probability (math.PR) ,Mathematical Physics (math-ph) ,Free probability ,Kac–Rice formula ,free probability ,TAP complexity ,Mean field theory ,Sherrington–Kirkpatrick model ,symbols ,Statistics, Probability and Uncertainty ,Constant (mathematics) ,Mathematics - Probability ,Energy (signal processing) ,60F10 - Abstract
We consider the Sherrington–Kirkpatrick model of spin glasses with ferromagnetically biased couplings. For a specific choice of the couplings mean, the resulting Gibbs measure is equivalent to the Bayesian posterior for a high-dimensional estimation problem known as “${\mathbb{Z}}_{2}$ synchronization.” Statistical physics suggests computing the expectation with respect to this Gibbs measure (the posterior mean in the synchronization problem) by minimizing the so-called Thouless–Anderson–Palmer (TAP) free energy, instead of the mean field (MF) free energy. We prove that this identification is correct, provided the ferromagnetic bias is larger than a constant (i.e., the noise level is small enough in synchronization). Namely, we prove that the scaled $\ell _{2}$ distance between any low energy local minimizers of the TAP free energy and the mean of the Gibbs measure vanishes in the large size limit. Our proof technique is based on upper bounding the expected number of critical points of the TAP free energy using the Kac–Rice formula.
- Published
- 2018
- Full Text
- View/download PDF
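The TAP stationarity conditions are concrete enough to iterate numerically. Below is a hedged sketch for the ${\mathbb{Z}}_2$-synchronization model of the entry above: damped fixed-point iteration of $m = \tanh(\lambda Y m - \lambda^2(1-q)m)$ with $q = \|m\|^2/n$, where the subtracted term is the Onsager correction. Normalizations follow common conventions and may differ from the paper's; a spectral initialization is used to select the informative fixed point.

```python
import numpy as np

rng = np.random.default_rng(4)

# Z_2 synchronization: Y = (lam/n) x x^T + GOE noise, x in {+1,-1}^n.
n, lam = 1500, 2.0
x = rng.choice([-1, 1], size=n)
W = rng.normal(size=(n, n)) / np.sqrt(n)
W = (W + W.T) / np.sqrt(2)
Y = lam / n * np.outer(x, x) + W

# Spectral initialisation (selects the informative TAP fixed point).
u = np.linalg.eigh(Y)[1][:, -1]
m = 0.2 * np.sign(u)

# Damped iteration of the TAP stationarity equations
#   m = tanh( lam * Y m - lam^2 (1 - q) m ),   q = (1/n) sum_i m_i^2.
for _ in range(200):
    q = np.mean(m ** 2)
    m_new = np.tanh(lam * (Y @ m) - lam ** 2 * (1 - q) * m)
    m = 0.5 * m + 0.5 * m_new        # damping helps convergence

print(f"overlap |<m, x>| / n = {abs(m @ x) / n:.3f}")  # > 0 above lam = 1
```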
34. Effective compression maps for torus-based cryptography
- Author
-
Andrea Montanari
- Subjects
Discrete mathematics ,Applied Mathematics ,020206 networking & telecommunications ,0102 computer and information sciences ,02 engineering and technology ,01 natural sciences ,Computer Science Applications ,Torus-based cryptography ,Combinatorics ,Pairing-based cryptography ,Finite field ,010201 computation theory & mathematics ,0202 electrical engineering, electronic engineering, information engineering ,XTR ,Quadratic field ,Algebraic number ,Prime power ,Mathematics ,Computational number theory - Abstract
We give explicit parametrizations of the algebraic tori $\mathbb{T}_{n}$ over any finite field $\mathbb{F}_{q}$ for any prime power $n$. Applying the construction for $n=3$ to a quadratic field $\mathbb{F}_{q^2}$, we show that the set of $\mathbb{F}_q$-rational points of the torus $\mathbb{T}_{6}$ is birationally equivalent to the affine part of a Singer arc in $\mathbb{P}^2(\mathbb{F}_{q^2})$. This gives a simple, yet efficient compression and decompression algorithm from $\mathbb{T}_{6}(\mathbb{F}_{q})$ to $\mathbb{A}^2(\mathbb{F}_{q})$ that can be substituted in the faster implementation of CEILIDH (Granger et al., in Algorithmic Number Theory, pp. 235–249, Springer, Berlin, 2004), achieving a theoretical 30% speedup, and that is also cheaper than the recently proposed factor-6 compression technique in Karabina (IEEE Trans Inf Theory 58(5):3293–3304, 2012). The compression methods presented here have a wide class of applications to public-key and pairing-based cryptography over any finite field.
- Published
- 2015
- Full Text
- View/download PDF
35. Conditional Random Fields, Planted Constraint Satisfaction, and Entropy Concentration
- Author
-
Andrea Montanari and Emmanuel Abbe
- Subjects
Discrete mathematics ,Random graph ,Conditional entropy ,Computational Theory and Mathematics ,Stochastic block model ,Entropy (information theory) ,Graphical model ,Constraint satisfaction ,Cluster analysis ,Constraint satisfaction problem ,Theoretical Computer Science ,Mathematics - Abstract
This paper studies a class of probabilistic models on graphs, where edge variables depend on incident node variables through a fixed probability kernel. The class includes planted constraint satisfaction problems (CSPs), as well as more general structures motivated by coding and community clustering problems. It is shown that under mild assumptions on the kernel and for sparse random graphs, the conditional entropy of the node variables given the edge variables concentrates around a deterministic threshold. This implies in particular the concentration of the number of solutions in a broad class of planted CSPs, the existence of a threshold function for the disassortative stochastic block model, and the proof of a conjecture on parity check codes. It also establishes new connections among coding, clustering and satisfiability.
- Published
- 2015
- Full Text
- View/download PDF
36. State Evolution for Approximate Message Passing with Non-Separable Functions
- Author
-
Raphaël Berthier, Phan-Minh Nguyen, and Andrea Montanari
- Subjects
Statistics and Probability ,FOS: Computer and information sciences ,Gaussian ,Computer Science - Information Theory ,02 engineering and technology ,01 natural sciences ,Separable space ,Combinatorics ,010104 statistics & probability ,symbols.namesake ,Matrix (mathematics) ,0202 electrical engineering, electronic engineering, information engineering ,Limit (mathematics) ,0101 mathematics ,Mathematics ,Numerical Analysis ,Applied Mathematics ,Information Theory (cs.IT) ,020206 networking & telecommunications ,Lipschitz continuity ,Compressed sensing ,Computational Theory and Mathematics ,symbols ,Phase retrieval ,Random matrix ,Analysis - Abstract
Given a high-dimensional data matrix ${\boldsymbol A}\in{\mathbb R}^{m\times n}$, Approximate Message Passing (AMP) algorithms construct sequences of vectors ${\boldsymbol u}^t\in{\mathbb R}^n$, ${\boldsymbol v}^t\in{\mathbb R}^m$, indexed by $t\in\{0,1,2\dots\}$ by iteratively applying ${\boldsymbol A}$ or ${\boldsymbol A}^{{\sf T}}$, and suitable non-linear functions, which depend on the specific application. Special instances of this approach have been developed --among other applications-- for compressed sensing reconstruction, robust regression, Bayesian estimation, low-rank matrix recovery, phase retrieval, and community detection in graphs. For certain classes of random matrices ${\boldsymbol A}$, AMP admits an asymptotically exact description in the high-dimensional limit $m,n\to\infty$, which goes under the name of 'state evolution'. Earlier work established state evolution for separable non-linearities (under certain regularity conditions). Nevertheless, empirical work demonstrated several important applications that require non-separable functions. In this paper we generalize state evolution to Lipschitz continuous non-separable nonlinearities, for Gaussian matrices ${\boldsymbol A}$. Our proof makes use of Bolthausen's conditioning technique along with several approximation arguments. In particular, we introduce a modified algorithm (called LAMP for Long AMP) which is of independent interest., Comment: 41 pages, 4 figures
- Published
- 2017
37. How well do local algorithms solve semidefinite programs?
- Author
-
Zhou Fan and Andrea Montanari
- Subjects
FOS: Computer and information sciences ,Random graph ,Discrete mathematics ,Discrete Mathematics (cs.DM) ,010102 general mathematics ,Probabilistic logic ,Machine Learning (stat.ML) ,0102 computer and information sciences ,Harmonic measure ,01 natural sciences ,Upper and lower bounds ,Combinatorics ,Optimization and Control (math.OC) ,Statistics - Machine Learning ,010201 computation theory & mathematics ,Bounded function ,FOS: Mathematics ,Side information ,0101 mathematics ,Mathematics - Optimization and Control ,Algorithm ,Local algorithm ,Graph bisection ,Computer Science - Discrete Mathematics ,Mathematics - Abstract
Several probabilistic models from high-dimensional statistics and machine learning reveal an intriguing --and yet poorly understood-- dichotomy. Either simple local algorithms succeed in estimating the object of interest, or even sophisticated semi-definite programming (SDP) relaxations fail. In order to explore this phenomenon, we study a classical SDP relaxation of the minimum graph bisection problem, when applied to Erdős–Rényi random graphs with bounded average degree $d>1$, and obtain several types of results. First, we use a dual witness construction (using the so-called non-backtracking matrix of the graph) to upper bound the SDP value. Second, we prove that a simple local algorithm approximately solves the SDP to within a factor $2d^2/(2d^2+d-1)$ of the upper bound. In particular, the local algorithm is at most $8/9$ suboptimal, and $1+O(1/d)$ suboptimal for large degree. We then analyze a more sophisticated local algorithm, which aggregates information according to the harmonic measure on the limiting Galton-Watson (GW) tree. The resulting lower bound is expressed in terms of the conductance of the GW tree and matches surprisingly well the empirically determined SDP values on large-scale Erdős–Rényi graphs. We finally consider the planted partition model. In this case, purely local algorithms are known to fail, but they do succeed if a small amount of side information is available. Our results imply quantitative bounds on the threshold for partial recovery using SDP in this model., Comment: 48 pages, 1 pdf figure
- Published
- 2017
- Full Text
- View/download PDF
38. Finding Hidden Cliques of Size $\sqrt{N/e}$ in Nearly Linear Time
- Author
-
Yash Deshpande and Andrea Montanari
- Subjects
Average-case complexity ,Random graph ,Degree (graph theory) ,Applied Mathematics ,Complete graph ,Clique (graph theory) ,Combinatorics ,Computational Mathematics ,Computational Theory and Mathematics ,Clique problem ,Realization (systems) ,Time complexity ,Analysis ,Mathematics - Abstract
Consider an Erdős–Rényi random graph in which each edge is present independently with probability $1/2$, except for a subset $\mathsf{C}_N$ of the vertices that form a clique (a completely connected subgraph). We consider the problem of identifying the clique, given a realization of such a random graph. The algorithm of Dekel et al. (ANALCO, SIAM, pp. 67–75, 2011) provably identifies the clique $\mathsf{C}_N$ in linear time, provided $|\mathsf{C}_N|\ge 1.261\sqrt{N}$. Spectral methods can be shown to fail on cliques smaller than $\sqrt{N}$. In this paper we describe a nearly linear-time algorithm that succeeds with high probability for $|\mathsf{C}_N|\ge (1+\varepsilon)\sqrt{N/e}$ for any $\varepsilon>0$. This is the first algorithm that provably improves over spectral methods. We further generalize the hidden clique problem to other background graphs (the standard case corresponding to the complete graph on $N$ vertices). For large-girth regular graphs of degree $\Delta+1$ we prove that so-called local algorithms succeed if $|\mathsf{C}_N|\ge (1+\varepsilon)N/\sqrt{e\Delta}$ and fail if $|\mathsf{C}_N|\le (1-\varepsilon)N/\sqrt{e\Delta}$.
- Published
- 2014
- Full Text
- View/download PDF
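A spectral baseline for comparison, which needs $|\mathsf{C}_N| \gtrsim \sqrt{N}$; the paper's message-passing algorithm goes below this, down to $(1+\varepsilon)\sqrt{N/e}$. A minimal sketch with illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(5)

# Plant a clique of size K in G(N, 1/2).
N, K = 1500, 85                        # K ~ 2.2 sqrt(N): easy for spectral
A = (rng.random((N, N)) < 0.5).astype(float)
A = np.triu(A, 1); A = A + A.T
clique = rng.choice(N, size=K, replace=False)
A[np.ix_(clique, clique)] = 1.0
np.fill_diagonal(A, 0.0)

# Spectral step: top eigenvector of the centred adjacency matrix.
v = np.abs(np.linalg.eigh(A - 0.5)[1][:, -1])
cand = np.argsort(v)[-K:]              # candidate clique members
# Clean-up: keep the vertices with the most edges into the candidate set.
scores = A[:, cand].sum(axis=1)
found = set(np.argsort(scores)[-K:])
print(f"recovered {len(found & set(clique))} of {K} planted vertices")
```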
39. Information-Theoretically Optimal Compressed Sensing via Spatial Coupling and Approximate Message Passing
- Author
-
David L. Donoho, Andrea Montanari, and Adel Javanmard
- Subjects
FOS: Computer and information sciences ,Theoretical computer science ,Computer Science - Information Theory ,FOS: Physical sciences ,Mathematics - Statistics Theory ,Statistics Theory (math.ST) ,02 engineering and technology ,Library and Information Sciences ,01 natural sciences ,Dimension (vector space) ,Diagonal matrix ,FOS: Mathematics ,0202 electrical engineering, electronic engineering, information engineering ,0101 mathematics ,Condensed Matter - Statistical Mechanics ,Mathematics ,Discrete mathematics ,Sequence ,Statistical Mechanics (cond-mat.stat-mech) ,Signal reconstruction ,Information Theory (cs.IT) ,010102 general mathematics ,Mathematical analysis ,020206 networking & telecommunications ,Coupling (probability) ,Empirical distribution function ,Computer Science Applications ,Compressed sensing ,Undersampling ,Probability distribution ,Algorithm ,Information Systems - Abstract
We study the compressed sensing reconstruction problem for a broad class of random, band-diagonal sensing matrices. This construction is inspired by the idea of spatial coupling in coding theory. As demonstrated heuristically and numerically by Krzakala et al., message passing algorithms can effectively solve the reconstruction problem for spatially coupled measurements with undersampling rates close to the fraction of non-zero coordinates. We use an approximate message passing (AMP) algorithm and analyze it through the state evolution method. We give a rigorous proof that this approach is successful as soon as the undersampling rate $\delta$ exceeds the (upper) Rényi information dimension of the signal, $\overline{d}(p_X)$. More precisely, for a sequence of signals of diverging dimension $n$ whose empirical distribution converges to $p_X$, reconstruction is with high probability successful from $\overline{d}(p_X)\, n+o(n)$ measurements taken according to a band diagonal matrix. For sparse signals, i.e., sequences of dimension $n$ and $k(n)$ non-zero entries, this implies reconstruction from $k(n)+o(n)$ measurements. For 'discrete' signals, i.e., signals whose coordinates take a fixed finite set of values, this implies reconstruction from $o(n)$ measurements. The result is robust with respect to noise, does not apply uniquely to random signals, but requires the knowledge of the empirical distribution of the signal $p_X$., Comment: 60 pages, 7 figures, Sections 3,5 and Appendices A,B are added. The stability constant is quantified (cf Theorem 1.7)
- Published
- 2013
- Full Text
- View/download PDF
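A sketch of the band-diagonal matrix construction described above; the block sizes, coupling width $w$, and normalization below are illustrative, not the paper's exact ensemble.

```python
import numpy as np

rng = np.random.default_rng(6)

def spatially_coupled_matrix(n, delta, L=20, w=3):
    """Band-diagonal ('spatially coupled') sensing matrix sketch: the signal
    is cut into L blocks and row-block r only senses signal blocks r-w..r."""
    bn = n // L                        # columns per block
    bm = int(delta * bn)               # rows per block
    A = np.zeros((bm * L, bn * L))
    for r in range(L):
        for c in range(max(0, r - w), r + 1):
            A[r*bm:(r+1)*bm, c*bn:(c+1)*bn] = \
                rng.normal(size=(bm, bn)) / np.sqrt(bm * (w + 1))
    return A

A = spatially_coupled_matrix(n=2000, delta=0.25, L=20, w=3)
print("shape:", A.shape, " nonzero fraction:", round((A != 0).mean(), 3))
```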
40. State evolution for general approximate message passing algorithms, with applications to spatial coupling
- Author
-
Adel Javanmard and Andrea Montanari
- Subjects
FOS: Computer and information sciences ,Statistics and Probability ,Independent and identically distributed random variables ,Numerical Analysis ,Class (set theory) ,Computer science ,Computer Science - Information Theory ,Information Theory (cs.IT) ,Applied Mathematics ,Gaussian ,Probability (math.PR) ,Message passing ,Recursion (computer science) ,Mathematics - Statistics Theory ,Statistics Theory (math.ST) ,Coupling (probability) ,State evolution ,symbols.namesake ,Compressed sensing ,Computational Theory and Mathematics ,FOS: Mathematics ,symbols ,Algorithm ,Mathematics - Probability ,Analysis - Abstract
We consider a class of approximated message passing (AMP) algorithms and characterize their high-dimensional behavior in terms of a suitable state evolution recursion. Our proof applies to Gaussian matrices with independent but not necessarily identically distributed entries. It covers --in particular-- the analysis of generalized AMP, introduced by Rangan, and of AMP reconstruction in compressed sensing with spatially coupled sensing matrices. The proof technique builds on the one of [BM11], while simplifying and generalizing several steps., Comment: 29 pages, 1 figure, minor updates in citations
- Published
- 2013
- Full Text
- View/download PDF
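For the simplest separable case (soft thresholding with a sparse prior), the state evolution recursion can be run by Monte Carlo in a few lines; the prior, threshold, and parameters below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

# State evolution for AMP with soft thresholding:
#   tau_{t+1}^2 = sigma^2 + (1/delta) E[ (eta(X + tau_t Z; theta tau_t) - X)^2 ]
delta, sigma, theta, eps = 0.5, 0.1, 1.5, 0.1  # undersampling, noise, threshold, sparsity
X = rng.choice([0.0, 1.0], size=200_000, p=[1 - eps, eps])  # sparse signal prior
Z = rng.normal(size=X.size)

tau2 = sigma ** 2 + np.mean(X ** 2) / delta    # t = 0 initialisation
for t in range(15):
    tau = np.sqrt(tau2)
    mse = np.mean((soft(X + tau * Z, theta * tau) - X) ** 2)
    tau2 = sigma ** 2 + mse / delta
    print(f"iter {t:2d}: tau^2 = {tau2:.5f}")
```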
41. Optimal Coding for the Binary Deletion Channel With Small Deletion Probability
- Author
-
Yashodhan Kanoria and Andrea Montanari
- Subjects
Discrete mathematics ,Binary number ,Library and Information Sciences ,Binary erasure channel ,Binary symmetric channel ,Computer Science Applications ,Bernoulli's principle ,Channel capacity ,Deletion channel ,Series expansion ,Algorithm ,Computer Science::Information Theory ,Information Systems ,Mathematics ,Communication channel - Abstract
The binary deletion channel is the simplest point-to-point communication channel that models lack of synchronization. Input bits are deleted independently with probability d, and when they are not deleted, they are not affected by the channel. Despite significant effort, little is known about the capacity of this channel and even less about optimal coding schemes. In this paper, we develop a new systematic approach to this problem, by demonstrating that capacity can be computed in a series expansion for small deletion probability. We compute three leading terms of this expansion, and find an input distribution that achieves capacity up to this order. This constitutes the first optimal random coding result for the deletion channel. The key idea employed is the following: We understand perfectly the deletion channel with deletion probability d=0. It has capacity 1 and the optimal input distribution is iid Bernoulli (1/2). It is natural to expect that the channel with small deletion probabilities has a capacity that varies smoothly with d, and that the optimal input distribution is obtained by smoothly perturbing the iid Bernoulli (1/2) process. Our results show that this is indeed the case.
- Published
- 2013
- Full Text
- View/download PDF
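The channel itself is two lines of code, which makes the synchronization difficulty easy to see: the output carries no markers of where deletions occurred.

```python
import numpy as np

rng = np.random.default_rng(8)

def deletion_channel(bits, d):
    """Each input bit is deleted independently with probability d; surviving
    bits pass through unchanged (and unmarked)."""
    keep = rng.random(bits.size) >= d
    return bits[keep]

x = rng.integers(0, 2, size=30)
y = deletion_channel(x, d=0.2)
print("in :", "".join(map(str, x)))
print("out:", "".join(map(str, y)))   # shorter, with no synchronization cues
```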
42. Accurate Prediction of Phase Transitions in Compressed Sensing via a Connection to Minimax Denoising
- Author
-
Iain M. Johnstone, David L. Donoho, and Andrea Montanari
- Subjects
Mathematical optimization ,Mean squared error ,Signal reconstruction ,Gaussian ,020206 networking & telecommunications ,02 engineering and technology ,Library and Information Sciences ,Minimax ,01 natural sciences ,Thresholding ,Computer Science Applications ,010104 statistics & probability ,symbols.namesake ,Compressed sensing ,Undersampling ,0202 electrical engineering, electronic engineering, information engineering ,symbols ,Applied mathematics ,0101 mathematics ,Gaussian process ,Information Systems ,Mathematics - Abstract
Compressed sensing posits that, within limits, one can undersample a sparse signal and yet reconstruct it accurately. Knowing the precise limits to such undersampling is important both for theory and practice. We present a formula that characterizes the allowed undersampling of generalized sparse objects. The formula applies to approximate message passing (AMP) algorithms for compressed sensing, which are here generalized to employ denoising operators besides the traditional scalar soft thresholding denoiser. This paper gives several examples, including scalar denoisers not derived from convex penalization (the firm shrinkage nonlinearity and the minimax nonlinearity) and also nonscalar denoisers (block thresholding, monotone regression, and total variation minimization). Let the variables $\varepsilon = k/N$ and $\delta = n/N$ denote the generalized sparsity and undersampling fractions for sampling the $k$-generalized-sparse $N$-vector $x_0$ according to $y = Ax_0$. Here, $A$ is an $n \times N$ measurement matrix whose entries are i.i.d. standard Gaussian. The formula states that the phase transition curve $\delta = \delta(\varepsilon)$ separating successful from unsuccessful reconstruction of $x_0$ by AMP is given by $\delta = M(\varepsilon \mid \mathrm{Denoiser})$, where $M(\varepsilon \mid \mathrm{Denoiser})$ denotes the per-coordinate minimax mean squared error (MSE) of the specified, optimally tuned denoiser in the directly observed problem $y = x + z$. In short, the phase transition of a noiseless undersampling problem is identical to the minimax MSE in a denoising problem. We prove that this formula follows from state evolution and present numerical results validating it in a wide range of settings. The above formula generates numerous new insights, both in the scalar and in the nonscalar cases.
- Published
- 2013
- Full Text
- View/download PDF
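For the soft-thresholding denoiser, the minimax MSE $M(\varepsilon)$, and hence the predicted phase-transition curve $\delta = M(\varepsilon)$, can be computed from the standard closed-form risk under the extremal prior (mass $\varepsilon$ "at infinity", $1-\varepsilon$ at zero). A sketch, minimizing over the threshold on a grid:

```python
import numpy as np
from math import erfc, exp, pi, sqrt

def worst_case_risk(eps, lam):
    """sup over eps-sparse x of E[(eta_lam(x+Z) - x)^2], Z ~ N(0,1):
    eps*(1+lam^2) at |x| -> infinity, plus (1-eps)*risk at x = 0."""
    phi = exp(-lam * lam / 2) / sqrt(2 * pi)
    Phi_neg = erfc(lam / sqrt(2)) / 2
    risk_at_zero = 2 * ((1 + lam * lam) * Phi_neg - lam * phi)
    return eps * (1 + lam * lam) + (1 - eps) * risk_at_zero

def M(eps, grid=np.linspace(0.0, 5.0, 2001)):
    return min(worst_case_risk(eps, lam) for lam in grid)

# delta = M(eps) is the predicted AMP phase-transition curve.
for eps in [0.05, 0.1, 0.2, 0.4]:
    print(f"eps = {eps:.2f}:  predicted delta_c = {M(eps):.4f}")
```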
43. On the concentration of the number of solutions of random satisfiability formulas
- Author
-
Andrea Montanari and Emmanuel Abbe
- Subjects
FOS: Computer and information sciences ,Computer Science - Logic in Computer Science ,Class (set theory) ,Discrete Mathematics (cs.DM) ,General Mathematics ,Existential quantification ,FOS: Physical sciences ,0102 computer and information sciences ,Computational Complexity (cs.CC) ,01 natural sciences ,Combinatorics ,FOS: Mathematics ,Countable set ,0101 mathematics ,Condensed Matter - Statistical Mechanics ,Constraint satisfaction problem ,Mathematics ,Discrete mathematics ,High probability ,Statistical Mechanics (cond-mat.stat-mech) ,Applied Mathematics ,Probability (math.PR) ,010102 general mathematics ,Function (mathematics) ,Computer Graphics and Computer-Aided Design ,Satisfiability ,Logic in Computer Science (cs.LO) ,Computer Science - Computational Complexity ,010201 computation theory & mathematics ,struct ,Mathematics - Probability ,Software ,Computer Science - Discrete Mathematics - Abstract
Let $Z(F)$ be the number of solutions of a random $k$-satisfiability formula $F$ with $n$ variables and clause density $\alpha$. Assume that the probability that $F$ is unsatisfiable is $O(1/\log(n)^{1+\epsilon})$ for some $\epsilon>0$. We show that (possibly excluding a countable set of 'exceptional' $\alpha$'s) the number of solutions concentrates in the logarithmic scale, i.e., there exists a non-random function $\phi(\alpha)$ such that, for any $\delta>0$, $(1/n)\log Z(F)\in [\phi-\delta,\phi+\delta]$ with high probability. In particular, the assumption holds for all $\alpha$ [...]
- Published
- 2013
- Full Text
- View/download PDF
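The quantity $(1/n)\log Z(F)$ is easy to probe by brute force at toy sizes (far too small to see the concentration quantitatively, but it fixes ideas); the clause density and sizes below are illustrative.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(9)

def random_ksat(n, alpha, k=3):
    """Random k-SAT: each clause picks k distinct variables, random signs."""
    clauses = []
    for _ in range(int(alpha * n)):
        vs = rng.choice(n, size=k, replace=False)
        signs = rng.choice([True, False], size=k)
        clauses.append(list(zip(vs, signs)))
    return clauses

def count_solutions(n, clauses):
    count = 0
    for assign in product([False, True], repeat=n):
        if all(any(assign[v] == s for v, s in c) for c in clauses):
            count += 1
    return count

n, alpha = 15, 2.0
for trial in range(3):
    Z = count_solutions(n, random_ksat(n, alpha))
    print(f"trial {trial}: Z = {Z},  (1/n) ln Z = {np.log(max(Z, 1)) / n:.3f}")
```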
44. Iterative Coding for Network Coding
- Author
-
R. Urbanke and Andrea Montanari
- Subjects
probabilistic channel models ,Theoretical computer science ,Shannon channel capacity ,Iterative method ,Code word ,Data_CODINGANDINFORMATIONTHEORY ,Library and Information Sciences ,Computer Science Applications ,Channel capacity ,Network coding ,Linear network coding ,Header ,sparse graph codes ,Communication complexity ,Decoding methods ,Computer Science::Information Theory ,Information Systems ,Coding (social sciences) ,Mathematics - Abstract
We consider communication over a noisy network under randomized linear network coding. Possible error mechanisms include node- or link-failures, Byzantine behavior of nodes, or an overestimate of the network min-cut. Building on the work of Kötter and Kschischang, we introduce a systematic oblivious random channel model. Within this model, codewords contain a header (this is the systematic part). The header effectively records the coefficients of the linear encoding functions, thus simplifying the decoding task. Under this constraint, errors are modeled as random low-rank perturbations of the transmitted codeword. We compute the capacity of this channel and we define an error-correction scheme based on random sparse graphs and a low-complexity decoding algorithm. By optimizing over the code degree profile, we show that this construction achieves the channel capacity with complexity that is jointly quadratic in the number of coded information bits and sublogarithmic in the error probability.
- Published
- 2013
- Full Text
- View/download PDF
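As a toy illustration of the header idea, here is an end-to-end sketch over GF(2): each delivered packet carries the coefficient vector that generated it, and the receiver decodes by Gaussian elimination. The paper works with a richer model (errors, sparse-graph codes on top); field and sizes here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(13)

k, plen = 4, 16                            # source packets, payload bits
src = rng.integers(0, 2, size=(k, plen), dtype=np.uint8)

basis = {}                                 # pivot column -> (coeffs, payload)
while len(basis) < k:
    c = rng.integers(0, 2, size=k, dtype=np.uint8)   # header: coefficients
    p = (src.T @ c % 2).astype(np.uint8)             # delivered payload
    for col in sorted(basis):              # reduce against current basis
        if c[col]:
            bc, bp = basis[col]
            c = c ^ bc; p = p ^ bp
    if c.any():                            # innovative packet: keep it
        basis[int(np.argmax(c))] = (c, p)  # leading 1 becomes the pivot

for col in sorted(basis, reverse=True):    # back-substitution
    c, p = basis[col]
    for col2 in sorted(basis):
        if col2 > col and c[col2]:
            c2, p2 = basis[col2]
            c = c ^ c2; p = p ^ p2
    basis[col] = (c, p)

decoded = np.array([basis[i][1] for i in range(k)])
print("decoded correctly:", bool((decoded == src).all()))
```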
45. Phase transitions in semidefinite relaxations
- Author
-
Adel Javanmard, Federico Ricci-Tersenghi, and Andrea Montanari
- Subjects
FOS: Computer and information sciences ,Theoretical computer science ,Optimization problem ,Discrete Mathematics (cs.DM) ,Statistical noise ,Computer Science - Information Theory ,FOS: Physical sciences ,02 engineering and technology ,01 natural sciences ,010104 statistics & probability ,Commentaries ,Synchronization (computer science) ,0202 electrical engineering, electronic engineering, information engineering ,Statistical inference ,Community detection ,Phase transitions ,Semidefinite programming ,Synchronization ,Multidisciplinary ,0101 mathematics ,Condensed Matter - Statistical Mechanics ,Mathematics ,Statistical Mechanics (cond-mat.stat-mech) ,Information Theory (cs.IT) ,020206 networking & telecommunications ,Statistical mechanics ,Range (mathematics) ,Computer Science - Discrete Mathematics ,Curse of dimensionality - Abstract
Statistical inference problems arising within signal processing, data mining, and machine learning naturally give rise to hard combinatorial optimization problems. These problems become intractable when the dimensionality of the data is large, as is often the case for modern datasets. A popular idea is to construct convex relaxations of these combinatorial problems, which can be solved efficiently for large scale datasets. Semidefinite programming (SDP) relaxations are among the most powerful methods in this family, and are surprisingly well-suited for a broad range of problems where data take the form of matrices or graphs. It has been observed several times that, when the 'statistical noise' is small enough, SDP relaxations correctly detect the underlying combinatorial structures. In this paper we develop asymptotic predictions for several 'detection thresholds', as well as for the estimation error above these thresholds. We study some classical SDP relaxations for statistical problems motivated by graph synchronization and community detection in networks. We map these optimization problems to statistical mechanics models with vector spins, and use non-rigorous techniques from statistical mechanics to characterize the corresponding phase transitions. Our results clarify the effectiveness of SDP relaxations in solving high-dimensional statistical problems., Comment: 71 pages, 24 pdf figures
- Published
- 2016
- Full Text
- View/download PDF
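A hedged sketch of the $\mathbb{Z}_2$-synchronization SDP in the vector-spin form the paper studies, solved heuristically by projected gradient ascent on a low-rank (Burer-Monteiro style) factorization; the signal strength $\lambda$, rank $m$, step size, and rounding are all illustrative choices, not the paper's analysis.

```python
import numpy as np

rng = np.random.default_rng(15)

# SDP relaxation of Z_2 synchronization: max <Y, X>, X PSD, X_ii = 1,
# written as X = S S^T with unit rows S_i in R^m ("vector spins").
n, lam, m = 800, 1.5, 10
x = rng.choice([-1.0, 1.0], size=n)
W = rng.normal(size=(n, n)) / np.sqrt(n)
W = (W + W.T) / np.sqrt(2)
Y = lam / n * np.outer(x, x) + W

S = rng.normal(size=(n, m))
S /= np.linalg.norm(S, axis=1, keepdims=True)
for _ in range(300):
    S = S + 0.1 * (Y @ S)                        # ascent on <Y, S S^T>
    S /= np.linalg.norm(S, axis=1, keepdims=True)  # project rows to sphere

# Round: top eigenvector of X = S S^T, then take signs.
u = np.linalg.eigh(S @ S.T)[1][:, -1]
print(f"overlap = {abs(np.sign(u) @ x) / n:.3f}")  # > 0 above the threshold
```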
46. The LASSO Risk for Gaussian Matrices
- Author
-
Mohsen Bayati and Andrea Montanari
- Subjects
FOS: Computer and information sciences ,Mathematical optimization ,Mean squared error ,Computer Science - Information Theory ,Gaussian ,Mathematics - Statistics Theory ,Statistics Theory (math.ST) ,02 engineering and technology ,Library and Information Sciences ,01 natural sciences ,010104 statistics & probability ,symbols.namesake ,Lasso (statistics) ,FOS: Mathematics ,0202 electrical engineering, electronic engineering, information engineering ,Applied mathematics ,Limit (mathematics) ,0101 mathematics ,Gaussian process ,Mathematics ,Information Theory (cs.IT) ,Estimator ,020206 networking & telecommunications ,16. Peace & justice ,Computer Science Applications ,Basis pursuit denoising ,symbols ,Random matrix ,Information Systems - Abstract
We consider the problem of learning a coefficient vector $x_0 \in \mathbb{R}^N$ from noisy linear observations $y = Ax_0 + w \in \mathbb{R}^n$. In many contexts (ranging from model selection to image processing) it is desirable to construct a sparse estimator $\widehat{x}$. In this case, a popular approach consists in solving an $\ell_1$-penalized least squares problem known as the LASSO or Basis Pursuit DeNoising (BPDN). For sequences of matrices $A$ of increasing dimensions, with independent Gaussian entries, we prove that the normalized risk of the LASSO converges to a limit, and we obtain an explicit expression for this limit. Our result is the first rigorous derivation of an explicit formula for the asymptotic mean square error of the LASSO for random instances. The proof technique is based on the analysis of AMP, a recently developed efficient algorithm, that is inspired by graphical models ideas. Simulations on real data matrices suggest that our results can be relevant in a broad array of practical applications., 43 pages, 5 figures (v3 rectifies some inconsistencies in the formulation of auxiliary lemmas)
- Published
- 2012
- Full Text
- View/download PDF
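The normalized risk in question is easy to estimate empirically; below is a sketch using ISTA as the LASSO solver, with illustrative sizes (the paper's limit formula is not reproduced here).

```python
import numpy as np

rng = np.random.default_rng(16)

# Empirical LASSO risk on a Gaussian design, the quantity whose large-system
# limit the paper characterises.
n, N, eps, sigma, lam = 400, 800, 0.1, 0.2, 0.1
A = rng.normal(size=(n, N)) / np.sqrt(n)
x0 = rng.choice([0.0, 1.0], size=N, p=[1 - eps, eps])
y = A @ x0 + sigma * rng.normal(size=n)

L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
x = np.zeros(N)
for _ in range(1500):                  # ISTA: gradient step + soft threshold
    z = x + A.T @ (y - A @ x) / L
    x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)

print(f"normalized risk ||x_hat - x0||^2 / N = {np.mean((x - x0) ** 2):.4f}")
```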
47. Applications of the Lindeberg Principle in Communications and Statistical Learning
- Author
-
Andrea Montanari and Satish Babu Korada
- Subjects
Theoretical computer science ,Covariance matrix ,MIMO ,Symmetric matrix ,Library and Information Sciences ,Information theory ,Random matrix ,Random variable ,Computer Science Applications ,Information Systems ,Universality (dynamical systems) ,Mathematics ,Sparse matrix - Abstract
We use a generalization of the Lindeberg principle developed by S. Chatterjee to prove universality properties for various problems in communications, statistical learning and random matrix theory. We also show that these systems can be viewed as the limiting case of a properly defined sparse system. The latter result is useful when the sparse systems are easier to analyze than their dense counterparts. The list of problems we consider is by no means exhaustive. We believe that the ideas can be used in many other problems relevant for information theory.
- Published
- 2011
- Full Text
- View/download PDF
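The Lindeberg principle is easy to see numerically: a smooth statistic of an i.i.d. sum is nearly unchanged when the summands' distribution is swapped while keeping the first two moments. A toy illustration (the test function below is an arbitrary smooth choice):

```python
import numpy as np

rng = np.random.default_rng(14)

# E f(S_n) is nearly the same for Rademacher and Gaussian summands
# (matching first two moments), illustrating universality.
n, trials = 300, 10_000
f = lambda s: np.cos(s) + s ** 3 / (1 + s ** 2)   # smooth test statistic

rade = rng.choice([-1.0, 1.0], size=(trials, n)).sum(axis=1) / np.sqrt(n)
gaus = rng.normal(size=(trials, n)).sum(axis=1) / np.sqrt(n)
print(f"Rademacher: E f = {f(rade).mean():+.4f}")
print(f"Gaussian  : E f = {f(gaus).mean():+.4f}")  # nearly identical
```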
48. The Dynamics of Message Passing on Dense Graphs, with Applications to Compressed Sensing
- Author
-
Andrea Montanari and Mohsen Bayati
- Subjects
FOS: Computer and information sciences ,Independent and identically distributed random variables ,Theoretical computer science ,Iterative method ,Computer science ,Computer Science - Information Theory ,Gaussian ,Mathematics - Statistics Theory ,Context (language use) ,Statistics Theory (math.ST) ,02 engineering and technology ,Library and Information Sciences ,01 natural sciences ,Machine Learning (cs.LG) ,010104 statistics & probability ,symbols.namesake ,Matrix (mathematics) ,FOS: Mathematics ,0202 electrical engineering, electronic engineering, information engineering ,0101 mathematics ,Gaussian process ,Information Theory (cs.IT) ,010102 general mathematics ,Message passing ,Approximation algorithm ,020206 networking & telecommunications ,Graph theory ,Graph ,Computer Science Applications ,Computer Science - Learning ,Compressed sensing ,symbols ,Algorithm design ,Random matrix ,Algorithm ,Factor graph ,Information Systems - Abstract
Approximate message passing algorithms proved to be extremely effective in reconstructing sparse signals from a small number of incoherent linear measurements. Extensive numerical experiments further showed that their dynamics is accurately tracked by a simple one-dimensional iteration termed state evolution. In this paper we provide the first rigorous foundation for state evolution. We prove that indeed it holds asymptotically in the large system limit for sensing matrices with independent and identically distributed Gaussian entries. While our focus is on message passing algorithms for compressed sensing, the analysis extends beyond this setting, to a general class of algorithms on dense graphs. In this context, state evolution plays the role that density evolution has for sparse graphs. The proof technique is fundamentally different from the standard approach to density evolution, in that it copes with the large number of short loops in the underlying factor graph. It relies instead on a conditioning technique recently developed by Erwin Bolthausen in the context of spin glass theory., Comment: 41 pages
- Published
- 2011
- Full Text
- View/download PDF
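A minimal AMP loop for sparse recovery with soft thresholding, assuming a Gaussian sensing matrix; note the extra $b\,z^{t}$ ("Onsager") term, which distinguishes AMP from naive iterative thresholding and is what makes state evolution exact. Parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(10)

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

# AMP for y = A x0 + w, with soft thresholding as the denoiser.
n, delta, eps, sigma, theta = 2000, 0.6, 0.1, 0.05, 1.3
m = int(delta * n)
A = rng.normal(size=(m, n)) / np.sqrt(m)
x0 = rng.choice([0.0, 1.0], size=n, p=[1 - eps, eps])
y = A @ x0 + sigma * rng.normal(size=m)

x, z = np.zeros(n), y.copy()
for t in range(25):
    tau = np.linalg.norm(z) / np.sqrt(m)      # effective noise estimate
    x_new = soft(x + A.T @ z, theta * tau)
    b = np.mean(np.abs(x_new) > 0) / delta    # Onsager coefficient
    z = y - A @ x_new + b * z                 # residual with memory term
    x = x_new

print(f"final MSE = {np.mean((x - x0) ** 2):.5f}")
```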
49. The weak limit of Ising models on locally tree-like graphs
- Author
-
Andrea Montanari, Allan Sly, and Elchanan Mossel
- Subjects
Statistics and Probability ,Combinatorics ,Spins ,Convergence (routing) ,Ising model ,Square-lattice Ising model ,Boundary value problem ,Limit (mathematics) ,Statistics, Probability and Uncertainty ,Measure (mathematics) ,Tree (graph theory) ,Analysis ,Mathematics - Abstract
We consider the Ising model with inverse temperature $\beta$ and without external field on sequences of graphs $G_n$ which converge locally to the $k$-regular tree. We show that for such graphs the Ising measure locally weakly converges to the symmetric mixture of the Ising model with $+$ boundary conditions and the $-$ boundary conditions on the $k$-regular tree with inverse temperature $\beta$. In the case where the graphs $G_n$ are expanders, we derive a more detailed understanding by showing convergence of the Ising measure conditional on positive magnetization (sum of spins) to the $+$ measure on the tree.
- Published
- 2010
- Full Text
- View/download PDF
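A quick numerical companion, assuming Glauber dynamics as a sampler (mixing is not guaranteed, so this is only suggestive): on random 3-regular graphs the magnetization turns on near the tree prediction $\tanh\beta_c = 1/(k-1)$. Sizes and sweep counts are illustrative.

```python
import numpy as np

rng = np.random.default_rng(11)

def random_regular_edges(n, k):
    """Configuration-model pairing; rare self-loops or multi-edges are left
    in, which is harmless for this demo."""
    stubs = np.repeat(np.arange(n), k)
    rng.shuffle(stubs)
    return list(zip(stubs[0::2], stubs[1::2]))

def glauber_magnetization(n, k, beta, sweeps=500):
    nbrs = [[] for _ in range(n)]
    for u, v in random_regular_edges(n, k):
        nbrs[u].append(v); nbrs[v].append(u)
    s = rng.choice([-1, 1], size=n)
    for _ in range(sweeps):
        for i in rng.integers(0, n, size=n):
            h = sum(s[j] for j in nbrs[i])       # local field
            s[i] = 1 if rng.random() < 1.0 / (1.0 + np.exp(-2 * beta * h)) else -1
    return abs(s.mean())

n, k = 1000, 3
beta_c = np.arctanh(1.0 / (k - 1))   # tree transition: tanh(beta_c) = 1/(k-1)
for beta in [0.4, 0.6, 0.8, 1.0]:
    m = glauber_magnetization(n, k, beta)
    print(f"beta = {beta:.1f} (beta_c = {beta_c:.3f}):  |magnetization| = {m:.3f}")
```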
50. Matrix Completion From a Few Entries
- Author
-
Raghunandan H. Keshavan, Andrea Montanari, and Sewoong Oh
- Subjects
FOS: Computer and information sciences ,Machine Learning (stat.ML) ,010103 numerical & computational mathematics ,02 engineering and technology ,Library and Information Sciences ,01 natural sciences ,Machine Learning (cs.LG) ,Matrix decomposition ,Combinatorics ,Matrix (mathematics) ,Statistics - Machine Learning ,0202 electrical engineering, electronic engineering, information engineering ,Rank (graph theory) ,0101 mathematics ,Time complexity ,Mathematics ,Sparse matrix ,Matrix completion ,Spectrum (functional analysis) ,020206 networking & telecommunications ,Computer Science Applications ,Computer Science - Learning ,Bounded function ,020201 artificial intelligence & image processing ,Random matrix ,Information Systems - Abstract
Let $M$ be a random $(\alpha n) \times n$ matrix of rank $r \ll \sqrt{n}$, and assume that a uniformly random subset $E$ of its entries is observed. [...], Comment: 30 pages, 1 figure, journal version (v1, v2: Conference version ISIT 2009)
- Published
- 2010
- Full Text
- View/download PDF
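A sketch of the spectral first phase of this kind of reconstruction (rescale the observed entries, keep the top-$r$ SVD); the paper's OptSpace algorithm adds trimming and a local cleanup step, so the error below is worse than its guarantees. Sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(12)

# Rank-r matrix with a uniformly random subset of entries revealed.
n, r, frac = 400, 3, 0.25
M = rng.normal(size=(n, r)) @ rng.normal(size=(r, n))
mask = rng.random((n, n)) < frac

ME = np.where(mask, M, 0.0) / frac          # unbiased entrywise estimate of M
U, s, Vt = np.linalg.svd(ME, full_matrices=False)
M_hat = (U[:, :r] * s[:r]) @ Vt[:r]         # keep the top-r part of the spectrum

rmse = np.sqrt(np.mean((M_hat - M) ** 2)) / np.sqrt(np.mean(M ** 2))
print(f"relative RMSE = {rmse:.3f}")
```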