Start Over

Detection Thresholds in Very Sparse Matrix Completion

Authors :: Bordenave, Charles
Coste, Simon
Nadakuditi, Raj Rao
Source :: Foundations of Computational Mathematics. October, 2023, Vol. 23 Issue 5, p1619, 125 p.
Publication Year :: 2023
Abstract: We study the matrix completion problem: an underlying [Formula omitted] matrix P is low rank, with incoherent singular vectors, and a random [Formula omitted] matrix A is equal to P on a (uniformly) random subset of entries of size dn. All other entries of A are equal to zero. The goal is to retrieve information on P from the observation of A. Let [Formula omitted] be the random matrix where each entry of A is multiplied by an independent [Formula omitted]-Bernoulli random variable with parameter 1/2. This paper is about when, how and why the non-Hermitian eigen-spectra of the matrices [Formula omitted] and [Formula omitted] captures more of the relevant information about the principal component structure of A than the eigen-spectra of [Formula omitted] and [Formula omitted]. We show that the eigenvalues of the asymmetric matrices [Formula omitted] and [Formula omitted] with modulus greater than a detection threshold are asymptotically equal to the eigenvalues of [Formula omitted] and [Formula omitted] and that the associated eigenvectors are aligned as well. The central surprise is that by intentionally inducing asymmetry and additional randomness via the [Formula omitted] matrix, we can extract more information than if we had worked with the singular value decomposition (SVD) of A. The associated detection threshold is asymptotically exact and is non-universal since it explicitly depends on the element-wise distribution of the underlying matrix P. We show that reliable, statistically optimal but not perfect matrix recovery, via a universal data-driven algorithm, is possible above this detection threshold using the information extracted from the asymmetric eigen-decompositions. Averaging the left and right eigenvectors provably improves estimation accuracy but not the detection threshold. Our results encompass the very sparse regime where d is of order 1 where matrix completion via the SVD of A fails or produces unreliable recovery. We define another variant of this asymmetric principal component analysis procedure that bypasses the randomization step and has a detection threshold that is smaller by a constant factor but with a computational cost that is larger by a polynomial factor of the number of observed entries. Both detection thresholds allow to go beyond the barrier due to the well-known information theoretical limit [Formula omitted] for exact matrix completion found in the literature.<br />Author(s): Charles Bordenave [sup.1] , Simon Coste [sup.2] , Raj Rao Nadakuditi [sup.3] Author Affiliations: (1) grid.5399.6, 0000 0001 2176 4817, Institut de mathématiques de Marseille, Aix-Marseille Université, , 39 [...]