Non-negative Principal Component Analysis: Message Passing Algorithms and Sharp Asymptotics
- Publication Year :
- 2014
- Abstract :
- Principal component analysis (PCA) aims at estimating the direction of maximal variability of a high-dimensional dataset. A natural question is: does this task become easier, and estimation more accurate, when we exploit additional knowledge about the principal vector? We study the case in which the principal vector is known to lie in the positive orthant. Similar constraints arise in a number of applications, ranging from analysis of gene expression data to spike sorting in neural signal processing. In the unconstrained case, the estimation performance of PCA has been precisely characterized using random matrix theory, under a statistical model known as the "spiked model." It is known that the estimation error undergoes a phase transition as the signal-to-noise ratio crosses a certain threshold. Unfortunately, tools from random matrix theory have no bearing on the constrained problem. Despite this challenge, we develop an analogous characterization in the constrained case, within a one-spike model. In particular: (i) we prove that the estimation error undergoes a similar phase transition, albeit at a different threshold in signal-to-noise ratio that we determine exactly; (ii) we prove that, unlike in the unconstrained case, the estimation error depends on the spike vector, and we characterize the least favorable vectors; (iii) we show that a non-negative principal component can be approximately computed, under the spiked model, in nearly linear time. This holds despite the fact that the problem is non-convex and, in general, NP-hard to solve exactly.
- 51 pages, 7 pdf figures
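To make the setting in the abstract concrete, the following is a minimal sketch of a symmetric one-spike model together with a simple projected power iteration that keeps the estimate in the positive orthant. It is an illustration only and not the message passing algorithm of the paper: the function names `spiked_matrix` and `nonneg_power_iteration`, the noise normalization, the all-ones initialization, and the iteration count are all assumptions chosen for this example.

```python
import numpy as np


def spiked_matrix(v, beta, rng):
    """Draw X = beta * v v^T + Z, where Z is symmetric Gaussian noise.

    Assumed normalization: off-diagonal entries of Z have variance 1/n,
    so the noise spectrum stays bounded as n grows.
    """
    n = v.shape[0]
    g = rng.normal(size=(n, n)) / np.sqrt(n)
    z = (g + g.T) / np.sqrt(2.0)  # symmetrize the noise
    return beta * np.outer(v, v) + z


def nonneg_power_iteration(x, iters=200):
    """Projected power iteration (a stand-in, not the paper's algorithm).

    Each step multiplies by X, clips negative entries to zero, and
    renormalizes, so the iterate is always a non-negative unit vector.
    """
    n = x.shape[0]
    u = np.ones(n) / np.sqrt(n)  # non-negative starting point
    for _ in range(iters):
        w = np.maximum(x @ u, 0.0)  # projection onto the positive orthant
        norm = np.linalg.norm(w)
        if norm == 0.0:
            break  # degenerate case: keep the previous iterate
        u = w / norm
    return u


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    v = np.abs(rng.normal(size=300))
    v /= np.linalg.norm(v)  # non-negative unit-norm spike
    x = spiked_matrix(v, beta=3.0, rng=rng)
    u = nonneg_power_iteration(x)
    print("overlap <u, v> =", float(u @ v))
```

Each iteration costs one matrix-vector product, which is linear in the number of matrix entries. The sketch makes no claim about the exact threshold or the error guarantees that the abstract refers to.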
- Subjects :
- FOS: Computer and information sciences
Information Theory (cs.IT)
Computer Science - Information Theory
Sparse PCA
020206 networking & telecommunications
Statistical model
Mathematics - Statistics Theory
02 engineering and technology
Statistics Theory (math.ST)
Library and Information Sciences
01 natural sciences
Computer Science Applications
Orthant
010104 statistics & probability
Spike sorting
Principal component analysis
0202 electrical engineering, electronic engineering, information engineering
FOS: Mathematics
Symmetric matrix
0101 mathematics
Random matrix
Algorithm
Time complexity
Information Systems
Mathematics
Details
- Language :
- English
- Database :
- OpenAIRE
- Accession number :
- edsair.doi.dedup.....96a73ccb655fb9e6a6725b7e9f3b4ac0