Convergence of graph Laplacian with kNN self-tuned kernels
- Source :
- Information and Inference: A Journal of the IMA. 11:889-957
- Publication Year :
- 2021
- Publisher :
- Oxford University Press (OUP), 2021.
Abstract
- The kernelized Gram matrix $W$ constructed from data points $\{x_i\}_{i=1}^N$ as $W_{ij} = k_0(\|x_i - x_j\|^2 / \sigma^2)$ is widely used in graph-based geometric data analysis and unsupervised learning. An important question is how to choose the kernel bandwidth $\sigma$; a common practice called the self-tuned kernel adaptively sets a bandwidth $\sigma_i$ at each point $x_i$ using the $k$-nearest-neighbor (kNN) distance. When the $x_i$ are sampled from a $d$-dimensional manifold embedded in a possibly high-dimensional space, theoretical results on graph Laplacian convergence with self-tuned kernels have been incomplete, unlike with fixed-bandwidth kernels. This paper proves the convergence of the graph Laplacian operator $L_N$ to the manifold (weighted-)Laplacian for a new family of kNN self-tuned kernels $W^{(\alpha)}_{ij} = k_0\big(\|x_i - x_j\|^2 / (\epsilon \hat{\rho}(x_i)\hat{\rho}(x_j))\big) / \big(\hat{\rho}(x_i)^\alpha \hat{\rho}(x_j)^\alpha\big)$, where $\hat{\rho}$ is the bandwidth function estimated by kNN, and the limiting operator is also parametrized by $\alpha$. When $\alpha = 1$, the limiting operator is the weighted manifold Laplacian $\varDelta_p$. Specifically, we prove the point-wise convergence of $L_N f$ and the convergence of the graph Dirichlet form, with rates. Our analysis first establishes a $C^0$ consistency for $\hat{\rho}$, which bounds the relative estimation error $|\hat{\rho} - \bar{\rho}|/\bar{\rho}$ uniformly with high probability, where $\bar{\rho} = p^{-1/d}$ and $p$ is the data density function. Our theoretical results reveal the advantage of the self-tuned kernel over the fixed-bandwidth kernel via a smaller variance error in low-density regions. The algorithm requires no prior knowledge of $d$ or of the data density. The theoretical results are supported by numerical experiments on simulated data and hand-written digit image data.
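The kernel construction described in the abstract can be sketched numerically. The following is a minimal illustration, not the authors' code: it assumes a Gaussian profile $k_0(r) = e^{-r}$, illustrative values of $k$ and $\epsilon$, and a random-walk normalization of $L_N$; the paper's precise normalization constants and kernel assumptions are omitted.

```python
import numpy as np

def self_tuned_laplacian(X, k=7, alpha=1.0, eps=0.05):
    """Graph Laplacian with a kNN self-tuned kernel (illustrative sketch)."""
    N = X.shape[0]
    # Pairwise squared distances ||x_i - x_j||^2.
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    # Self-tuned bandwidth rho_hat(x_i): distance to the k-th nearest neighbor
    # (index 0 of each sorted row is the point itself, at distance 0).
    rho = np.sqrt(np.sort(D2, axis=1)[:, k])
    # Kernel W^{(alpha)}_{ij} = k0(||x_i - x_j||^2 / (eps * rho_i * rho_j))
    #                           / (rho_i * rho_j)^alpha,
    # with k0 taken to be Gaussian here (an assumption; any smooth
    # decaying profile fits the framework).
    W = np.exp(-D2 / (eps * np.outer(rho, rho))) / np.outer(rho, rho) ** alpha
    # Random-walk normalization, one common convention:
    # L_N f = (D^{-1} W f - f) / eps.
    P = W / W.sum(axis=1, keepdims=True)
    return (P - np.eye(N)) / eps
```

Note that, as in the paper's algorithm, no intrinsic dimension $d$ or density estimate is supplied: the kNN distance `rho` adapts the bandwidth to local density automatically, shrinking it in dense regions and widening it in sparse ones.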
- Subjects :
- FOS: Computer and information sciences; FOS: Mathematics; Statistics and Probability; Statistics - Machine Learning; Machine Learning (cs.LG); Machine Learning (stat.ML); Computer Science - Machine Learning; Mathematics - Statistics Theory; Statistics Theory (math.ST); Numerical Analysis; Applied Mathematics; Computational Theory and Mathematics; Analysis; Combinatorics; Physics; Dirichlet form; Laplacian matrix; Laplace operator; Kernel Bandwidth; Gramian matrix; Kernel (category theory); Image (category theory); Function (mathematics)
Details
- ISSN :
- 2049-8772
- Volume :
- 11
- Database :
- OpenAIRE
- Journal :
- Information and Inference: A Journal of the IMA
- Accession number :
- edsair.doi.dedup.....218218ad9e814e79186c303b48bc9000
- Full Text :
- https://doi.org/10.1093/imaiai/iaab019