Author: "Lu, Xiliang" / Database: OAIster - Searchworks@Jio Institute Digital Library Search Results

1. Take Care of Your Prompt Bias! Investigating and Mitigating Prompt Bias in Factual Knowledge Extraction

Author: Xu, Ziyang, Peng, Keqin, Ding, Liang, Tao, Dacheng, Lu, Xiliang, Xu, Ziyang, Peng, Keqin, Ding, Liang, Tao, Dacheng, and Lu, Xiliang
Abstract: Recent research shows that pre-trained language models (PLMs) suffer from "prompt bias" in factual knowledge extraction, i.e., prompts tend to introduce biases toward specific labels. Prompt bias presents a significant challenge in assessing the factual knowledge within PLMs. Therefore, this paper aims to improve the reliability of existing benchmarks by thoroughly investigating and mitigating prompt bias. We show that: 1) all prompts in the experiments exhibit non-negligible bias, with gradient-based prompts like AutoPrompt and OptiPrompt displaying significantly higher levels of bias; 2) prompt bias can amplify benchmark accuracy unreasonably by overfitting the test datasets, especially on imbalanced datasets like LAMA. Based on these findings, we propose a representation-based approach to mitigate the prompt bias during inference time. Specifically, we first estimate the biased representation using prompt-only querying, and then remove it from the model's internal representations to generate the debiased representations, which are used to produce the final debiased outputs. Experiments across various prompts, PLMs, and benchmarks show that our approach can not only correct the overfitted performance caused by prompt bias, but also significantly improve the prompt retrieval capability (up to 10% absolute performance gain). These results indicate that our approach effectively alleviates prompt bias in knowledge evaluation, thereby enhancing the reliability of benchmark assessments. Hopefully, our plug-and-play approach can be a golden standard to strengthen PLMs toward reliable knowledge bases. Code and data are released in https://github.com/FelliYang/PromptBias., Comment: Accepted by COLING 2024
Published: 2024

2. Numerical Recovery of the Diffusion Coefficient in Diffusion Equations from Terminal Measurement

Author: Jin, Bangti, Lu, Xiliang, Quan, Qimeng, Zhou, Zhi, Jin, Bangti, Lu, Xiliang, Quan, Qimeng, and Zhou, Zhi
Abstract: In this work, we investigate a numerical procedure for recovering a space-dependent diffusion coefficient in a (sub)diffusion model from the given terminal data, and provide a rigorous numerical analysis of the procedure. By exploiting decay behavior of the observation in time, we establish a novel H{\"o}lder type stability estimate for a large terminal time $T$. This is achieved by novel decay estimates of the (fractional) time derivative of the solution. To numerically recover the diffusion coefficient, we employ the standard output least-squares formulation with an $H^1(\Omega)$-seminorm penalty, and discretize the regularized problem by the Galerkin finite element method with continuous piecewise linear finite elements in space and backward Euler convolution quadrature in time. Further, we provide an error analysis of discrete approximations, and prove a convergence rate that matches the stability estimate. The derived $L^2(\Omega)$ error bound depends explicitly on the noise level, regularization parameter and discretization parameter(s), which gives a useful guideline of the \textsl{a priori} choice of discretization parameters with respect to the noise level in practical implementation. The error analysis is achieved using the conditional stability argument and discrete maximum-norm resolvent estimates. Several numerical experiments are also given to illustrate and complement the theoretical analysis., Comment: 22 pages, 2 figures
Published: 2024

3. On the robustness of double-word addition algorithms

Author: Yang, Yuanyuan, Lyu, XinYu, He, Sida, Lu, Xiliang, Qi, Ji, Li, Zhihao, Yang, Yuanyuan, Lyu, XinYu, He, Sida, Lu, Xiliang, Qi, Ji, and Li, Zhihao
Abstract: We demonstrate that, even when there are moderate overlaps in the inputs of sloppy or accurate double-word addition algorithms in the QD library, these algorithms still guarantee error bounds of $O(u^2(|a|+|b|))$ in faithful rounding. Furthermore, the accurate algorithm can achieve a relative error bound of $O(u^2)$ in the presence of moderate overlaps in the inputs when rounding function is round-to-nearest. The relative error bound also holds in directed rounding, but certain additional conditions are required. Consequently, in double-word multiplication and addition operations, we can safely omit the normalization step of double-word multiplication and replace the accurate addition algorithm with the sloppy one. Numerical experiments confirm that this approach nearly doubles the performance of double-word multiplication and addition operations, with negligible precision costs. Moreover, in directed rounding mode, the signs of the errors of the two algorithms are consistent with the rounding direction, even in the presence of input overlap. This allows us to avoid changing the rounding mode in interval arithmetic. We also prove that the relative error bound of the sloppy addition algorithm exceeds $3u^2$ if and only if the input meets the condition of Sterbenz's Lemma when rounding to nearest. These findings suggest that the two addition algorithms are more robust than previously believed.
Published: 2024

4. Deep Neural Network Approximation of Composition Functions: with application to PINNs

Author: Duan, Chenguang, Jiao, Yuling, Lu, Xiliang, Yang, Jerry Zhijian, Yuan, Cheng, Duan, Chenguang, Jiao, Yuling, Lu, Xiliang, Yang, Jerry Zhijian, and Yuan, Cheng
Abstract: In this paper, we focus on approximating a natural class of functions that are compositions of smooth functions. Unlike the low-dimensional support assumption on the covariate, we demonstrate that composition functions have an intrinsic sparse structure if we assume each layer in the composition has a small degree of freedom. This fact can alleviate the curse of dimensionality in approximation errors by neural networks. Specifically, by using mathematical induction and the multivariate Faa di Bruno formula, we extend the approximation theory of deep neural networks to the composition functions case. Furthermore, combining recent results on the statistical error of deep learning, we provide a general convergence rate analysis for the PINNs method in solving elliptic equations with compositional solutions. We also present two simple illustrative numerical examples to demonstrate the effect of the intrinsic sparse structure in regression and solving PDEs., Comment: There are errors in the crucial Lemma 3.1, which is a result from our previous work that has not undergone peer review. During the refinement of this manuscript, one of our colleagues pointed out a potential mistake in the proof of this result, indicating that certain corrections are needed to ensure its correctness. To uphold academic rigor, we decide to withdraw the paper at this time
Published: 2023

5. GAS: A Gaussian Mixture Distribution-Based Adaptive Sampling Method for PINNs

Author: Jiao, Yuling, Li, Di, Lu, Xiliang, Yang, Jerry Zhijian, Yuan, Cheng, Jiao, Yuling, Li, Di, Lu, Xiliang, Yang, Jerry Zhijian, and Yuan, Cheng
Abstract: With the recent study of deep learning in scientific computation, the Physics-Informed Neural Networks (PINNs) method has drawn widespread attention for solving Partial Differential Equations (PDEs). Compared to traditional methods, PINNs can efficiently handle high-dimensional problems, but the accuracy is relatively low, especially for highly irregular problems. Inspired by the idea of adaptive finite element methods and incremental learning, we propose GAS, a Gaussian mixture distribution-based adaptive sampling method for PINNs. During the training procedure, GAS uses the current residual information to generate a Gaussian mixture distribution for the sampling of additional points, which are then trained together with historical data to speed up the convergence of the loss and achieve higher accuracy. Several numerical simulations on 2D and 10D problems show that GAS is a promising method that achieves state-of-the-art accuracy among deep solvers, while being comparable with traditional numerical solvers.
Published: 2023

6. Neural Network Approximation for Pessimistic Offline Reinforcement Learning

Author: Wu, Di, Jiao, Yuling, Shen, Li, Yang, Haizhao, Lu, Xiliang, Wu, Di, Jiao, Yuling, Shen, Li, Yang, Haizhao, and Lu, Xiliang
Abstract: Deep reinforcement learning (RL) has shown remarkable success in specific offline decision-making scenarios, yet its theoretical guarantees are still under development. Existing works on offline RL theory primarily emphasize a few trivial settings, such as linear MDP or general function approximation with strong assumptions and independent data, which lack guidance for practical use. The coupling of deep learning and Bellman residuals makes this problem challenging, in addition to the difficulty of data dependence. In this paper, we establish a non-asymptotic estimation error of pessimistic offline RL using general neural network approximation with $\mathcal{C}$-mixing data regarding the structure of networks, the dimension of datasets, and the concentrability of data coverage, under mild assumptions. Our result shows that the estimation error consists of two parts: the first converges to zero at a desired rate on the sample size with partially controllable concentrability, and the second becomes negligible if the residual constraint is tight. This result demonstrates the explicit efficiency of deep adversarial offline RL frameworks. We utilize the empirical process tool for $\mathcal{C}$-mixing sequences and the neural network approximation theory for the H\"{o}lder class to achieve this. We also develop methods to bound the Bellman estimation error caused by function approximation with empirical Bellman constraint perturbations. Additionally, we present a result that lessens the curse of dimensionality using data with low intrinsic dimensionality and function classes with low complexity. Our estimation provides valuable insights into the development of deep offline RL and guidance for algorithm model design., Comment: Full version of the paper accepted to the 38th Annual AAAI Conference on Artificial Intelligence (AAAI 2024)
Published: 2023

7. Provable Advantage of Parameterized Quantum Circuit in Function Approximation

Author: Yu, Zhan, Chen, Qiuhao, Jiao, Yuling, Li, Yinan, Lu, Xiliang, Wang, Xin, Yang, Jerry Zhijian, Yu, Zhan, Chen, Qiuhao, Jiao, Yuling, Li, Yinan, Lu, Xiliang, Wang, Xin, and Yang, Jerry Zhijian
Abstract: Understanding the power of parameterized quantum circuits (PQCs) in accomplishing machine learning tasks is one of the most important questions in quantum machine learning. In this paper, we analyze the expressivity of PQCs through the lens of function approximation. Previously established universal approximation theorems for PQCs are mainly nonconstructive, leading us to the following question: How large do the PQCs need to be to approximate the target function up to a given error? We exhibit explicit constructions of data re-uploading PQCs for approximating continuous and smooth functions and establish quantitative approximation error bounds in terms of the width, the depth and the number of trainable parameters of the PQCs. To achieve this, we utilize techniques from quantum signal processing and linear combinations of unitaries to construct PQCs that implement multivariate polynomials. We implement global and local approximation techniques using Bernstein polynomials and local Taylor expansion and analyze their performances in the quantum setting. We also compare our proposed PQCs to nearly optimal deep neural networks in approximating high-dimensional smooth functions, showing that the ratio between model sizes of PQC and deep neural networks is exponentially small with respect to the input dimension. This suggests a potentially novel avenue for showcasing quantum advantages in quantum machine learning., Comment: 31pages, 3 figures
Published: 2023

8. Current density impedance imaging with PINNs

Author: Duan, Chenguang, Jiao, Yuling, Lu, Xiliang, Yang, Jerry Zhijian, Duan, Chenguang, Jiao, Yuling, Lu, Xiliang, and Yang, Jerry Zhijian
Abstract: In this paper, we introduce CDII-PINNs, a computationally efficient method for solving CDII using PINNs in the framework of Tikhonov regularization. This method constructs a physics-informed loss function by merging the regularized least-squares output functional with an underlying differential equation, which describes the relationship between the conductivity and voltage. A pair of neural networks representing the conductivity and voltage, respectively, are coupled by this loss function. Then, minimizing the loss function provides a reconstruction. A rigorous theoretical guarantee is provided. We give an error analysis for CDII-PINNs and establish a convergence rate, based on prior selected neural network parameters in terms of the number of samples. The numerical simulations demonstrate that CDII-PINNs are efficient, accurate and robust to noise levels ranging from $1\%$ to $20\%$.
Published: 2023

9. Stochastic mirror descent method for linear ill-posed problems in Banach spaces

Author: Jin, Qinian, Lu, Xiliang, Zhang, Liuying, Jin, Qinian, Lu, Xiliang, and Zhang, Liuying
Abstract: Consider linear ill-posed problems governed by the system $A_i x = y_i$ for $i =1, \cdots, p$, where each $A_i$ is a bounded linear operator from a Banach space $X$ to a Hilbert space $Y_i$. In case $p$ is huge, solving the problem by an iterative regularization method using the whole information at each iteration step can be very expensive, due to the huge amount of memory and excessive computational load per iteration. To solve such large-scale ill-posed systems efficiently, we develop in this paper a stochastic mirror descent method which uses only a small portion of equations randomly selected at each iteration steps and incorporates convex regularization terms into the algorithm design. Therefore, our method scales very well with the problem size and has the capability of capturing features of sought solutions. The convergence property of the method depends crucially on the choice of step-sizes. We consider various rules for choosing step-sizes and obtain convergence results under {\it a priori} early stopping rules. In particular, by incorporating the spirit of the discrepancy principle we propose a choice rule of step-sizes which can efficiently suppress the oscillations in iterates and reduce the effect of semi-convergence. Furthermore, we establish an order optimal convergence rate result when the sought solution satisfies a benchmark source condition. Various numerical simulations are reported to test the performance of the method.
Published: 2022
Full Text: View/download PDF

10. Efficient and practical quantum compiler towards multi-qubit systems with deep reinforcement learning

Author: Chen, Qiuhao, Du, Yuxuan, Zhao, Qi, Jiao, Yuling, Lu, Xiliang, Wu, Xingyao, Chen, Qiuhao, Du, Yuxuan, Zhao, Qi, Jiao, Yuling, Lu, Xiliang, and Wu, Xingyao
Abstract: Efficient quantum compiling tactics greatly enhance the capability of quantum computers to execute complicated quantum algorithms. Due to its fundamental importance, a plethora of quantum compilers has been designed in past years. However, there are several caveats to current protocols, which are low optimality, high inference time, limited scalability, and lack of universality. To compensate for these defects, here we devise an efficient and practical quantum compiler assisted by advanced deep reinforcement learning (RL) techniques, i.e., data generation, deep Q-learning, and AQ* search. In this way, our protocol is compatible with various quantum machines and can be used to compile multi-qubit operators. We systematically evaluate the performance of our proposal in compiling quantum operators with both inverse-closed and inverse-free universal basis sets. In the task of single-qubit operator compiling, our proposal outperforms other RL-based quantum compilers in the measure of compiling sequence length and inference time. Meanwhile, the output solution is near-optimal, guaranteed by the Solovay-Kitaev theorem. Notably, for the inverse-free universal basis set, the achieved sequence length complexity is comparable with the inverse-based setting and dramatically advances previous methods. These empirical results contribute to improving the inverse-free Solovay-Kitaev theorem. In addition, for the first time, we demonstrate how to leverage RL-based quantum compilers to accomplish two-qubit operator compiling. The achieved results open an avenue for integrating RL with quantum compiling to unify efficiency and practicality and thus facilitate the exploration of quantum advantages.
Published: 2022

11. Imaging Conductivity from Current Density Magnitude using Neural Networks

Author: Jin, Bangti, Li, Xiyao, Lu, Xiliang, Jin, Bangti, Li, Xiyao, and Lu, Xiliang
Abstract: Conductivity imaging represents one of the most important tasks in medical imaging. In this work we develop a neural network based reconstruction technique for imaging the conductivity from the magnitude of the internal current density. It is achieved by formulating the problem as a relaxed weighted least-gradient problem, and then approximating its minimizer by standard fully connected feedforward neural networks. We derive bounds on two components of the generalization error, i.e., approximation error and statistical error, explicitly in terms of properties of the neural networks (e.g., depth, total number of parameters, and the bound of the network parameters). We illustrate the performance and distinct features of the approach on several numerical experiments. Numerically, it is observed that the approach enjoys remarkable robustness with respect to the presence of data noise., Comment: 29 pp, 9 figures (several typos are corrected in the new version), to appear at Inverse Problems
Published: 2022
Full Text: View/download PDF

12. Convergence Rate Analysis of Galerkin Approximation of Inverse Potential Problem

Author: Jin, Bangti, Lu, Xiliang, Quan, Qimeng, Zhou, Zhi, Jin, Bangti, Lu, Xiliang, Quan, Qimeng, and Zhou, Zhi
Abstract: In this work we analyze the inverse problem of recovering the space-dependent potential coefficient in an elliptic / parabolic problem from distributed observation. We establish novel (weighted) conditional stability estimates under very mild conditions on the problem data. Then we provide an error analysis of a standard reconstruction scheme based on the standard output least-squares formulation with Tikhonov regularization (by an $H^1$-seminorm penalty), which is then discretized by the Galerkin finite element method with continuous piecewise linear finite elements in space (and also backward Euler method in time for parabolic problems). We present a detailed analysis of the discrete scheme, and provide convergence rates in a weighted $L^2(\Omega)$ for discrete approximations with respect to the exact potential. The error bounds are explicitly dependent on the noise level, regularization parameter and discretization parameter(s). Under suitable conditions, we also derive error estimates in the standard $L^2(\Omega)$ and interior $L^2$ norms. The analysis employs sharp a priori error estimates and nonstandard test functions. Several numerical experiments are given to complement the theoretical analysis., Comment: 23 pages, 4 figures
Published: 2022
Full Text: View/download PDF

13. Imaging Anisotropic Conductivities from Current Densities

Author: Liu, Huan, Jin, Bangti, Lu, Xiliang, Liu, Huan, Jin, Bangti, and Lu, Xiliang
Abstract: In this paper, we propose and analyze a reconstruction algorithm for imaging an anisotropic conductivity tensor in a second-order elliptic PDE with a nonzero Dirichlet boundary condition from internal current densities. It is based on a regularized output least-squares formulation with the standard $L^2(\Omega)^{d,d}$ penalty, which is then discretized by the standard Galerkin finite element method. We establish the continuity and differentiability of the forward map with respect to the conductivity tensor in the $L^p(\Omega)^{d,d}$-norms, the existence of minimizers and optimality systems of the regularized formulation using the concept of H-convergence. Further, we provide a detailed analysis of the discretized problem, especially the convergence of the discrete approximations with respect to the mesh size, using the discrete counterpart of H-convergence. In addition, we develop a projected Newton algorithm for solving the first-order optimality system. We present extensive two-dimensional numerical examples to show the efficiency of the proposed method., Comment: 32 pages, 10 figures, to appear at SIAM Journal on Imaging Sciences
Published: 2022

14. A Rate of Convergence of Physics Informed Neural Networks for the Linear Second Order Elliptic PDEs

Author: Jiao, Yuling, Lai, Yanming, Li, Dingwei, Lu, Xiliang, Wang, Fengru, Wang, Yang, Yang, Jerry Zhijian, Jiao, Yuling, Lai, Yanming, Li, Dingwei, Lu, Xiliang, Wang, Fengru, Wang, Yang, and Yang, Jerry Zhijian
Abstract: In recent years, physical informed neural networks (PINNs) have been shown to be a powerful tool for solving PDEs empirically. However, numerical analysis of PINNs is still missing. In this paper, we prove the convergence rate to PINNs for the second order elliptic equations with Dirichlet boundary condition, by establishing the upper bounds on the number of training samples, depth and width of the deep neural networks to achieve desired accuracy. The error of PINNs is decomposed into approximation error and statistical error, where the approximation error is given in C2 norm with ReLU3 networks (deep network with activation function max{0,x3}) and the statistical error is estimated by Rademacher complexity. We derive the bound on the Rademacher complexity of the non-Lipschitz composition of gradient norm with ReLU3 network, which is of immense independent interest. ©2022 Global-Science Press
Published: 2022

15. Sample-Efficient Sparse Phase Retrieval via Stochastic Alternating Minimization

Author: Cai, Jian-Feng, Jiao, Yuling, Lu, Xiliang, You, Juntao, Cai, Jian-Feng, Jiao, Yuling, Lu, Xiliang, and You, Juntao
Abstract: In this work we propose a nonconvex two-stage \underline{s}tochastic \underline{a}lternating \underline{m}inimizing (SAM) method for sparse phase retrieval. The proposed algorithm is guaranteed to have an exact recovery from $O(s\log n)$ samples if provided the initial guess is in a local neighbour of the ground truth. Thus, the proposed algorithm is two-stage, first we estimate a desired initial guess (e.g. via a spectral method), and then we introduce a randomized alternating minimization strategy for local refinement. Also, the hard-thresholding pursuit algorithm is employed to solve the sparse constraint least square subproblems. We give the theoretical justifications that SAM find the underlying signal exactly in a finite number of iterations (no more than $O(\log m)$ steps) with high probability. Further, numerical experiments illustrates that SAM requires less measurements than state-of-the-art algorithms for sparse phase retrieval problem.
Published: 2021
Full Text: View/download PDF

16. A Data-Driven Line Search Rule for Support Recovery in High-dimensional Data Analysis

Author: Li, Peili, Jiao, Yuling, Lu, Xiliang, Kang, Lican, Li, Peili, Jiao, Yuling, Lu, Xiliang, and Kang, Lican
Abstract: In this work, we consider the algorithm to the (nonlinear) regression problems with $\ell_0$ penalty. The existing algorithms for $\ell_0$ based optimization problem are often carried out with a fixed step size, and the selection of an appropriate step size depends on the restricted strong convexity and smoothness for the loss function, hence it is difficult to compute in practical calculation. In sprite of the ideas of support detection and root finding \cite{HJK2020}, we proposes a novel and efficient data-driven line search rule to adaptively determine the appropriate step size. We prove the $\ell_2$ error bound to the proposed algorithm without much restrictions for the cost functional. A large number of numerical comparisons with state-of-the-art algorithms in linear and logistic regression problems show the stability, effectiveness and superiority of the proposed algorithms.
Published: 2021

17. Analysis of Deep Ritz Methods for Laplace Equations with Dirichlet Boundary Conditions

Author: Duan, Chenguang, Jiao, Yuling, Lai, Yanming, Lu, Xiliang, Quan, Qimeng, Yang, Jerry Zhijian, Duan, Chenguang, Jiao, Yuling, Lai, Yanming, Lu, Xiliang, Quan, Qimeng, and Yang, Jerry Zhijian
Abstract: Deep Ritz methods (DRM) have been proven numerically to be efficient in solving partial differential equations. In this paper, we present a convergence rate in $H^{1}$ norm for deep Ritz methods for Laplace equations with Dirichlet boundary condition, where the error depends on the depth and width in the deep neural networks and the number of samples explicitly. Further we can properly choose the depth and width in the deep neural networks in terms of the number of training samples. The main idea of the proof is to decompose the total error of DRM into three parts, that is approximation error, statistical error and the error caused by the boundary penalty. We bound the approximation error in $H^{1}$ norm with $\mathrm{ReLU}^{2}$ networks and control the statistical error via Rademacher complexity. In particular, we derive the bound on the Rademacher complexity of the non-Lipschitz composition of gradient norm with $\mathrm{ReLU}^{2}$ network, which is of immense independent interest. We also analysis the error inducing by the boundary penalty method and give a prior rule for tuning the penalty parameter., Comment: arXiv admin note: substantial text overlap with arXiv:2103.13330; text overlap with arXiv:2109.01780
Published: 2021

18. Coordinate Descent for MCP/SCAD Penalized Least Squares Converges Linearly

Author: Jiao, Yuling, Li, Dingwei, Liu, Min, Lu, Xiliang, Jiao, Yuling, Li, Dingwei, Liu, Min, and Lu, Xiliang
Abstract: Recovering sparse signals from observed data is an important topic in signal/imaging processing, statistics and machine learning. Nonconvex penalized least squares have been attracted a lot of attentions since they enjoy nice statistical properties. Computationally, coordinate descent (CD) is a workhorse for minimizing the nonconvex penalized least squares criterion due to its simplicity and scalability. In this work, we prove the linear convergence rate to CD for solving MCP/SCAD penalized least squares problems.
Published: 2021

19. A rate of convergence of Physics Informed Neural Networks for the linear second order elliptic PDEs

Author: Jiao, Yuling, Lai, Yanming, Li, Dingwei, Lu, Xiliang, Wang, Fengru, Wang, Yang, Yang, Jerry Zhijian, Jiao, Yuling, Lai, Yanming, Li, Dingwei, Lu, Xiliang, Wang, Fengru, Wang, Yang, and Yang, Jerry Zhijian
Abstract: In recent years, physical informed neural networks (PINNs) have been shown to be a powerful tool for solving PDEs empirically. However, numerical analysis of PINNs is still missing. In this paper, we prove the convergence rate to PINNs for the second order elliptic equations with Dirichlet boundary condition, by establishing the upper bounds on the number of training samples, depth and width of the deep neural networks to achieve desired accuracy. The error of PINNs is decomposed into approximation error and statistical error, where the approximation error is given in $C^2$ norm with $\mathrm{ReLU}^{3}$ networks (deep network with activations function $\max\{0,x^3\}$) and the statistical error is estimated by Rademacher complexity. We derive the bound on the Rademacher complexity of the non-Lipschitz composition of gradient norm with $\mathrm{ReLU}^{3}$ network, which is of immense independent interest., Comment: arXiv admin note: text overlap with arXiv:2103.13330
Published: 2021
Full Text: View/download PDF

20. Convergence Rate Analysis for Deep Ritz Method

Author: Duan, Chenguang, Jiao, Yuling, Lai, Yanming, Lu, Xiliang, Yang, Zhijian, Duan, Chenguang, Jiao, Yuling, Lai, Yanming, Lu, Xiliang, and Yang, Zhijian
Abstract: Using deep neural networks to solve PDEs has attracted a lot of attentions recently. However, why the deep learning method works is falling far behind its empirical success. In this paper, we provide a rigorous numerical analysis on deep Ritz method (DRM) \cite{wan11} for second order elliptic equations with Neumann boundary conditions. We establish the first nonasymptotic convergence rate in $H^1$ norm for DRM using deep networks with $\mathrm{ReLU}^2$ activation functions. In addition to providing a theoretical justification of DRM, our study also shed light on how to set the hyper-parameter of depth and width to achieve the desired convergence rate in terms of number of training samples. Technically, we derive bounds on the approximation error of deep $\mathrm{ReLU}^2$ network in $H^1$ norm and on the Rademacher complexity of the non-Lipschitz composition of gradient norm and $\mathrm{ReLU}^2$ network, both of which are of independent interest.
Published: 2021
Full Text: View/download PDF

21. Deep Neural Networks with ReLU-Sine-Exponential Activations Break Curse of Dimensionality in Approximation on H\'older Class

Author: Jiao, Yuling, Lai, Yanming, Lu, Xiliang, Wang, Fengru, Yang, Jerry Zhijian, Yang, Yuanyuan, Jiao, Yuling, Lai, Yanming, Lu, Xiliang, Wang, Fengru, Yang, Jerry Zhijian, and Yang, Yuanyuan
Abstract: In this paper, we construct neural networks with ReLU, sine and $2^x$ as activation functions. For general continuous $f$ defined on $[0,1]^d$ with continuity modulus $\omega_f(\cdot)$, we construct ReLU-sine-$2^x$ networks that enjoy an approximation rate $\mathcal{O}(\omega_f(\sqrt{d})\cdot2^{-M}+\omega_{f}\left(\frac{\sqrt{d}}{N}\right))$, where $M,N\in \mathbb{N}^{+}$ denote the hyperparameters related to widths of the networks. As a consequence, we can construct ReLU-sine-$2^x$ network with the depth $5$ and width $\max\left\{\left\lceil2d^{3/2}\left(\frac{3\mu}{\epsilon}\right)^{1/{\alpha}}\right\rceil,2\left\lceil\log_2\frac{3\mu d^{\alpha/2}}{2\epsilon}\right\rceil+2\right\}$ that approximates $f\in \mathcal{H}_{\mu}^{\alpha}([0,1]^d)$ within a given tolerance $\epsilon >0$ measured in $L^p$ norm $p\in[1,\infty)$, where $\mathcal{H}_{\mu}^{\alpha}([0,1]^d)$ denotes the H\"older continuous function class defined on $[0,1]^d$ with order $\alpha \in (0,1]$ and constant $\mu > 0$. Therefore, the ReLU-sine-$2^x$ networks overcome the curse of dimensionality on $\mathcal{H}_{\mu}^{\alpha}([0,1]^d)$. In addition to its supper expressive power, functions implemented by ReLU-sine-$2^x$ networks are (generalized) differentiable, enabling us to apply SGD to train.
Published: 2021

22. A stochastic alternating minimization method for sparse phase retrieval

Author: Cai, Jianfeng, Jiao, Yuling, Lu, Xiliang, You, Juntao, Cai, Jianfeng, Jiao, Yuling, Lu, Xiliang, and You, Juntao
Abstract: Sparse phase retrieval plays an important role in many fields of applied science and thus attracts lots of attention. In this paper, we propose a stochastic alternating minimization method for sparse phase retrieval (StormSpar) algorithm which empirically is able to recover n-dimensional s-sparse signals from only O(s log n) measurements without a desired initial value required by many existing methods. In StormSpar, the hard-thresholding pursuit (HTP) algorithm is employed to solve the sparse constrained least-square sub-problems. The main competitive feature of StormSpar is that it converges globally requiring optimal order of number of samples with random initialization. Extensive numerical experiments are given to validate the proposed algorithm. 近年來稀疏相位恢復問題受到了越來越多的關注.本文提出了一種隨機交替方法方法求解稀疏相位恢復問題,該算法采用硬閾值追蹤算法求解帶稀疏約束的最小二乘子問題.大量的數值實驗表明,該算法可以通過O（s log n）次測量（理論上最少測量值）穩定的恢復n維s稀疏向量,并且在隨機初值下可以獲得全局收斂性.
Published: 2020

23. Robust Decoding from Binary Measurements with Cardinality Constraint Least Squares

Author: Ding, Zhao, Huang, Junjun, Jiao, Yuling, Lu, Xiliang, Yang, Zhijian, Ding, Zhao, Huang, Junjun, Jiao, Yuling, Lu, Xiliang, and Yang, Zhijian
Abstract: The main goal of 1-bit compressive sampling is to decode $n$ dimensional signals with sparsity level $s$ from $m$ binary measurements. This is a challenging task due to the presence of nonlinearity, noises and sign flips. In this paper, the cardinality constraint least square is proposed as a desired decoder. We prove that, up to a constant $c$, with high probability, the proposed decoder achieves a minimax estimation error as long as $m \geq \mathcal{O}( s\log n)$. Computationally, we utilize a generalized Newton algorithm (GNA) to solve the cardinality constraint minimization problem with the cost of solving a least squares problem with small size at each iteration. We prove that, with high probability, the $\ell_{\infty}$ norm of the estimation error between the output of GNA and the underlying target decays to $\mathcal{O}(\sqrt{\frac{\log n }{m}}) $ after at most $\mathcal{O}(\log s)$ iterations. Moreover, the underlying support can be recovered with high probability in $\mathcal{O}(\log s)$ steps provided that the target signal is detectable. Extensive numerical simulations and comparisons with state-of-the-art methods are presented to illustrate the robustness of our proposed decoder and the efficiency of the GNA algorithm., Comment: arXiv admin note: text overlap with arXiv:1711.01206
Published: 2020

24. Sparse Signal Recovery from Phaseless Measurements via Hard Thresholding Pursuit

Author: Cai, Jian-Feng, Li, Jingzhi, Lu, Xiliang, You, Juntao, Cai, Jian-Feng, Li, Jingzhi, Lu, Xiliang, and You, Juntao
Abstract: In this paper, we consider the sparse phase retrieval problem, recovering an $s$-sparse signal $\bm{x}^{\natural}\in\mathbb{R}^n$ from $m$ phaseless samples $y_i=|\langle\bm{x}^{\natural},\bm{a}_i\rangle|$ for $i=1,\ldots,m$. Existing sparse phase retrieval algorithms are usually first-order and hence converge at most linearly. Inspired by the hard thresholding pursuit (HTP) algorithm in compressed sensing, we propose an efficient second-order algorithm for sparse phase retrieval. Our proposed algorithm is theoretically guaranteed to give an exact sparse signal recovery in finite (in particular, at most $O(\log m + \log(\|\bm{x}^{\natural}\|_2/|x_{\min}^{\natural}|))$) steps, when $\{\bm{a}_i\}_{i=1}^{m}$ are i.i.d. standard Gaussian random vector with $m\sim O(s\log(n/s))$ and the initialization is in a neighborhood of the underlying sparse signal. Together with a spectral initialization, our algorithm is guaranteed to have an exact recovery from $O(s^2\log n)$ samples. Since the computational cost per iteration of our proposed algorithm is the same order as popular first-order algorithms, our algorithm is extremely efficient. Experimental results show that our algorithm can be several times faster than existing sparse phase retrieval algorithms.
Published: 2020
Full Text: View/download PDF

25. On Newton Screening

Author: Huang, Jian, Jiao, Yuling, Kang, Lican, Liu, Jin, Liu, Yanyan, Lu, Xiliang, Yang, Yuanyuan, Huang, Jian, Jiao, Yuling, Kang, Lican, Liu, Jin, Liu, Yanyan, Lu, Xiliang, and Yang, Yuanyuan
Abstract: Screening and working set techniques are important approaches to reducing the size of an optimization problem. They have been widely used in accelerating first-order methods for solving large-scale sparse learning problems. In this paper, we develop a new screening method called Newton screening (NS) which is a generalized Newton method with a built-in screening mechanism. We derive an equivalent KKT system for the Lasso and utilize a generalized Newton method to solve the KKT equations. Based on this KKT system, a built-in working set with a relatively small size is first determined using the sum of primal and dual variables generated from the previous iteration, then the primal variable is updated by solving a least-squares problem on the working set and the dual variable updated based on a closed-form expression. Moreover, we consider a sequential version of Newton screening (SNS) with a warm-start strategy. We show that NS possesses an optimal convergence property in the sense that it achieves one-step local convergence. Under certain regularity conditions on the feature matrix, we show that SNS hits a solution with the same signs as the underlying true target and achieves a sharp estimation error bound with high probability. Simulation studies and real data analysis support our theoretical results and demonstrate that SNS is faster and more accurate than several state-of-the-art methods in our comparative studies., Comment: This paper has fatal mistakes
Published: 2020

26. A Support Detection and Root Finding Approach for Learning High-dimensional Generalized Linear Models

Author: Huang, Jian, Jiao, Yuling, Kang, Lican, Liu, Jin, Liu, Yanyan, Lu, Xiliang, Huang, Jian, Jiao, Yuling, Kang, Lican, Liu, Jin, Liu, Yanyan, and Lu, Xiliang
Abstract: Feature selection is important for modeling high-dimensional data, where the number of variables can be much larger than the sample size. In this paper, we develop a support detection and root finding procedure to learn the high dimensional sparse generalized linear models and denote this method by GSDAR. Based on the KKT condition for $\ell_0$-penalized maximum likelihood estimations, GSDAR generates a sequence of estimators iteratively. Under some restricted invertibility conditions on the maximum likelihood function and sparsity assumption on the target coefficients, the errors of the proposed estimate decays exponentially to the optimal order. Moreover, the oracle estimator can be recovered if the target signal is stronger than the detectable level. We conduct simulations and real data analysis to illustrate the advantages of our proposed method over several existing methods, including Lasso and MCP.
Published: 2020

27. Generative Learning With Euler Particle Transport

Author: Gao, Yuan, Huang, Jian, Jiao, Yuling, Liu, Jin, Lu, Xiliang, Yang, Zhijian, Gao, Yuan, Huang, Jian, Jiao, Yuling, Liu, Jin, Lu, Xiliang, and Yang, Zhijian
Abstract: We propose an Euler particle transport (EPT) approach for generative learning. The proposed approach is motivated by the problem of finding an optimal transport map from a reference distribution to a target distribution characterized by the Monge-Ampere equation. Interpreting the infinitesimal linearization of the Monge-Ampere equation from the perspective of gradient flows in measure spaces leads to a stochastic McKean-Vlasov equation. We use the forward Euler method to solve this equation. The resulting forward Euler map pushes forward a reference distribution to the target. This map is the composition of a sequence of simple residual maps, which are computationally stable and easy to train. The key task in training is the estimation of the density ratios or differences that determine the residual maps. We estimate the density ratios (differences) based on the Bregman divergence with a gradient penalty using deep density-ratio (difference) fitting. We show that the proposed density-ratio (difference) estimators do not suffer from the "curse of dimensionality" if data is supported on a lower-dimensional manifold. Numerical experiments with multi-mode synthetic datasets and comparisons with the existing methods on real benchmark datasets support our theoretical results and demonstrate the effectiveness of the proposed method., Comment: arXiv admin note: substantial text overlap with arXiv:2002.02862
Published: 2020

28. A stochastic alternating minimizing method for sparse phase retrieval

Author: Cai, Jianfeng, Jiao, Yuling, Lu, Xiliang, You, Juntao, Cai, Jianfeng, Jiao, Yuling, Lu, Xiliang, and You, Juntao
Abstract: Sparse phase retrieval plays an important role in many fields of applied science and thus attracts lots of attention. In this paper, we propose a \underline{sto}chastic alte\underline{r}nating \underline{m}inimizing method for \underline{sp}arse ph\underline{a}se \underline{r}etrieval (\textit{StormSpar}) algorithm which {emprically} is able to recover $n$-dimensional $s$-sparse signals from only $O(s\,\mathrm{log}\, n)$ number of measurements without a desired initial value required by many existing methods. In \textit{StormSpar}, the hard-thresholding pursuit (HTP) algorithm is employed to solve the sparse constraint least square sub-problems. The main competitive feature of \textit{StormSpar} is that it converges globally requiring optimal order of number of samples with random initialization. Extensive numerical experiments are given to validate the proposed algorithm.
Published: 2019

29. SNAP: A semismooth Newton algorithm for pathwise optimization with optimal local convergence rate and oracle properties

Author: Huang, Jian, Jiao, Yuling, Lu, Xiliang, Shi, Yueyong, Yang, Qinglong, Huang, Jian, Jiao, Yuling, Lu, Xiliang, Shi, Yueyong, and Yang, Qinglong
Abstract: We propose a semismooth Newton algorithm for pathwise optimization (SNAP) for the LASSO and Enet in sparse, high-dimensional linear regression. SNAP is derived from a suitable formulation of the KKT conditions based on Newton derivatives. It solves the semismooth KKT equations efficiently by actively and continuously seeking the support of the regression coefficients along the solution path with warm start. At each knot in the path, SNAP converges locally superlinearly for the Enet criterion and achieves an optimal local convergence rate for the LASSO criterion, i.e., SNAP converges in one step at the cost of two matrix-vector multiplication per iteration. Under certain regularity conditions on the design matrix and the minimum magnitude of the nonzero elements of the target regression coefficients, we show that SNAP hits a solution with the same signs as the regression coefficients and achieves a sharp estimation error bound in finite steps with high probability. The computational complexity of SNAP is shown to be the same as that of LARS and coordinate descent algorithms per iteration. Simulation studies and real data analysis support our theoretical results and demonstrate that SNAP is faster and accurate than LARS and coordinate descent algorithms.
Published: 2018

30. Oblique Stripe Removal in Remote Sensing Images via Oriented Variation

Author: Liu, Xinxin, Lu, Xiliang, Shen, Huanfeng, Yuan, Qiangqiang, Zhang, Liangpei, Liu, Xinxin, Lu, Xiliang, Shen, Huanfeng, Yuan, Qiangqiang, and Zhang, Liangpei
Abstract: Destriping is a classical problem in remote sensing image processing. Although considerable effort has been made to remove stripes, few of the existing methods can eliminate stripe noise with arbitrary orientations. This situation makes the removal of oblique stripes in the higher-level remote sensing products become an unfinished and urgent issue. To overcome the challenging problem, we propose a novel destriping model which is self-adjusted to different orientations of stripe noise. First of all, the oriented variation model is designed to accomplish the stripe orientation approximation. In this model, the stripe direction is automatically estimated and then imbedded into the constraint term to depict the along-stripe smoothness of the stripe component. Mainly based on the oriented variation model, a whole destriping framework is proposed by jointly employing an L1-norm constraint and a TV regularization to separately capture the global distribution property of stripe component and the piecewise smoothness of the clean image. The qualitative and quantitative experimental results of both orientation and destriping aspects confirm the effectiveness and stability of the proposed method., Comment: 14 pages, 14 figures
Published: 2018

31. Preconditioned alternating direction method of multipliers for inverse problems with constraints

Author: Jiao, Yuling, Jin, Qinian, Lu, Xiliang, Wang, Weijie, Jiao, Yuling, Jin, Qinian, Lu, Xiliang, and Wang, Weijie
Abstract: We propose a preconditioned alternating direction method of multipliers (ADMM) to solve linear inverse problems in Hilbert spaces with constraints, where the feature of the sought solution under a linear transformation is captured by a possibly non-smooth convex function. During each iteration step, our method avoids solving large linear systems by choosing a suitable preconditioning operator. In case the data is given exactly, we prove the convergence of our preconditioned ADMM without assuming the existence of a Lagrange multiplier. In case the data is corrupted by noise, we propose a stopping rule using information on noise level and show that our preconditioned ADMM is a regularization method; we also propose a heuristic rule when the information on noise level is unavailable or unreliable and give its detailed analysis. Numerical examples are presented to test the performance of the proposed method.
Published: 2017

32. Group Sparse Recovery via the $\ell^0(\ell^2)$ Penalty: Theory and Algorithm

Author: Jiao, Yuling, Jin, Bangti, Lu, Xiliang, Jiao, Yuling, Jin, Bangti, and Lu, Xiliang
Abstract: In this work we propose and analyze a novel approach for group sparse recovery. It is based on regularized least squares with an $\ell^0(\ell^2)$ penalty, which penalizes the number of nonzero groups. One distinct feature of the approach is that it has the built-in decorrelation mechanism within each group, and thus can handle challenging strong inner-group correlation. We provide a complete analysis of the regularized model, e.g., existence of a global minimizer, invariance property, support recovery, and properties of block coordinatewise minimizers. Further, the regularized problem admits an efficient primal dual active set algorithm with a provable finite-step global convergence. At each iteration, it involves solving a least-squares problem on the active set only, and exhibits a fast local convergence, which makes the method extremely efficient for recovering group sparse signals. Extensive numerical experiments are presented to illustrate salient features of the model and the efficiency and accuracy of the algorithm. A comparative study indicates its competitiveness with existing approaches., Comment: 15 pp, to appear at IEEE Transactions on Signal Processing
Published: 2016
Full Text: View/download PDF

33. A fast nonstationary iterative method with convex penalty for inverse problems in Hilbert spaces

Author: Jin, Qinian, Lu, Xiliang, Jin, Qinian, and Lu, Xiliang
Abstract: In this paper we consider the computation of approximate solutions for inverse problems in Hilbert spaces. In order to capture the special feature of solutions, non-smooth convex functions are introduced as penalty terms. By exploiting the Hilbert space structure of the underlying problems, we propose a fast iterative regularization method which reduces to the classical nonstationary iterated Tikhonov regularization when the penalty term is chosen to be the square of norm. Each iteration of the method consists of two steps: the first step involves only the operator from the problem while the second step involves only the penalty term. This splitting character has the advantage of making the computation efficient. In case the data is corrupted by noise, a stopping rule is proposed to terminate the method and the corresponding regularization property is established. Finally, we test the performance of the method by reporting various numerical simulations, including the image deblurring, the determination of source term in Poisson equation, and the de-autoconvolution problem.
Published: 2014

34. A Primal Dual Active Set with Continuation Algorithm for the \ell^0-Regularized Optimization Problem

Author: Jiao, Yuling, Jin, Bangti, Lu, Xiliang, Jiao, Yuling, Jin, Bangti, and Lu, Xiliang
Abstract: We develop a primal dual active set with continuation algorithm for solving the \ell^0-regularized least-squares problem that frequently arises in compressed sensing. The algorithm couples the the primal dual active set method with a continuation strategy on the regularization parameter. At each inner iteration, it first identifies the active set from both primal and dual variables, and then updates the primal variable by solving a (typically small) least-squares problem defined on the active set, from which the dual variable can be updated explicitly. Under certain conditions on the sensing matrix, i.e., mutual incoherence property or restricted isometry property, and the noise level, the finite step global convergence of the algorithm is established. Extensive numerical examples are presented to illustrate the efficiency and accuracy of the algorithm and the convergence analysis.
Published: 2014

35. IMA Journal of Numerical Analysis / Optimal control for an elliptic system with convex polygonal control constraints

Author: Kunisch, Karl, Kunisch, Karl, Lu, Xiliang, Kunisch, Karl, Kunisch, Karl, and Lu, Xiliang
Abstract: The semi-smooth Newton method for optimal control problemsfor systems of partial differential equations with polygonal constraints on thecontrols is developed. The Newton derivatives are characterized for the caseof systems of dimension two, superlinear convergence is verified, and a simpleproof-of-concept numerical example is provided.
Published: 2013
Full Text: View/download PDF

36. Mathematical Programming / Optimal control for elliptic systems with pointwise euclidean norm constraints on the controls

Author: Kunisch, Karl, Kunisch, Karl, Lu, Xiliang, Kunisch, Karl, Kunisch, Karl, and Lu, Xiliang
Abstract: Optimal control for an elliptic system with pointwise Euclidean norm constraints on the control variables is investigated. First order optimality conditions are derived in a manner that is amenable for numerical realisation. An efficient semismooth Newton algorithm is proposed based on this optimality system. Numerical examples are given to validate the superlinear convergence of the semismooth Newton algorithm., Xiliang Lu, Karl Kunisch
Published: 2013
Full Text: View/download PDF

37. IMA Journal of Numerical Analysis / Optimal control for an elliptic system with convex polygonal control constraints

Author: Kunisch, Karl, Kunisch, Karl, Lu, Xiliang, Kunisch, Karl, Kunisch, Karl, and Lu, Xiliang
Abstract: The semi-smooth Newton method for optimal control problemsfor systems of partial differential equations with polygonal constraints on thecontrols is developed. The Newton derivatives are characterized for the caseof systems of dimension two, superlinear convergence is verified, and a simpleproof-of-concept numerical example is provided.
Published: 2013
Full Text: View/download PDF

38. Mathematical Programming / Optimal control for elliptic systems with pointwise euclidean norm constraints on the controls

Author: Kunisch, Karl, Kunisch, Karl, Lu, Xiliang, Kunisch, Karl, Kunisch, Karl, and Lu, Xiliang
Abstract: Optimal control for an elliptic system with pointwise Euclidean norm constraints on the control variables is investigated. First order optimality conditions are derived in a manner that is amenable for numerical realisation. An efficient semismooth Newton algorithm is proposed based on this optimality system. Numerical examples are given to validate the superlinear convergence of the semismooth Newton algorithm., Xiliang Lu, Karl Kunisch
Published: 2013
Full Text: View/download PDF

39. A Primal Dual Active Set Algorithm with Continuation for Compressed Sensing

Author: Fan, Qibin, Jiao, Yuling, Lu, Xiliang, Fan, Qibin, Jiao, Yuling, and Lu, Xiliang
Abstract: The success of compressed sensing relies essentially on the ability to efficiently find an approximately sparse solution to an under-determined linear system. In this paper, we developed an efficient algorithm for the sparsity promoting $\ell_1$-regularized least squares problem by coupling the primal dual active set strategy with a continuation technique (on the regularization parameter). In the active set strategy, we first determine the active set from primal and dual variables, and then update the primal and dual variables by solving a low-dimensional least square problem on the active set, which makes the algorithm very efficient. The continuation technique globalizes the convergence of the algorithm, with provable global convergence under restricted isometry property (RIP). Further, we adopt two alternative methods, i.e., a modified discrepancy principle and a Bayesian information criterion, to choose the regularization parameter. Numerical experiments indicate that our algorithm is very competitive with state-of-the-art algorithms in terms of accuracy and efficiency.
Published: 2013
Full Text: View/download PDF

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Publication Year Range

Publication Type

Database

Publisher

39 results on '"Lu, Xiliang"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources