15 results for "Tan, Chengli"
Search Results
2. Stabilizing Sharpness-aware Minimization Through A Simple Renormalization Strategy
- Author
- Tan, Chengli, Zhang, Jiangshe, Liu, Junmin, Wang, Yicheng, and Hao, Yunda
- Subjects
- Computer Science - Machine Learning, Computer Science - Artificial Intelligence
- Abstract
Recently, sharpness-aware minimization (SAM) has attracted much attention because of its surprising effectiveness in improving generalization performance. However, compared to stochastic gradient descent (SGD), it is more prone to getting stuck at saddle points, which may in turn lead to performance degradation. To address this issue, we propose a simple renormalization strategy, dubbed Stable SAM (SSAM), so that the gradient norm of the descent step remains the same as that of the ascent step. Our strategy is easy to implement and flexible enough to integrate with SAM and its variants, at almost no computational cost. With elementary tools from convex optimization and learning theory, we also conduct a theoretical analysis of sharpness-aware training, revealing that, compared to SGD, the effectiveness of SAM is only assured in a limited regime of learning rates. In contrast, we show how SSAM extends this learning-rate regime and can therefore consistently perform better than SAM with this minor modification. Finally, we demonstrate the improved performance of SSAM on several representative datasets and tasks.
- Comment: 33 pages
- Published
- 2024
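The renormalization described in this entry's abstract amounts to rescaling the descent-step gradient so that its norm matches that of the ascent-step gradient before the weight update. The sketch below illustrates only that idea in NumPy, with a toy quadratic loss and arbitrary step sizes; it is not the authors' implementation.

```python
import numpy as np

def loss_grad(w):
    # Toy quadratic loss gradient; stands in for a mini-batch gradient.
    return 2.0 * w

def ssam_step(w, lr=0.1, rho=0.05, eps=1e-12):
    g_ascent = loss_grad(w)                          # gradient at the current weights
    ascent_norm = np.linalg.norm(g_ascent) + eps
    w_adv = w + rho * g_ascent / ascent_norm         # SAM ascent to the perturbed weights
    g_descent = loss_grad(w_adv)                     # gradient at the perturbed weights
    descent_norm = np.linalg.norm(g_descent) + eps
    # Renormalization: keep the descent-step gradient norm equal to the ascent-step norm.
    g_descent = g_descent * (ascent_norm / descent_norm)
    return w - lr * g_descent

w = np.array([1.0, -2.0])
for _ in range(5):
    w = ssam_step(w)
```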
3. Brain-inspired dual-pathway neural network architecture and its generalization analysis
- Author
- Dong, SongLin, Tan, ChengLi, Zuo, ZhenTao, He, YuHang, Gong, YiHong, Zhou, TianGang, Liu, JunMin, and Zhang, JiangShe
- Published
- 2024
- Full Text
- View/download PDF
4. Seismic Data Interpolation via Denoising Diffusion Implicit Models with Coherence-corrected Resampling
- Author
- Wei, Xiaoli, Zhang, Chunxia, Wang, Hongtao, Tan, Chengli, Xiong, Deng, Jiang, Baisong, Zhang, Jiangshe, and Kim, Sang-Woon
- Subjects
- Physics - Geophysics, Statistics - Machine Learning
- Abstract
Accurate interpolation of seismic data is crucial for improving the quality of imaging and interpretation. In recent years, deep learning models such as U-Net and generative adversarial networks have been widely applied to seismic data interpolation. However, they often underperform when the training and test missing patterns do not match. To alleviate this issue, we propose a novel framework built upon multi-modal adaptable diffusion models. In the training phase, following common wisdom, we use the denoising diffusion probabilistic model with a cosine noise schedule. This cosine global noise configuration improves the use of seismic data by reducing the involvement of excessive noise stages. In the inference phase, we introduce the denoising diffusion implicit model to reduce the number of sampling steps. Different from conventional unconditional generation, we incorporate the known trace information into each reverse sampling step to achieve conditional interpolation. To enhance the coherence and continuity between the revealed traces and the missing traces, we further propose two strategies: successive coherence correction and resampling. Coherence correction penalizes the mismatches in the revealed traces, while resampling conducts cyclic interpolation between adjacent reverse steps. Extensive experiments on synthetic and field seismic data validate our model's superiority and demonstrate its generalization capability to various missing patterns and different noise levels with just one training session. In addition, uncertainty quantification and ablation studies are also investigated.
- Comment: 14 pages, 13 figures
- Published
- 2023
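The conditional interpolation described in this entry pastes known-trace information into every reverse sampling step of a deterministic DDIM pass. The NumPy sketch below shows one plausible form of that conditioning; the denoiser, the noise schedule, and the exact way known traces are re-noised are assumptions for illustration, not the published method.

```python
import numpy as np

def ddim_conditional_interpolate(x_obs, mask, denoise_fn, alphas_bar, steps, rng):
    """x_obs: seismic section with zeros at missing traces; mask: 1 for known traces;
    denoise_fn(x, t) -> predicted noise; alphas_bar: cumulative signal levels."""
    x = rng.standard_normal(x_obs.shape)                       # start from pure noise
    for t in reversed(steps):
        eps = denoise_fn(x, t)
        a_t = alphas_bar[t]
        x0_hat = (x - np.sqrt(1 - a_t) * eps) / np.sqrt(a_t)   # predicted clean section
        a_prev = alphas_bar[t - 1] if t > 0 else 1.0
        x = np.sqrt(a_prev) * x0_hat + np.sqrt(1 - a_prev) * eps   # deterministic DDIM step
        # Conditioning: noise the observed traces to the same level and paste them in.
        x_known = np.sqrt(a_prev) * x_obs + np.sqrt(1 - a_prev) * rng.standard_normal(x_obs.shape)
        x = mask * x_known + (1 - mask) * x
    return x
```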
5. Spherical Space Feature Decomposition for Guided Depth Map Super-Resolution
- Author
- Zhao, Zixiang, Zhang, Jiangshe, Gu, Xiang, Tan, Chengli, Xu, Shuang, Zhang, Yulun, Timofte, Radu, and Van Gool, Luc
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Guided depth map super-resolution (GDSR), a hot topic in multi-modal image processing, aims to upsample low-resolution (LR) depth maps with additional information from high-resolution (HR) RGB images of the same scene. The critical step of this task is to effectively extract domain-shared and domain-private RGB/depth features. In addition, three detailed issues, namely blurry edges, noisy surfaces, and over-transferred RGB texture, need to be addressed. In this paper, we propose the Spherical Space feature Decomposition Network (SSDNet) to solve the above issues. To better model cross-modality features, Restormer block-based RGB/depth encoders are employed for extracting local-global features. Then, the extracted features are mapped to the spherical space to complete the separation of private features and the alignment of shared features. Shared features of RGB are fused with the depth features to complete the GDSR task. Subsequently, a spherical contrast refinement (SCR) module is proposed to further address the detail issues. Patches classified according to imperfect categories are input into the SCR module, where the patch features are pulled closer to the ground truth and pushed away from the corresponding imperfect samples in the spherical feature space via contrastive learning. Extensive experiments demonstrate that our method achieves state-of-the-art results on four test datasets and successfully generalizes to real-world scenes. The code is available at https://github.com/Zhaozixiang1228/GDSR-SSDNet.
- Comment: Accepted by ICCV 2023
- Published
- 2023
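The "spherical space" in this entry refers to operating on features after they are projected onto the unit hypersphere, where the contrastive refinement pulls a patch feature toward its ground truth and pushes it away from its imperfect counterpart. The sketch below illustrates only that geometric idea with placeholder names and a single negative per patch; it is not SSDNet code.

```python
import numpy as np

def to_sphere(f, eps=1e-12):
    # Map feature vectors onto the unit hypersphere (L2 normalization).
    return f / (np.linalg.norm(f, axis=-1, keepdims=True) + eps)

def spherical_contrast_loss(patch_feat, gt_feat, imperfect_feat, tau=0.1):
    z, z_pos, z_neg = map(to_sphere, (patch_feat, gt_feat, imperfect_feat))
    pos = np.exp(np.sum(z * z_pos, axis=-1) / tau)   # similarity to ground truth (pull)
    neg = np.exp(np.sum(z * z_neg, axis=-1) / tau)   # similarity to imperfect sample (push)
    return float(-np.log(pos / (pos + neg)).mean())
```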
6. Learning Non-Vacuous Generalization Bounds from Optimization
- Author
- Tan, Chengli, Zhang, Jiangshe, and Liu, Junmin
- Subjects
- Computer Science - Machine Learning, Computer Science - Artificial Intelligence
- Abstract
One of the fundamental challenges in the deep learning community is to theoretically understand how well a deep neural network generalizes to unseen data. However, current approaches often yield generalization bounds that are either too loose to be informative of the true generalization error or only valid for compressed nets. In this study, we present a simple yet non-vacuous generalization bound from the optimization perspective. We achieve this goal by leveraging the fact that the hypothesis set accessed by stochastic gradient algorithms is essentially fractal-like, which allows us to derive a tighter bound on the algorithm-dependent Rademacher complexity. The main argument rests on modeling the discrete-time recursion process via a continuous-time stochastic differential equation driven by fractional Brownian motion. Numerical studies demonstrate that our approach is able to yield plausible generalization guarantees for modern neural networks such as ResNet and Vision Transformer, even when they are trained on a large-scale dataset (e.g., ImageNet-1K).
- Comment: 35 pages
- Published
- 2022
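The central modeling step named in this abstract, viewing the discrete SGD recursion as the discretization of a continuous-time SDE driven by fractional Brownian motion, is usually written as follows. The exact drift, scale, and assumptions used by the authors are not given in the abstract, so this is only the standard form of such a model.

```latex
% Standard form of an SDE driven by fractional Brownian motion $B^H_t$ with
% Hurst parameter $H$, and its Euler-type discretization; the drift, scale
% $\sigma$, and step size $\eta$ are generic placeholders.
\[
  \mathrm{d}\theta_t = -\nabla L(\theta_t)\,\mathrm{d}t + \sigma\,\mathrm{d}B^H_t,
  \qquad
  \operatorname{Var}\bigl(B^H_t - B^H_s\bigr) = |t-s|^{2H},
\]
\[
  \theta_{k+1} = \theta_k - \eta\,\nabla L(\theta_k)
  + \sigma\bigl(B^H_{(k+1)\eta} - B^H_{k\eta}\bigr).
\]
```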
7. Understanding Short-Range Memory Effects in Deep Neural Networks
- Author
- Tan, Chengli, Zhang, Jiangshe, and Liu, Junmin
- Subjects
- Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
Stochastic gradient descent (SGD) is of fundamental importance in deep learning. Despite its simplicity, elucidating its efficacy remains challenging. Conventionally, the success of SGD is ascribed to the stochastic gradient noise (SGN) incurred in the training process. Based on this consensus, SGD is frequently treated and analyzed as the Euler-Maruyama discretization of stochastic differential equations (SDEs) driven by either Brownian or Lévy stable motion. In this study, we argue that SGN is neither Gaussian nor Lévy stable. Instead, inspired by the short-range correlation emerging in the SGN series, we propose that SGD can be viewed as a discretization of an SDE driven by fractional Brownian motion (FBM). Accordingly, the different convergence behavior of SGD dynamics is well-grounded. Moreover, the first passage time of an SDE driven by FBM is approximately derived. The result suggests a lower escaping rate for a larger Hurst parameter, and thus SGD stays longer in flat minima. This coincides with the well-known phenomenon that SGD favors flat minima that generalize well. Extensive experiments are conducted to validate our conjecture, and it is demonstrated that short-range memory effects persist across various model architectures, datasets, and training strategies. Our study opens up a new perspective and may contribute to a better understanding of SGD.
- Comment: 15 pages
- Published
- 2021
- Full Text
- View/download PDF
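The claim in this entry that SGD can be viewed as a discretization of an FBM-driven SDE is easy to illustrate numerically: fractional Gaussian noise (the increments of FBM with Hurst parameter H) can be sampled with the Cholesky method and used to drive a toy recursion. The loss, H, and noise scale below are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def fgn_sample(n, H, rng):
    """n increments of fractional Brownian motion with Hurst index H (Cholesky method)."""
    k = np.arange(n)
    gamma = 0.5 * (np.abs(k + 1) ** (2 * H) - 2 * np.abs(k) ** (2 * H)
                   + np.abs(k - 1) ** (2 * H))       # fractional Gaussian noise autocovariance
    cov = gamma[np.abs(k[:, None] - k[None, :])]     # Toeplitz covariance matrix
    return np.linalg.cholesky(cov) @ rng.standard_normal(n)

rng = np.random.default_rng(0)
H, lr, sigma, n_steps = 0.7, 0.1, 0.05, 200          # H > 0.5 gives positively correlated noise
noise = fgn_sample(n_steps, H, rng)

w = 3.0
for k in range(n_steps):
    grad = 2.0 * w                                   # toy quadratic loss gradient
    w = w - lr * grad + sigma * noise[k]             # Euler-type step of the FBM-driven SDE
```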
8. Robust Teacher: Self-correcting pseudo-label-guided semi-supervised learning for object detection
- Author
- Li, Shijie, Liu, Junmin, Shen, Weilin, Sun, Jianyong, and Tan, Chengli
- Published
- 2023
- Full Text
- View/download PDF
9. Sharpness-Aware Lookahead for Accelerating Convergence and Improving Generalization
- Author
- Tan, Chengli, Zhang, Jiangshe, Liu, Junmin, and Gong, Yihong
- Abstract
Lookahead is a popular stochastic optimizer that can accelerate the training of deep neural networks. However, the solutions found by Lookahead often generalize worse than those found by its base optimizers, such as SGD and Adam. To address this issue, we propose Sharpness-Aware Lookahead (SALA), a novel optimizer that aims to identify flat minima that generalize well. SALA divides the training process into two stages. In the first stage, the direction towards flat regions is determined by leveraging a quadratic approximation of the optimization trajectory, without incurring any extra computational overhead. In the second stage, the direction is instead determined by Sharpness-Aware Minimization (SAM), which is particularly effective in improving generalization at the terminal phase of training. In contrast to Lookahead, SALA retains the benefits of accelerated convergence while also enjoying superior generalization performance compared to the base optimizer. Theoretical analysis of the expected excess risk, as well as empirical results on canonical neural network architectures and datasets, demonstrates the advantages of SALA over Lookahead. It is noteworthy that with approximately 25% more computational overhead than the base optimizer, SALA can achieve the same generalization performance as SAM, which requires twice the training budget of the base optimizer.
- Published
- 2024
- Full Text
- View/download PDF
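For context on the base mechanism SALA modifies: Lookahead maintains slow weights that are periodically pulled toward fast weights produced by an inner optimizer. The sketch below shows only this standard Lookahead update with assumed hyperparameters and a toy loss; SALA's quadratic trajectory approximation and SAM stage are not reproduced here.

```python
import numpy as np

def grad(w):
    return 2.0 * (w - 1.0)                           # toy quadratic loss centered at 1

def lookahead(w_slow, k=5, alpha=0.5, inner_lr=0.1, outer_steps=20):
    for _ in range(outer_steps):
        w_fast = w_slow.copy()
        for _ in range(k):                           # k fast (inner) SGD steps
            w_fast -= inner_lr * grad(w_fast)
        w_slow = w_slow + alpha * (w_fast - w_slow)  # interpolate slow weights toward fast weights
    return w_slow

w_star = lookahead(np.array([5.0]))
```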
10. Understanding Short-Range Memory Effects in Deep Neural Networks
- Author
- Tan, Chengli, Zhang, Jiangshe, and Liu, Junmin
- Abstract
Stochastic gradient descent (SGD) is of fundamental importance in deep learning. Despite its simplicity, elucidating its efficacy remains challenging. Conventionally, the success of SGD is ascribed to the stochastic gradient noise (SGN) incurred in the training process. Based on this consensus, SGD is frequently treated and analyzed as the Euler–Maruyama discretization of stochastic differential equations (SDEs) driven by either Brownian or Lévy stable motion. In this study, we argue that SGN is neither Gaussian nor Lévy stable. Instead, inspired by the short-range correlation emerging in the SGN series, we propose that SGD can be viewed as a discretization of an SDE driven by fractional Brownian motion (FBM). Accordingly, the different convergence behavior of SGD dynamics is well-grounded. Moreover, the first passage time of an SDE driven by FBM is approximately derived. The result suggests a lower escaping rate for a larger Hurst parameter, and thus, SGD stays longer in flat minima. This happens to coincide with the well-known phenomenon that SGD favors flat minima that generalize well. Extensive experiments are conducted to validate our conjecture, and it is demonstrated that short-range memory effects persist across various model architectures, datasets, and training strategies. Our study opens up a new perspective and may contribute to a better understanding of SGD.
- Published
- 2024
- Full Text
- View/download PDF
11. Seismic Data Interpolation based on Denoising Diffusion Implicit Models with Resampling
- Author
- Wei, Xiaoli, Zhang, Chunxia, Wang, Hongtao, Tan, Chengli, Xiong, Deng, Jiang, Baisong, Zhang, Jiangshe, and Kim, Sang-Woon
- Subjects
- FOS: Computer and information sciences, Physics - Geophysics, Statistics - Machine Learning, FOS: Physical sciences, Machine Learning (stat.ML), Geophysics (physics.geo-ph)
- Abstract
The incompleteness of seismic data caused by missing traces along the spatial extension is a common issue in seismic acquisition due to the existence of obstacles and economic constraints, and it severely impairs the imaging quality of subsurface geological structures. Recently, deep learning-based seismic interpolation methods have attained promising progress, but achieving stable training of generative adversarial networks is not easy, and performance degradation is usually notable if the missing patterns in testing and training do not match. In this paper, we propose a novel seismic denoising diffusion implicit model with resampling. The model training is established on the denoising diffusion probabilistic model, where a U-Net equipped with multi-head self-attention matches the noise in each step. The cosine noise schedule, serving as the global noise configuration, promotes high utilization of the known trace information by accelerating the passage through the excessive-noise stages. The model inference utilizes the denoising diffusion implicit model, conditioned on the known traces, to enable high-quality interpolation with fewer diffusion steps. To enhance the coherency between the known traces and the missing traces within each reverse step, the inference process integrates a resampling strategy to achieve an information recap of the previously interpolated traces. Extensive experiments conducted on synthetic and field seismic data validate the superiority of our model and its robustness to various missing patterns. In addition, uncertainty quantification and ablation studies are also investigated.
- Comment: 14 pages, 13 figures
- Published
- 2023
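The cosine noise schedule mentioned in this entry as the global noise configuration is conventionally defined through the cumulative signal level alpha-bar; a common formulation (following Nichol and Dhariwal) is sketched below. The horizon T, offset s, and clipping are conventional defaults, not values taken from the paper.

```python
import numpy as np

def cosine_schedule(T=1000, s=0.008):
    t = np.arange(T + 1)
    f = np.cos(((t / T) + s) / (1 + s) * np.pi / 2) ** 2
    alpha_bar = f / f[0]                             # cumulative signal level per step
    betas = 1 - alpha_bar[1:] / alpha_bar[:-1]       # per-step noise rates
    return alpha_bar[1:], np.clip(betas, 0.0, 0.999)

alpha_bar, betas = cosine_schedule()
```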
12. Seismic Data Interpolation via Denoising Diffusion Implicit Models With Coherence-Corrected Resampling
- Author
- Wei, Xiaoli, Zhang, Chunxia, Wang, Hongtao, Tan, Chengli, Xiong, Deng, Jiang, Baisong, Zhang, Jiangshe, and Kim, Sang-Woon
- Abstract
Accurate interpolation of seismic data is crucial for improving the quality of imaging and interpretation. In recent years, deep learning models such as U-Net and generative adversarial networks (GANs) have been widely applied to seismic data interpolation. However, they often underperform when the training and test missing patterns do not match. To alleviate this issue, here we propose a novel framework that is built upon the multimodal adaptable diffusion models. In the training phase, following the common wisdom, we use the denoising diffusion probabilistic model with a cosine noise schedule. This cosine global noise configuration improves the use of seismic data by reducing the involvement of excessive noise stages. In the inference phase, we introduce the denoising diffusion implicit model (DDIM) to reduce the number of sampling steps. Different from the conventional unconditional generation, we incorporate the known trace information into each reverse sampling step for achieving conditional interpolation. To enhance the coherence and continuity between the revealed traces and the missing traces, we further propose two strategies, including successive coherence correction and resampling. Coherence correction penalizes the mismatches in the revealed traces, while resampling conducts cyclic interpolation between adjacent reverse steps. Extensive experiments on synthetic and field seismic data validate our model’s superiority and demonstrate its generalization capability to various missing patterns and different noise levels with just one training session. In addition, uncertainty quantification and ablation studies are also investigated.
- Published
- 2024
- Full Text
- View/download PDF
13. Robust Teacher: Self-Correcting Pseudo-Label-Guided Semi-Supervised Learning for Object Detection
- Author
- Li, Shijie, Liu, Junmin, Shen, Weilin, Sun, Jianyong, and Tan, Chengli
- Published
- 2023
- Full Text
- View/download PDF
14. Understanding Short-Range Memory Effects in Deep Neural Networks
- Author
- Tan, Chengli, Zhang, Jiangshe, and Liu, Junmin
- Published
- 2023
- Full Text
- View/download PDF
15. Trajectory-dependent Generalization Bounds for Deep Neural Networks via Fractional Brownian Motion
- Author
- Tan, Chengli, Zhang, Jiangshe, and Liu, Junmin
- Subjects
- FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Machine Learning (cs.LG)
- Abstract
Despite being tremendously overparameterized, deep neural networks trained by stochastic gradient descent (SGD) are known to generalize surprisingly well. Based on the Rademacher complexity of a pre-specified hypothesis set, different norm-based generalization bounds have been developed to explain this phenomenon. However, recent studies suggest these bounds might be problematic as they increase with the training set size, which is contrary to empirical evidence. In this study, we argue that the hypothesis set SGD explores is trajectory-dependent and thus may provide a tighter bound on its Rademacher complexity. To this end, we characterize the SGD recursion via a stochastic differential equation by assuming the incurred stochastic gradient noise follows fractional Brownian motion. We then identify the Rademacher complexity in terms of covering numbers and relate it to the Hausdorff dimension of the optimization trajectory. By invoking hypothesis set stability, we derive a novel generalization bound for deep neural networks. Extensive experiments demonstrate that it predicts the generalization gap well over several common experimental interventions. We further show that the Hurst parameter of the fractional Brownian motion is more informative than existing generalization indicators such as the power-law index and the upper Blumenthal-Getoor index.
- Comment: 35 pages, 15 figures
- Published
- 2022
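The Hurst parameter that this entry proposes as a generalization indicator can be estimated from a recorded noise series with standard statistical tools. The sketch below uses the classical aggregated-variance estimator on a generic stationary series; it is a generic estimator, not the authors' procedure, and the block sizes are arbitrary.

```python
import numpy as np

def hurst_aggregated_variance(x, block_sizes=(2, 4, 8, 16, 32, 64)):
    """Estimate the Hurst parameter H from a stationary noise series x."""
    x = np.asarray(x, dtype=float)
    log_m, log_var = [], []
    for m in block_sizes:
        n_blocks = len(x) // m
        if n_blocks < 2:
            continue
        means = x[: n_blocks * m].reshape(n_blocks, m).mean(axis=1)  # block means
        log_m.append(np.log(m))
        log_var.append(np.log(means.var()))
    slope = np.polyfit(log_m, log_var, 1)[0]         # Var(block mean) ~ m^(2H - 2)
    return 1.0 + slope / 2.0

rng = np.random.default_rng(0)
print(hurst_aggregated_variance(rng.standard_normal(4096)))  # ~0.5 for uncorrelated noise
```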