Search Results

57 results for Author: "Sun, Ruoyu", Publication Type: Reports (first 50 shown below).

1. Entropic Distribution Matching in Supervised Fine-tuning of LLMs: Less Overfitting and Better Diversity

2. MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning

3. Adam-mini: Use Fewer Learning Rates To Gain More

4. Bridging the Gap: Rademacher Complexity in Robust and Standard Generalization

5. PDHG-Unrolled Learning-to-Optimize Method for Large-Scale Linear Programming

6. On the Convergence of Adam under Non-uniform Smoothness: Separability from SGDM and Beyond

7. Why Transformers Need Adam: A Hessian Perspective

8. Combining Transformer based Deep Reinforcement Learning with Black-Litterman Model for Portfolio Optimization

9. ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models

10. LEMON: Lossless model expansion

11. PAC-Bayesian Spectrally-Normalized Bounds for Adversarially Robust Generalization

12. How Graph Neural Networks Learn: Lessons from Training Dynamics

13. AceGPT, Localizing Large Language Models in Arabic

14. Restricted Generative Projection for One-Class Classification and Anomaly Detection

15. NTK-SAP: Improving neural network pruning by aligning training dynamics

16. Balanced Training for Sparse GANs

17. Invariant Layers for Graphs with Nodes of Different Types

18. A GNN-Guided Predict-and-Search Framework for Mixed-Integer Linear Programming

19. Adversarial Rademacher Complexity of Deep Neural Networks

20. DigGAN: Discriminator gradIent Gap Regularization for GAN Training with Limited Data

21. When Expressivity Meets Trainability: Fewer than $n$ Neurons Can Work

22. Stability Analysis and Generalization Bounds of Adversarial Training

23. Provable Adaptivity of Adam under Non-uniform Smoothness

24. Adam Can Converge Without Any Modification On Update Rules

25. Global Convergence of MAML and Theory-Inspired Neural Architecture Search for Few-Shot Learning

26. Towards Understanding the Impact of Model Size on Differential Private Classification

27. Portfolio analysis with mean-CVaR and mean-CVaR-skewness criteria based on mean-variance mixture models

28. Federated Semi-Supervised Learning with Class Distribution Mismatch

29. Does Momentum Change the Implicit Regularization on Separable Data?

30. Achieving Small Test Error in Mildly Overparameterized Neural Networks

31. On a Faster $R$-Linear Convergence Rate of the Barzilai-Borwein Method

32. Towards a Better Global Loss Landscape of GANs

33. A Single-Loop Smoothed Gradient Descent-Ascent Algorithm for Nonconvex-Concave Min-Max Problems

34. On the Landscape of One-hidden-layer Sparse Networks and Beyond

35. The Global Landscape of Neural Networks: An Overview

36. Global Convergence and Generalization Bound of Gradient-Based Meta-Learning with Deep Neural Nets

37. Distilling Object Detectors with Task Adaptive Regularization

38. DEED: A General Quantization Scheme for Communication Efficiency in Bits

39. Revisiting Landscape Analysis in Deep Neural Networks: Eliminating Decreasing Paths to Infinity

40. Optimization for deep learning: theory and algorithms

41. Sub-Optimal Local Minima Exist for Neural Networks with Almost All Non-Linear Activations

42. Understanding Limitation of Two Symmetrized Orders by Worst-case Complexity

43. Off-road Autonomous Vehicles Traversability Analysis and Trajectory Planning Based on Deep Inverse Reinforcement Learning

44. Max-Sliced Wasserstein Distance and its use for GANs

45. On the Benefit of Width for Neural Networks: Disappearance of Bad Basins

46. On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization

47. Adding One Neuron Can Eliminate All Bad Local Minima

48. Understanding the Loss Surface of Neural Networks for Binary Classification

49. Training Language Models Using Target-Propagation

50. Worst-case Complexity of Cyclic Coordinate Descent: $O(n^2)$ Gap with Randomized Version
