Search

Your search for author "Wang, Zhangyang" returned 286 results.

Search Constraints

You searched for: Author "Wang, Zhangyang"; Topic: computer science - machine learning

Search Results

1. Know Where You're Uncertain When Planning with Multimodal Foundation Models: A Formal Framework

2. Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design

3. AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models

4. Training Dynamics of Transformers to Recognize Word Co-occurrence via Gradient Flow Analysis

5. Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild

6. On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability

7. All Against Some: Efficient Integration of Large Language Models for Message Passing in Graph Neural Networks

8. From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients

9. Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients

10. Flextron: Many-in-One Flexible Large Language Model

11. LoCoCo: Dropping In Convolutions for Long Context Compression

12. Generalization Error Analysis for Sparse Mixture-of-Experts: A Preliminary Study

13. StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text

14. Shake to Leak: Fine-tuning Diffusion Models Can Amplify the Generative Privacy Risk

15. GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

16. Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding

17. Principled Architecture-aware Scaling of Hyperparameters

18. Take the Bull by the Horns: Hard Sample-Reweighted Continual Training Improves LLM Generalization

19. Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark

20. Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference

21. LLaGA: Large Language and Graph Assistant

22. QuantumSEA: In-Time Sparse Exploration for Noise Adaptive Quantum Circuits

23. Taming Mode Collapse in Score Distillation for Text-to-3D Generation

24. Meta ControlNet: Enhancing Task Adaptation via Meta Learning

25. Rethinking PGD Attack: Is Sign Function Necessary?

26. Data Distillation Can Be Like Vodka: Distilling More Times For Better Quality

27. Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity

28. Compressing LLMs: The Truth is Rarely Pure and Never Simple

29. Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs 'Difficult' Downstream Tasks in LLMs

30. Robust Mixture-of-Expert Training for Convolutional Neural Networks

31. INR-Arch: A Dataflow Architecture and Compiler for Arbitrary-Order Gradient Computations in Implicit Neural Representation Processing

32. Doubly Robust Instance-Reweighted Adversarial Training

33. Polynomial Width is Sufficient for Set Representation with High-dimensional Features

34. Zero-Shot Neural Architecture Search: Challenges, Solutions, and Opportunities

35. H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models

36. Graph Ladling: Shockingly Simple Parallel GNN Training without Intermediate Communication

37. Instant Soup: Cheap Pruning Ensembles in A Single Pass Can Draw Lottery Tickets from Large Models

38. The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter

39. Dynamic Sparsity Is Channel-Level Sparsity Learner

40. Towards Constituting Mathematical Structures for Learning to Optimize

41. Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation

42. Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models

43. AutoShot: A Short Video Dataset and State-of-the-Art Shot Boundary Detection

44. Graph Mixture of Experts: Learning on Large-Scale Graphs with Explicit Diversity Modeling

45. Forget-Me-Not: Learning to Forget in Text-to-Image Diffusion Models

46. PAIR-Diffusion: A Comprehensive Multimodal Object-Level Image Editor

47. Sparsity May Cry: Let Us Fail (Current) Sparse Neural Networks Together!

48. Learning to Grow Pretrained Models for Efficient Transformer Training

49. Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers

50. M-L2O: Towards Generalizable Learning-to-Optimize by Test-Time Fast Self-Adaptation
