Your search for the keyword "Xiao, Guangxuan" returned 19 results.

Search Constraints

You searched for: Author "Xiao, Guangxuan"
Search Results

1. DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

2. Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference

3. QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

4. Retrieval Head Mechanistically Explains Long-Context Factuality

6. BitDelta: Your Fine-Tune May Only Be Worth One Bit

7. InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory

8. Efficient Streaming Language Models with Attention Sinks

9. AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

10. FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention

11. Sparse and Local Networks for Hypergraph Reasoning

12. Offsite-Tuning: Transfer Learning without Full Model

13. FreshGNN: Reducing Memory Access via Stable Historical Embeddings for Graph Neural Network Training

14. SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

16. Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-Level Backdoor Attacks

18. ReFresh: Reducing Memory Access from Exploiting Stable Historical Embeddings for Graph Neural Network Training
