302 results for Author: "Zhang, Minjia"

Search Results

1. MiniKV: Pushing the Limits of LLM Inference via 2-Bit Layer-Discriminative KV Cache

2. Transforming the Hybrid Cloud for Emerging AI Workloads

3. Stochastic Monkeys at Play: Random Augmentations Cheaply Break LLM Safety Alignment

4. Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions

5. Model Tells You Where to Merge: Adaptive KV Cache Merging for LLMs on Long-Context Tasks

6. UltraEdit: Instruction-based Fine-Grained Image Editing at Scale

7. Universal Checkpointing: Efficient and Flexible Checkpointing for Large Scale Distributed Training

8. Parcae: Proactive, Liveput-Optimized DNN Training on Preemptible Instances

9. Computing in the Era of Large Generative Models: From Cloud-Native to AI-Native

10. OpenFold: retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization

11. Configuration Validation with Large Language Models

12. DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies

13. Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs

14. DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models

15. DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention

16. RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model

17. Cost-effective On-device Continual Learning over Memory Hierarchy with Miro

18. DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales

19. FedHC: A Scalable Federated Learning Framework for Heterogeneous and Resource-Constrained Clients

20. DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing

21. Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers

22. BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

23. Compressing Pre-trained Transformers via Low-Bit NxM Sparsity for Natural Language Understanding

24. DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale

25. ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers

26. Extreme Compression for Pre-trained Transformers Made Simple and Efficient

27. Bamboo: Making Preemptible Instances Resilient for Affordable Training of Large DNNs

29. A Survey of Multi-Tenant Deep Learning Inference on GPU

30. Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam

31. Speed-ANN: Low-Latency and High-Accuracy Nearest Neighbor Search via Intra-Query Parallelism

32. ScaLA: Accelerating Adaptation of Pre-Trained Transformer-Based Language Models via Efficient Large-Batch Adversarial Noise

33. DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale

34. A Survey of Large-Scale Deep Learning Serving System Optimization: Challenges and Opportunities

35. NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM

36. Carousel Memory: Rethinking the Design of Episodic Memory for Continual Learning

37. The Stability-Efficiency Dilemma: Investigating Sequence Length Warmup for Training GPT Models

38. Understanding and Generalizing Monotonic Proximity Graphs for Approximate Nearest Neighbor Search

39. ZeRO-Offload: Democratizing Billion-Scale Model Training

41. Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping

42. SHARP: An Adaptable, Energy-Efficient Accelerator for Recurrent Neural Network

43. Sentinel: Runtime Data Management on Heterogeneous Main Memory Systems for Deep Learning

44. Zoom: SSD-based Vector Search for Optimizing Accuracy, Latency and Memory

45. Navigating with Graph Representations for Fast and Scalable Decoding of Neural Language Models

47. Learning Intrinsic Sparse Structures within Long Short-Term Memory

48. Vertical Scaling of Resource for OpenMP Application
