Search

Your search for author "Stoica, Ion" returned 968 results; the first 50 are listed below.


Search Results

1. Locality-aware Fair Scheduling in LLM Serving

2. The Streaming Batch Model for Efficient and Fault-Tolerant Heterogeneous Execution

3. Exploring and Mitigating Adversarial Manipulation of Voting-Based Leaderboards

4. Revisiting Cache Freshness for Emerging Real-Time Applications

5. A Statistical Framework for Ranking LLM-Based Chatbots

6. HashAttention: Semantic Sparsity for Faster Inference

7. VisionArena: 230K Real World User-VLM Conversations with Preference Labels

8. GameArena: Evaluating LLM Reasoning through Live Computer Games

9. FogROS2-FT: Fault Tolerant Cloud Robotics

10. Specifications: The missing link to making the development of LLM systems an engineering discipline

11. BlendServe: Optimizing Offline Inference for Auto-regressive Large Models with Resource-aware Batching

12. MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs

13. Pie: Pooling CPU Memory for LLM Inference

14. SkyServe: Serving AI Models across Regions and Clouds with Spot Instances

15. NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference

16. Managing Bandwidth: The Key to Cloud-Assisted Autonomous Driving

17. How to Evaluate Reward Models for RLHF

18. JudgeBench: A Benchmark for Evaluating LLM-based Judges

19. Efficient LLM Scheduling by Learning to Rank

20. Post-Training Sparse Attention with Double Sparsity

21. MPC-Minimized Secure LLM Inference

22. Networks of Networks: Complexity Class Principles Applied to Compound AI Systems Design

23. RouteLLM: Learning to Route LLMs with Preference Data

24. Optimizing Speculative Decoding for Serving Large Language Models Using Goodput

25. From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline

26. OR-Bench: An Over-Refusal Benchmark for Large Language Models

27. Crafting Interpretable Embeddings by Asking LLMs Questions

28. Stylus: Automatic Adapter Selection for Diffusion Models

29. Mélange: Cost Efficient Large Language Model Serving by Exploiting GPU Heterogeneity

30. GoEX: Perspectives and Designs Towards a Runtime for Autonomous LLM Applications

31. Trustless Audits without Revealing Data or Models

32. MuxServe: Flexible Spatial-Temporal Multiplexing for Multiple LLM Serving

33. RAFT: Adapting Language Model to Domain Specific RAG

34. depyf: Open the Opaque Box of PyTorch Compiler for Machine Learning Researchers

35. LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

36. Optimizing LLM Queries in Relational Workloads

37. Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference

38. Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems

39. Break the Sequential Dependency of LLM Inference Using Lookahead Decoding

40. Fairness in Serving Large Language Models

41. SuperServe: Fine-Grained Inference Serving for Unpredictable Workloads

42. CodeScholar: Growing Idiomatic Code Examples

43. SGLang: Efficient Execution of Structured Language Model Programs

44. Rethinking Benchmark and Contamination for Language Models with Rephrased Samples

45. S-LoRA: Serving Thousands of Concurrent LoRA Adapters

46. MemGPT: Towards LLMs as Operating Systems

47. Online Speculative Decoding

48. DISTFLASHATTN: Distributed Memory-efficient Attention for Long-context LLMs Training

49. LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset

50. Efficient Memory Management for Large Language Model Serving with PagedAttention
