vLLM
vLLM is an open-source library for large language model (LLM) inference and serving, listed as a library in our research taxonomy.
Related papers
- P-EAGLE: Parallel-Drafting EAGLE with Scalable Training
- AviationLMM: A Large Multimodal Foundation Model for Civil Aviation
- Private LLM Inference on Consumer Blackwell GPUs: A Practical Guide for Cost-Effective Local Deployment in SMEs
- EWSJF: An Adaptive Scheduler with Hybrid Partitioning for Mixed-Workload LLM Inference
- Qrita: High-performance Top-k and Top-p Algorithm for GPUs using Pivot-based Truncation and Selection