Top papers
- LycheeDecode: Accelerating Long-Context LLM Inference via Hybrid-Head Sparse Decoding (score: 7.0)
- Beyond Speedup -- Utilizing KV Cache for Sampling and Reasoning (score: 6.0)
- More Bang for the Buck: Improving the Inference of Large Language Models at a Fixed Budget using Reset and Discard (ReD) (score: 5.0)
- TIDE: Temporal Incremental Draft Engine for Self-Improving LLM Inference (score: 3.0)