Top papers
- LycheeDecode: Accelerating Long-Context LLM Inference via Hybrid-Head Sparse Decoding (score: 7.0)
- Beyond Speedup -- Utilizing KV Cache for Sampling and Reasoning (score: 6.0)
- More Bang for the Buck: Improving the Inference of Large Language Models at a Fixed Budget using Reset and Discard (ReD) (score: 5.0)
- TIDE: Temporal Incremental Draft Engine for Self-Improving LLM Inference (score: 3.0)