ScienceToStartup
State of Efficient Inference
3 papers · avg viability 6.7
Top papers
DyLLM: Efficient Diffusion LLM Inference via Saliency-based Token Selection and Partial Attention (viability 7.0)
ASAP: Attention-Shift-Aware Pruning for Efficient LVLM Inference (viability 7.0)
ESTAR: Early-Stopping Token-Aware Reasoning For Efficient Inference (viability 6.0)