Efficient Inference Comparison Hub
3 papers - average viability 6.7
Top Papers
- DyLLM: Efficient Diffusion LLM Inference via Saliency-based Token Selection and Partial Attention (7.0)
DyLLM accelerates masked diffusion language model inference by selectively processing only the most salient tokens at each denoising step, achieving significant throughput gains with minimal accuracy loss (see the selection sketch after this list).
- ASAP: Attention-Shift-Aware Pruning for Efficient LVLM Inference (7.0)
ASAP prunes redundant tokens in Large Vision-Language Models while accounting for attention shifts, improving inference efficiency (see the pruning sketch after this list).
- ESTAR: Early-Stopping Token-Aware Reasoning For Efficient Inference (6.0)
ESTAR adds an early-stopping mechanism to large reasoning models, halting the reasoning trace once further steps stop helping and significantly reducing computation without sacrificing accuracy (see the stopping-rule sketch after this list).
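
For readers skimming the list, here is a minimal sketch of saliency-based token selection for a masked diffusion decoder. The summary above does not state DyLLM's saliency measure, so this sketch assumes predictive confidence (max softmax probability) as a stand-in; the function name `select_salient_tokens` and the `keep_ratio` parameter are illustrative, not from the paper.

```python
import torch

def select_salient_tokens(logits: torch.Tensor, masked: torch.Tensor,
                          keep_ratio: float = 0.25):
    """Choose which masked positions to commit at this diffusion step.

    logits: (batch, seq, vocab) model outputs.
    masked: (batch, seq) bool, True where a token is still masked.
    Saliency is approximated here by predictive confidence; DyLLM's
    actual criterion is not given in the summary above.
    """
    probs = logits.softmax(dim=-1)
    confidence, tokens = probs.max(dim=-1)              # (batch, seq)
    confidence = confidence.masked_fill(~masked, -1.0)  # unmasked slots never win
    k = max(1, int(keep_ratio * int(masked.sum(dim=-1).max())))
    positions = confidence.topk(k, dim=-1).indices      # (batch, k) slots to unmask
    return positions, tokens.gather(-1, positions)      # where to write, what to write
```

At each step the caller writes the returned tokens into the sequence, clears those mask flags, and repeats until no positions remain masked; committing only k positions per step is where the throughput gain would come from.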
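
ASAP's summary similarly compresses an algorithm into one line; below is a baseline sketch of attention-based visual-token pruning. How ASAP measures and corrects for attention shift is not described above, so scoring tokens by the attention they receive stands in for it here; `prune_visual_tokens` and `keep` are illustrative names.

```python
import torch

def prune_visual_tokens(attn: torch.Tensor, vision_idx: torch.Tensor,
                        keep: float = 0.5) -> torch.Tensor:
    """Keep the visual tokens that receive the most attention.

    attn: (heads, seq, seq) attention weights from one LVLM layer.
    vision_idx: sequence positions occupied by visual tokens.
    ASAP's attention-shift correction is not specified in the summary;
    this baseline averages over heads and sums attention received.
    """
    received = attn.mean(dim=0)[:, vision_idx].sum(dim=0)  # (n_visual,)
    k = max(1, int(keep * vision_idx.numel()))
    top = received.topk(k).indices
    return vision_idx[top.sort().values]  # kept positions, in original order
```

The pruned positions can then be dropped from subsequent layers' computation, which is typically where visual-token pruning methods recover their speedup.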
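
Finally, a sketch of an early-stopping rule for a step-by-step reasoner. ESTAR's token-aware stopping signal is not detailed in the summary above; the confidence-plateau rule below (halt after `patience` consecutive steps at or above `threshold`) is an assumed stand-in, and all names are illustrative.

```python
def stop_step(step_confidences, threshold=0.9, patience=2):
    """Return the reasoning step at which to halt, or the last step.

    step_confidences: per-step confidence in the current answer.
    Halts once confidence stays at or above `threshold` for `patience`
    consecutive steps; ESTAR's actual token-aware signal is not given
    in the summary above.
    """
    streak = 0
    for step, conf in enumerate(step_confidences):
        streak = streak + 1 if conf >= threshold else 0
        if streak >= patience:
            return step
    return len(step_confidences) - 1


# Halts at step 3, the second consecutive high-confidence step.
assert stop_step([0.4, 0.6, 0.92, 0.95, 0.97]) == 3
```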