Efficient Inference Comparison Hub
3 papers - average viability 6.7
Top Papers
- DyLLM: Efficient Diffusion LLM Inference via Saliency-based Token Selection and Partial Attention (7.0)
DyLLM accelerates masked diffusion language model inference by selectively processing only the most salient tokens at each denoising step, achieving significant throughput gains with minimal accuracy loss (see the selection sketch after this list).
- ASAP: Attention-Shift-Aware Pruning for Efficient LVLM Inference (7.0)
ASAP prunes redundant tokens in Large Vision-Language Models while accounting for attention shifts, improving inference efficiency (see the pruning sketch after this list).
- ESTAR: Early-Stopping Token-Aware Reasoning For Efficient Inference (6.0)
ESTAR adds an early-stopping mechanism to large reasoning models, halting the reasoning trace once further steps stop helping and significantly reducing computation without sacrificing accuracy (see the stopping-rule sketch after this list).
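
For readers skimming the list, here is a minimal sketch of saliency-based token selection for a masked diffusion decoder. The summary above does not state DyLLM's saliency measure, so this sketch assumes predictive confidence (max softmax probability) as a stand-in; the function name `select_salient_tokens` and the `keep_ratio` parameter are illustrative, not from the paper.

```python
import torch

def select_salient_tokens(logits: torch.Tensor, masked: torch.Tensor,
                          keep_ratio: float = 0.25):
    """Choose which masked positions to commit at this diffusion step.

    logits: (batch, seq, vocab) model outputs.
    masked: (batch, seq) bool, True where a token is still masked.
    Saliency is approximated here by predictive confidence; DyLLM's
    actual criterion is not given in the summary above.
    """
    probs = logits.softmax(dim=-1)
    confidence, tokens = probs.max(dim=-1)              # (batch, seq)
    confidence = confidence.masked_fill(~masked, -1.0)  # unmasked slots never win
    k = max(1, int(keep_ratio * int(masked.sum(dim=-1).max())))
    positions = confidence.topk(k, dim=-1).indices      # (batch, k) slots to unmask
    return positions, tokens.gather(-1, positions)      # where to write, what to write
```

At each step the caller writes the returned tokens into the sequence, clears those mask flags, and repeats until no positions remain masked; committing only k positions per step is where the throughput gain would come from.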
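
ASAP's summary similarly compresses an algorithm into one line; below is a baseline sketch of attention-based visual-token pruning. How ASAP measures and corrects for attention shift is not described above, so scoring tokens by the attention they receive stands in for it here; `prune_visual_tokens` and `keep` are illustrative names.

```python
import torch

def prune_visual_tokens(attn: torch.Tensor, vision_idx: torch.Tensor,
                        keep: float = 0.5) -> torch.Tensor:
    """Keep the visual tokens that receive the most attention.

    attn: (heads, seq, seq) attention weights from one LVLM layer.
    vision_idx: sequence positions occupied by visual tokens.
    ASAP's attention-shift correction is not specified in the summary;
    this baseline averages over heads and sums attention received.
    """
    received = attn.mean(dim=0)[:, vision_idx].sum(dim=0)  # (n_visual,)
    k = max(1, int(keep * vision_idx.numel()))
    top = received.topk(k).indices
    return vision_idx[top.sort().values]  # kept positions, in original order
```

The pruned positions can then be dropped from subsequent layers' computation, which is typically where visual-token pruning methods recover their speedup.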
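
Finally, a sketch of an early-stopping rule for a step-by-step reasoner. ESTAR's token-aware stopping signal is not detailed in the summary above; the confidence-plateau rule below (halt after `patience` consecutive steps at or above `threshold`) is an assumed stand-in, and all names are illustrative.

```python
def stop_step(step_confidences, threshold=0.9, patience=2):
    """Return the reasoning step at which to halt, or the last step.

    step_confidences: per-step confidence in the current answer.
    Halts once confidence stays at or above `threshold` for `patience`
    consecutive steps; ESTAR's actual token-aware signal is not given
    in the summary above.
    """
    streak = 0
    for step, conf in enumerate(step_confidences):
        streak = streak + 1 if conf >= threshold else 0
        if streak >= patience:
            return step
    return len(step_confidences) - 1


# Halts at step 3, the second consecutive high-confidence step.
assert stop_step([0.4, 0.6, 0.92, 0.95, 0.97]) == 3
```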