Papers
Research Paper·Mar 9, 2026
DyLLM: Efficient Diffusion LLM Inference via Saliency-based Token Selection and Partial Attention
Masked Diffusion Language Models (MDLMs) enable parallel token decoding, providing a promising alternative to the sequential nature of autoregressive generation. However, their iterative denoising pro...
Viability: 7.0
Research Paper·Mar 15, 2026
ASAP: Attention-Shift-Aware Pruning for Efficient LVLM Inference
While Large Vision-Language Models (LVLMs) demonstrate exceptional multi-modal capabilities, the quadratic computational cost of processing high-resolution visual tokens remains a critical bottleneck...
Viability: 7.0
Research Paper·Feb 10, 2026
ESTAR: Early-Stopping Token-Aware Reasoning For Efficient Inference
Large reasoning models (LRMs) achieve state-of-the-art performance by generating long chains-of-thought, but often waste computation on redundant reasoning after the correct answer has already been re...
Viability: 6.0