Papers
Research Paper·Mar 9, 2026
DyLLM: Efficient Diffusion LLM Inference via Saliency-based Token Selection and Partial Attention
Masked Diffusion Language Models (MDLMs) enable parallel token decoding, providing a promising alternative to the sequential nature of autoregressive generation. However, their iterative denoising pro...
Viability: 7.0
Research Paper·Mar 15, 2026
ASAP: Attention-Shift-Aware Pruning for Efficient LVLM Inference
While Large Vision-Language Models (LVLMs) demonstrate exceptional multi-modal capabilities, the quadratic computational cost of processing high-resolution visual tokens remains a critical bottleneck...
Viability: 7.0
Research Paper·Feb 10, 2026
ESTAR: Early-Stopping Token-Aware Reasoning For Efficient Inference
Large reasoning models (LRMs) achieve state-of-the-art performance by generating long chains-of-thought, but often waste computation on redundant reasoning after the correct answer has already been re...
Viability: 6.0