Papers
Research Paper·Feb 11, 2026
Retrieval-Aware Distillation for Transformer-SSM Hybrids
State-space models (SSMs) offer efficient sequence modeling but lag behind Transformers on benchmarks that require in-context retrieval. Prior work links this gap to a small set of attention heads, te...
6.0 viability
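For orientation only: the abstract frames the method as distillation from a Transformer teacher into an SSM student. The excerpt does not specify the actual objective, so the sketch below is only a generic hidden-state distillation loss; the projection layer and layer pairing are assumptions, not the paper's retrieval-aware construction.

```python
import torch.nn.functional as F

def hidden_state_distill_loss(student_h, teacher_h, proj):
    """Generic layerwise distillation: MSE between projected student hidden
    states and frozen teacher hidden states.

    student_h: (batch, seq_len, d_student), teacher_h: (batch, seq_len, d_teacher)
    proj: an nn.Linear mapping d_student -> d_teacher (an assumption; the
    paper's actual retrieval-aware objective is not given in the excerpt).
    """
    return F.mse_loss(proj(student_h), teacher_h.detach())
```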
Research Paper·Feb 5, 2026·B2B
ZeroS: Zero-Sum Linear Attention for Efficient Transformers
Linear attention methods offer Transformers $O(N)$ complexity but typically underperform standard softmax attention. We identify two fundamental limitations affecting these approaches: the restriction...
5.0 viability
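As background for the $O(N)$ claim: standard linear attention replaces softmax with a kernel feature map $\phi$ and exploits associativity, computing $\phi(Q)\,(\phi(K)^\top V)$ instead of $(\phi(Q)\phi(K)^\top)V$, so the $N \times N$ attention matrix is never materialized. A minimal NumPy sketch of that generic trick (not ZeroS itself, whose zero-sum construction the excerpt does not specify):

```python
import numpy as np

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1e-6):
    """Generic (non-causal) linear attention in O(N * d^2) time.

    Q, K: (N, d); V: (N, d_v). phi is an arbitrary positive feature map
    (ReLU plus epsilon here, a common choice, not ZeroS's).
    """
    Qf, Kf = phi(Q), phi(K)                    # feature-mapped queries/keys
    KV = Kf.T @ V                              # (d, d_v): key/value summary
    Z = Qf @ Kf.sum(axis=0, keepdims=True).T   # (N, 1): per-query normalizer
    return (Qf @ KV) / Z                       # (N, d_v), no N x N matrix
```

A causal variant replaces the global summary `KV` with a running prefix sum over positions, which preserves the linear complexity.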
Research Paper·Feb 5, 2026·B2B
Shiva-DiT: Residual-Based Differentiable Top-$k$ Selection for Efficient Diffusion Transformers
Diffusion Transformers (DiTs) incur prohibitive computational costs due to the quadratic scaling of self-attention. Existing pruning methods fail to simultaneously satisfy differentiability, efficienc...
5.0 viability
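For context on the selection problem the abstract names: hard top-$k$ is not differentiable, and a common generic workaround is a straight-through estimator that uses the exact top-$k$ mask in the forward pass and a soft surrogate for gradients. A PyTorch sketch of that standard trick (not Shiva-DiT's residual-based selector, which the excerpt does not detail; the temperature `tau` is an assumption):

```python
import torch

def straight_through_topk_mask(scores: torch.Tensor, k: int, tau: float = 1.0):
    """Top-k token mask with straight-through gradients.

    scores: (batch, n_tokens) importance scores. Returns a (batch, n_tokens)
    mask that is hard {0, 1} in the forward pass but backpropagates through
    a sigmoid relaxation. Ties with the k-th score may select extra tokens.
    """
    kth = scores.topk(k, dim=-1).values[..., -1:]   # (batch, 1): k-th largest
    soft = torch.sigmoid((scores - kth) / tau)      # differentiable surrogate
    hard = (scores >= kth).float()                  # exact top-k mask
    return hard + (soft - soft.detach())            # straight-through estimator
```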