Efficient Transformers Comparison Hub
3 papers - average viability score 5.3
Top Papers
- Retrieval-Aware Distillation for Transformer-SSM Hybrids (6.0)
Distills Transformer models into memory-efficient Transformer-SSM hybrids that preserve retrieval capability while using fewer attention heads (a distillation sketch follows the list).
- ZeroS: Zero-Sum Linear Attention for Efficient Transformers (5.0)
ZeroS is a zero-sum linear attention mechanism that lowers the cost of Transformer attention for practical sequence-modeling tasks (see the attention sketch below).
- Shiva-DiT: Residual-Based Differentiable Top-$k$ Selection for Efficient Diffusion Transformers (5.0)
Shiva-DiT speeds up Diffusion Transformers with residual-based differentiable top-$k$ token selection, reducing computation cost (see the top-$k$ sketch below).
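Since the summaries above are one-liners, hedged sketches of the core techniques follow. First, retrieval-aware distillation: a minimal PyTorch sketch of one plausible objective, a token-level KL distillation loss upweighted at retrieval-dependent positions. The function name, the `retrieval_mask` input, and the weighting scheme are illustrative assumptions, not the paper's actual loss.

```python
import torch
import torch.nn.functional as F

def retrieval_aware_distill_loss(student_logits, teacher_logits,
                                 retrieval_mask, alpha=2.0, tau=2.0):
    """Hypothetical sketch: standard temperature-scaled KL distillation,
    upweighted at positions flagged as retrieval-dependent (e.g., tokens
    whose prediction requires copying from far back in the context).
    student_logits, teacher_logits: (B, N, V); retrieval_mask: (B, N) float in {0, 1}.
    """
    log_p_s = F.log_softmax(student_logits / tau, dim=-1)
    p_t = F.softmax(teacher_logits / tau, dim=-1)
    # Per-token KL(teacher || student), shape (B, N).
    kl = (p_t * (p_t.clamp_min(1e-9).log() - log_p_s)).sum(-1)
    # Retrieval-heavy tokens contribute (1 + alpha) times as much.
    weights = 1.0 + alpha * retrieval_mask
    return (weights * kl).mean() * tau ** 2
```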
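For ZeroS, a minimal sketch of what zero-sum linear attention could look like, assuming "zero-sum" means mean-centering the scores so each query's attention weights sum to zero (allowing negative weights) while keeping the O(N) cost of linear attention. The paper's exact mechanism, feature maps, and normalization may differ; this is non-causal for simplicity.

```python
import torch

def zero_sum_linear_attention(q, k, v):
    """Sketch: weights w_ij = q_i.k_j - mean_j(q_i.k_j), which sum to
    zero over j for each query i. The centering term factors out, so
    the whole computation stays linear in sequence length N.
    q, k: (B, N, D); v: (B, N, Dv).
    """
    kv = torch.einsum("bnd,bne->bde", k, v)   # K^T V, O(N * D * Dv)
    k_mean = k.mean(dim=1)                     # (B, D)
    v_sum = v.sum(dim=1)                       # (B, Dv)
    # Subtracting k_mean (x) v_sum implements the mean-centering:
    # out_i = q_i (K^T V) - (q_i . k_mean) * sum_j v_j.
    correction = torch.einsum("bd,be->bde", k_mean, v_sum)
    return torch.einsum("bnd,bde->bne", q, kv - correction)
```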
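For Shiva-DiT, a generic straight-through differentiable top-$k$ sketch: hard token selection in the forward pass, soft gradients in the backward pass, with tokens scored by residual magnitude as one plausible reading of "residual-based". All names and the scoring rule are assumptions, not the paper's method.

```python
import torch

def straight_through_topk_mask(scores, k):
    """Hard top-k mask in the forward pass; gradients flow through a
    sigmoid surrogate in the backward pass (straight-through trick)."""
    soft = torch.sigmoid(scores)                      # differentiable surrogate
    idx = scores.topk(k, dim=-1).indices
    hard = torch.zeros_like(scores).scatter(-1, idx, 1.0)
    # Forward value equals `hard`; backward gradient equals d(soft).
    return hard + soft - soft.detach()

def select_tokens(x, residual, k):
    """Keep the k tokens with the largest residual magnitude
    (assumed scoring rule). x, residual: (B, N, D)."""
    scores = residual.norm(dim=-1)                    # (B, N)
    mask = straight_through_topk_mask(scores, k)      # (B, N)
    return x * mask.unsqueeze(-1)
```

Skipping the computation for the masked-out tokens (rather than zeroing them, as above) is what would yield the actual speedup in a diffusion Transformer block.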