Efficient Transformers Comparison Hub
3 papers - average viability score 5.3
Top Papers
- Retrieval-Aware Distillation for Transformer-SSM Hybrids (6.0)
Distills Transformer models into memory-efficient Transformer-SSM hybrids that preserve retrieval capability while using fewer attention heads (a distillation sketch follows the list).
- ZeroS: Zero-Sum Linear Attention for Efficient Transformers (5.0)
ZeroS is a zero-sum linear attention mechanism that lowers the cost of Transformer attention for practical sequence-modeling tasks (see the attention sketch below).
- Shiva-DiT: Residual-Based Differentiable Top-$k$ Selection for Efficient Diffusion Transformers (5.0)
Shiva-DiT speeds up Diffusion Transformers with residual-based differentiable top-$k$ token selection, reducing computation cost (see the top-$k$ sketch below).
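Since the summaries above are one-liners, hedged sketches of the core techniques follow. First, retrieval-aware distillation: a minimal PyTorch sketch of one plausible objective, a token-level KL distillation loss upweighted at retrieval-dependent positions. The function name, the `retrieval_mask` input, and the weighting scheme are illustrative assumptions, not the paper's actual loss.

```python
import torch
import torch.nn.functional as F

def retrieval_aware_distill_loss(student_logits, teacher_logits,
                                 retrieval_mask, alpha=2.0, tau=2.0):
    """Hypothetical sketch: standard temperature-scaled KL distillation,
    upweighted at positions flagged as retrieval-dependent (e.g., tokens
    whose prediction requires copying from far back in the context).
    student_logits, teacher_logits: (B, N, V); retrieval_mask: (B, N) float in {0, 1}.
    """
    log_p_s = F.log_softmax(student_logits / tau, dim=-1)
    p_t = F.softmax(teacher_logits / tau, dim=-1)
    # Per-token KL(teacher || student), shape (B, N).
    kl = (p_t * (p_t.clamp_min(1e-9).log() - log_p_s)).sum(-1)
    # Retrieval-heavy tokens contribute (1 + alpha) times as much.
    weights = 1.0 + alpha * retrieval_mask
    return (weights * kl).mean() * tau ** 2
```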
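For ZeroS, a minimal sketch of what zero-sum linear attention could look like, assuming "zero-sum" means mean-centering the scores so each query's attention weights sum to zero (allowing negative weights) while keeping the O(N) cost of linear attention. The paper's exact mechanism, feature maps, and normalization may differ; this is non-causal for simplicity.

```python
import torch

def zero_sum_linear_attention(q, k, v):
    """Sketch: weights w_ij = q_i.k_j - mean_j(q_i.k_j), which sum to
    zero over j for each query i. The centering term factors out, so
    the whole computation stays linear in sequence length N.
    q, k: (B, N, D); v: (B, N, Dv).
    """
    kv = torch.einsum("bnd,bne->bde", k, v)   # K^T V, O(N * D * Dv)
    k_mean = k.mean(dim=1)                     # (B, D)
    v_sum = v.sum(dim=1)                       # (B, Dv)
    # Subtracting k_mean (x) v_sum implements the mean-centering:
    # out_i = q_i (K^T V) - (q_i . k_mean) * sum_j v_j.
    correction = torch.einsum("bd,be->bde", k_mean, v_sum)
    return torch.einsum("bnd,bde->bne", q, kv - correction)
```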
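For Shiva-DiT, a generic straight-through differentiable top-$k$ sketch: hard token selection in the forward pass, soft gradients in the backward pass, with tokens scored by residual magnitude as one plausible reading of "residual-based". All names and the scoring rule are assumptions, not the paper's method.

```python
import torch

def straight_through_topk_mask(scores, k):
    """Hard top-k mask in the forward pass; gradients flow through a
    sigmoid surrogate in the backward pass (straight-through trick)."""
    soft = torch.sigmoid(scores)                      # differentiable surrogate
    idx = scores.topk(k, dim=-1).indices
    hard = torch.zeros_like(scores).scatter(-1, idx, 1.0)
    # Forward value equals `hard`; backward gradient equals d(soft).
    return hard + soft - soft.detach()

def select_tokens(x, residual, k):
    """Keep the k tokens with the largest residual magnitude
    (assumed scoring rule). x, residual: (B, N, D)."""
    scores = residual.norm(dim=-1)                    # (B, N)
    mask = straight_through_topk_mask(scores, k)      # (B, N)
    return x * mask.unsqueeze(-1)
```

Skipping the computation for the masked-out tokens (rather than zeroing them, as above) is what would yield the actual speedup in a diffusion Transformer block.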