Papers
Research Paper·Jan 29, 2026
Hybrid Linear Attention Done Right: Efficient Distillation and Effective Architectures for Extremely Long Contexts
Hybrid Transformer architectures, which combine softmax attention blocks and recurrent neural networks (RNNs), have shown a desirable performance-throughput tradeoff for long-context modeling, but the...
5.0 viability
Research Paper·Feb 17, 2026
The Information Geometry of Softmax: Probing and Steering
This paper concerns how AI systems encode semantic structure into the geometric structure of their representation spaces. The motivating observation is that the natural g...
5.0 viability
Research Paper·Feb 2, 2026
Poly-attention: a general scheme for higher-order self-attention
The self-attention mechanism, at the heart of the Transformer model, effectively models pairwise interactions between tokens. However, numerous recent works have shown that it is unable to p...
2.0 viability