Papers
Research Paper·Jan 29, 2026
Hybrid Linear Attention Done Right: Efficient Distillation and Effective Architectures for Extremely Long Contexts
Hybrid Transformer architectures, which combine softmax attention blocks and recurrent neural networks (RNNs), have shown a desirable performance-throughput tradeoff for long-context modeling, but the...
5.0 viability
Research Paper·Feb 17, 2026
The Information Geometry of Softmax: Probing and Steering
This paper concerns how AI systems encode semantic structure into the geometric structure of their representation spaces. The motivating observation is that the natural g...
5.0 viability
Research Paper·Feb 2, 2026
Poly-attention: a general scheme for higher-order self-attention
The self-attention mechanism, at the heart of the Transformer model, effectively models pairwise interactions between tokens. However, numerous recent works have shown that it is unable to p...
2.0 viability