Papers
Research Paper · Mar 17, 2026
SympFormer: Accelerated Attention Blocks via Inertial Dynamics on Density Manifolds
Transformers owe much of their empirical success in natural language processing to their self-attention blocks. Recent perspectives interpret attention blocks as interacting particle systems, whose mean... (a generic sketch of the particle-system view follows below)
5.0 viability
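For readers unfamiliar with the particle reading this abstract invokes: in that literature, tokens are treated as particles whose velocities are attention-weighted averages of value-projected particles. Below is a minimal NumPy sketch of that generic dynamics; the function name, step size, and explicit-Euler integrator are illustrative choices, and SympFormer's inertial scheme on density manifolds is not reproduced here.

```python
import numpy as np

def attention_particle_step(X, Q, K, V, dt=0.1):
    """One explicit-Euler step of the generic interacting-particle view of
    self-attention: each token x_i is a particle whose velocity is the
    attention-weighted average of the value-projected particles. This is
    the textbook dynamics from the particle literature, NOT SympFormer's
    inertial scheme (an illustrative assumption)."""
    logits = (X @ Q.T) @ (X @ K.T).T / np.sqrt(K.shape[0])  # pairwise scores
    logits -= logits.max(axis=1, keepdims=True)             # stable softmax
    A = np.exp(logits)
    A /= A.sum(axis=1, keepdims=True)
    velocity = A @ (X @ V.T)                                # mean-field drift
    return X + dt * velocity

rng = np.random.default_rng(0)
n, d = 8, 16                       # 8 token-particles in R^16
X = rng.normal(size=(n, d))
Q, K, V = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
for _ in range(100):               # roll the dynamics forward
    X = attention_particle_step(X, Q, K, V)
```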
Research Paper · Feb 11, 2026
Gradients Must Earn Their Influence: Unifying SFT with Generalized Entropic Objectives
Standard negative log-likelihood (NLL) for Supervised Fine-Tuning (SFT) applies uniform token-level weighting. This rigidity creates a two-fold failure mode: (i) overemphasizing low-probability target... (a minimal weighted-NLL sketch follows below)
4.0 viability
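For context on the baseline this abstract critiques: standard SFT loss is a uniform average of per-token negative log-likelihoods. The PyTorch sketch below makes the uniform weighting explicit and adds a generic `weight_fn` hook; the hook, its name, and the example weighting are hypothetical and are not the paper's generalized entropic objective.

```python
import torch
import torch.nn.functional as F

def weighted_sft_loss(logits, targets, weight_fn=None):
    """Token-level SFT loss. weight_fn=None recovers plain NLL (uniform
    weight 1 on every target token); a non-uniform weight_fn is a generic,
    hypothetical hook for probability-dependent reweighting of the kind
    the abstract argues for.

    logits:  (batch, seq, vocab)
    targets: (batch, seq) integer token ids
    """
    logp = F.log_softmax(logits, dim=-1)
    token_logp = logp.gather(-1, targets.unsqueeze(-1)).squeeze(-1)  # (B, S)
    if weight_fn is None:
        w = torch.ones_like(token_logp)           # uniform weighting = NLL
    else:
        w = weight_fn(token_logp.detach().exp())  # weight from target prob
    return -(w * token_logp).mean()

# Illustrative choice only: damp the gradient contribution of very
# low-probability targets instead of letting them dominate.
loss = weighted_sft_loss(
    torch.randn(2, 5, 100), torch.randint(0, 100, (2, 5)),
    weight_fn=lambda p: p.clamp(min=0.1) ** 0.5,
)
```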
Research Paper · Mar 10, 2026
Efficient Reasoning at Fixed Test-Time Cost via Length-Aware Attention Priors and Gain-Aware Training
We study efficient reasoning under tight compute. We ask how to make structured, correct decisions without increasing test-time cost. We add two training-only components to small and medium Transforme... (a generic attention-prior sketch follows below)
2.0 viability
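One common way to instantiate a length-aware attention prior is an additive, distance-dependent bias on the attention logits (ALiBi-style). The PyTorch sketch below shows that generic form only; `distance_prior`, the slope value, and the linear penalty are illustrative assumptions, and the truncated abstract does not reveal the paper's actual prior or its gain-aware training objective.

```python
import torch

def distance_prior(seq_len, slope=0.05):
    """A generic length-aware attention prior: an additive bias that
    penalizes attention logits in proportion to token distance |i - j|
    (ALiBi-style illustration; the paper's actual prior is unknown)."""
    pos = torch.arange(seq_len)
    dist = (pos[None, :] - pos[:, None]).abs().float()  # |i - j|
    return -slope * dist                                 # (seq, seq) bias

scores = torch.randn(1, 4, 16, 16)       # (batch, heads, seq, seq) logits
scores = scores + distance_prior(16)     # broadcast the additive prior
attn = scores.softmax(dim=-1)            # attention weights under the prior
```

The bias itself adds negligible compute; whether such a prior is applied only during training, as the abstract's "training-only components" phrasing suggests, is not recoverable from the truncated text.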