State of the Field
Recent theoretical work in AI increasingly focuses on explaining the mechanics of model behavior and improving efficiency. Work on spectral superposition examines how neural networks represent more features than they have dimensions, emphasizing the geometric relationships among features; this line of work could improve interpretability and diagnostics for complex models. Research on rectified flow models establishes sharper sample-complexity results, offering a more efficient alternative to diffusion-style generative models and potentially streamlining data generation and simulation. Analyses of self-rewarding language models provide theoretical guarantees for their iterative alignment, explaining how such models improve without external feedback. This turn toward rigorous theory both clarifies existing methods and suggests new paths to AI systems that are more efficient and more interpretable, meeting commercial demand for reliable, understandable AI across industries.
Papers
Spectral Superposition: A Theory of Feature Geometry
Neural networks represent more features than they have dimensions via superposition, forcing features to share representational space. Current methods decompose activations into sparse linear features...
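For intuition, here is a minimal numpy sketch of superposition geometry (an illustration of the general phenomenon, not the paper's spectral method): three features packed into two dimensions as maximally spread unit vectors, with a dot-product readout showing the interference this forces.

```python
# Toy superposition: 3 features stored in a 2-dimensional space as
# maximally spread unit vectors (an illustrative layout, not from the paper).
import numpy as np

d, n = 2, 3
angles = np.linspace(0, 2 * np.pi, n, endpoint=False)
W = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # (n, d) feature directions

x = np.array([1.0, 0.0, 0.0])   # sparse input: only feature 0 is active
h = x @ W                       # its d-dimensional superposed representation
readout = h @ W.T               # dot-product readout against every feature
print(np.round(readout, 3))     # ~[1.0, -0.5, -0.5]: signal plus interference
```

With more features than dimensions, exact orthogonality is impossible, so any readout pays an interference cost; the geometry of the feature directions determines how large.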
Order-Optimal Sample Complexity of Rectified Flows
Recently, flow-based generative models have shown superior efficiency compared to diffusion models. In this paper, we study rectified flow models, which constrain transport trajectories to be linear...
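The rectified-flow objective itself is simple to state; the sketch below is a generic formulation under standard assumptions (not the paper's analysis): the network regresses onto the constant velocity of the straight path between noise and data.

```python
import torch

def rectified_flow_loss(model, x0, x1):
    """x0 ~ noise, x1 ~ data; model(x_t, t) predicts a velocity field."""
    t = torch.rand(x0.shape[0], 1)   # one random time per sample
    x_t = (1 - t) * x0 + t * x1      # point on the straight-line trajectory
    v_target = x1 - x0               # constant velocity of a linear path
    return ((model(x_t, t) - v_target) ** 2).mean()

# Tiny usage example with a throwaway network.
net = torch.nn.Sequential(torch.nn.Linear(3, 64), torch.nn.Tanh(), torch.nn.Linear(64, 2))
model = lambda x, t: net(torch.cat([x, t], dim=1))
x0, x1 = torch.randn(8, 2), torch.randn(8, 2)
print(rectified_flow_loss(model, x0, x1))
```

Because the target trajectories are straight, sampling can take large integration steps, which is where the efficiency advantage over diffusion comes from.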
Why Self-Rewarding Works: Theoretical Guarantees for Iterative Alignment of Language Models
Self-Rewarding Language Models (SRLMs) achieve notable success in iteratively improving alignment without external feedback. Yet, despite their striking empirical progress, the core mechanisms driving...
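For context, the basic self-rewarding recipe looks roughly like the following schematic (the `generate` and `score` methods are hypothetical stand-ins for LM calls; the paper analyzes why such loops work rather than proposing this exact code).

```python
def self_rewarding_round(model, prompts, n_candidates=4):
    """One iteration: the model judges its own generations to build
    preference pairs, with no external reward signal involved."""
    pairs = []
    for p in prompts:
        candidates = [model.generate(p) for _ in range(n_candidates)]
        scores = [model.score(p, c) for c in candidates]  # LLM-as-judge
        ranked = sorted(zip(scores, candidates))
        pairs.append((p, ranked[-1][1], ranked[0][1]))    # (prompt, chosen, rejected)
    return pairs  # typically fed to a preference optimizer such as DPO
```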
Self-Improvement as Coherence Optimization: A Theoretical Account
Can language models improve their accuracy without external supervision? Methods such as debate, bootstrap, and internal coherence maximization achieve this surprising feat, even matching golden finetuning...
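One of the simplest members of this family is self-consistency-style coherence: prefer the answer the model produces most often across independent samples. A minimal sketch, with `sample_answer` as a hypothetical stand-in for an LM call (the paper's formal account is broader):

```python
from collections import Counter

def most_coherent_answer(sample_answer, question, k=16):
    """Pick the modal answer across k samples: no labels, no external
    judge, just internal agreement as the optimization target."""
    answers = [sample_answer(question) for _ in range(k)]
    return Counter(answers).most_common(1)[0][0]
```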
The logic of KM belief update is contained in the logic of AGM belief revision
For each axiom of KM belief update we provide a corresponding axiom in a modal logic containing three modal operators: a unimodal belief operator $B$, a bimodal conditional operator $>$ and the unimodal...
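For readers without the postulates at hand, two representative Katsuno-Mendelzon update axioms in their standard propositional form (these are the classical statements, not the paper's modal renderings):

```latex
% KM update, where \phi \diamond \mu denotes updating \phi by \mu:
% (U1) the update succeeds; (U8) update distributes over disjunction,
% the postulate that most sharply separates update from AGM revision.
\begin{align*}
\text{(U1)}\quad & \phi \diamond \mu \models \mu \\
\text{(U8)}\quad & (\phi_1 \lor \phi_2) \diamond \mu \;\equiv\; (\phi_1 \diamond \mu) \lor (\phi_2 \diamond \mu)
\end{align*}
```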
Common Belief Revisited
Contrary to common belief, common belief is not KD4. If individual belief is KD45, common belief does indeed lose the 5 property and keep the D and 4 properties -- and it has none of the other common...
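For reference, the modal axioms the title alludes to, stated for a belief operator $B$ (standard definitions, included only to make the abstract's claim legible):

```latex
% KD45 = K plus the three axioms below; the abstract's claim is that
% common belief retains D and 4 but drops 5.
\begin{align*}
\text{(D)}\quad & B\varphi \rightarrow \lnot B \lnot \varphi && \text{consistency} \\
\text{(4)}\quad & B\varphi \rightarrow B B\varphi            && \text{positive introspection} \\
\text{(5)}\quad & \lnot B\varphi \rightarrow B \lnot B\varphi && \text{negative introspection}
\end{align*}
```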
The Manifold of the Absolute: Religious Perennialism as Generative Inference
This paper formalizes religious epistemology through the mathematics of Variational Autoencoders. We model religious traditions as distinct generative mappings from a shared, low-dimensional latent space...
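Read literally, the stated setup maps onto a multi-decoder VAE; the sketch below is one plausible rendering (all architectural choices here are assumptions, not taken from the paper): several "traditions" decode from one shared low-dimensional latent space.

```python
import torch
import torch.nn as nn

class SharedLatentVAE(nn.Module):
    """One shared encoder and latent space; one decoder per 'tradition'."""
    def __init__(self, obs_dim=32, latent_dim=2, n_traditions=3):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, 2 * latent_dim)  # outputs mean, log-var
        self.decoders = nn.ModuleList(
            nn.Linear(latent_dim, obs_dim) for _ in range(n_traditions)
        )

    def forward(self, x, tradition):
        mu, logvar = self.encoder(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        return self.decoders[tradition](z), mu, logvar
```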