Top papers
- ExplainerPFN: Towards tabular foundation models for model-free zero-shot feature importance estimations (5.0)
- DLM-Scope: Mechanistic Interpretability of Diffusion Language Models via Sparse Autoencoders (5.0)
- Concept Influence: Leveraging Interpretability to Improve Performance and Efficiency in Training Data Attribution (5.0)
- The Confidence Manifold: Geometric Structure of Correctness Representations in Language Models (2.0)