State of Model Interpretability

4 papers · avg viability 4.3

Top papers

ExplainerPFN: Towards tabular foundation models for model-free zero-shot feature importance estimations (5.0)
DLM-Scope: Mechanistic Interpretability of Diffusion Language Models via Sparse Autoencoders (5.0)
Concept Influence: Leveraging Interpretability to Improve Performance and Efficiency in Training Data Attribution (5.0)
The Confidence Manifold: Geometric Structure of Correctness Representations in Language Models (2.0)