LLM Interpretability Comparison Hub
3 papers - average viability score 5.7
Top Papers
- Jacobian Scopes: token-level causal attributions in LLMs (7.0)
An interpretability tool that uses gradient-based attribution to identify which input tokens most influence an LLM's predictions.
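The core idea (token-level gradient attribution) can be sketched on a toy model. This is a minimal illustration, not the paper's method: for a bag-of-embeddings classifier the Jacobian of a target logit with respect to each token embedding has a closed form, and the input-times-gradient score ranks tokens by influence. All names and the model are hypothetical.

```python
import numpy as np

# Toy sketch: logit_c = w_c . mean(embeddings), so the Jacobian of the
# target logit w.r.t. each token embedding is w_c / n_tokens. The
# input-x-gradient score |grad . embedding| gives one attribution per token.
rng = np.random.default_rng(0)
vocab, dim, n_classes = 50, 8, 3
emb_table = rng.normal(size=(vocab, dim))   # toy embedding table
W = rng.normal(size=(n_classes, dim))       # toy classifier weights

def token_attributions(token_ids, target_class):
    """Return one saliency score per input token for the target logit."""
    emb = emb_table[token_ids]               # (n_tokens, dim)
    grad = W[target_class] / len(token_ids)  # Jacobian row, identical per token
    return np.abs(emb @ grad)                # input-x-gradient, shape (n_tokens,)

scores = token_attributions([3, 17, 42, 8], target_class=1)
print(scores.shape)  # one attribution score per input token
```

In a real LLM the closed-form Jacobian is replaced by autograd (e.g. backpropagating a chosen logit to the input embeddings), but the ranking step is the same.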
- Where Knowledge Collides: A Mechanistic Study of Intra-Memory Knowledge Conflict in Language Models (5.0)
A framework to identify and control conflicting knowledge in language models using mechanistic interpretability methods.
- Mechanistic Interpretability of Cognitive Complexity in LLMs via Linear Probing using Bloom's Taxonomy (5.0)
Linear probing tools that read the cognitive complexity level of LLM tasks, using Bloom's Taxonomy as the label scheme.
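Linear probing itself is simple to sketch: fit a linear classifier on frozen hidden states to predict a label. The snippet below uses synthetic data with a planted linear signal standing in for LLM hidden states and Bloom's-taxonomy labels; everything here is an illustrative assumption, not the paper's setup.

```python
import numpy as np

# Minimal linear-probe sketch. `states` stands in for frozen LLM hidden
# states; a linear signal is planted so the probe has something to find.
rng = np.random.default_rng(1)
n, dim = 200, 16
labels = rng.integers(0, 2, size=n)            # e.g. "remember" vs "analyze"
direction = rng.normal(size=dim)               # planted label direction
states = rng.normal(size=(n, dim)) + np.outer(labels, direction)

# Least-squares linear probe (bias term via a column of ones).
X = np.hstack([states, np.ones((n, 1))])
w, *_ = np.linalg.lstsq(X, labels.astype(float), rcond=None)
preds = (X @ w > 0.5).astype(int)
accuracy = (preds == labels).mean()
print(f"probe accuracy: {accuracy:.2f}")
```

Probe accuracy well above the 0.5 chance level is the usual evidence that the representation linearly encodes the property; with real hidden states one would also compare against a shuffled-label control.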