LLM Interpretability Comparison Hub
3 papers - average viability score 5.7
Top Papers
- Jacobian Scopes: token-level causal attributions in LLMs (7.0)
An interpretability tool that uses gradient-based attribution to identify which input tokens most influence an LLM's predictions.
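The core idea (token-level gradient attribution) can be sketched on a toy model. This is a minimal illustration, not the paper's method: for a bag-of-embeddings classifier the Jacobian of a target logit with respect to each token embedding has a closed form, and the input-times-gradient score ranks tokens by influence. All names and the model are hypothetical.

```python
import numpy as np

# Toy sketch: logit_c = w_c . mean(embeddings), so the Jacobian of the
# target logit w.r.t. each token embedding is w_c / n_tokens. The
# input-x-gradient score |grad . embedding| gives one attribution per token.
rng = np.random.default_rng(0)
vocab, dim, n_classes = 50, 8, 3
emb_table = rng.normal(size=(vocab, dim))   # toy embedding table
W = rng.normal(size=(n_classes, dim))       # toy classifier weights

def token_attributions(token_ids, target_class):
    """Return one saliency score per input token for the target logit."""
    emb = emb_table[token_ids]               # (n_tokens, dim)
    grad = W[target_class] / len(token_ids)  # Jacobian row, identical per token
    return np.abs(emb @ grad)                # input-x-gradient, shape (n_tokens,)

scores = token_attributions([3, 17, 42, 8], target_class=1)
print(scores.shape)  # one attribution score per input token
```

In a real LLM the closed-form Jacobian is replaced by autograd (e.g. backpropagating a chosen logit to the input embeddings), but the ranking step is the same.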
- Where Knowledge Collides: A Mechanistic Study of Intra-Memory Knowledge Conflict in Language Models (5.0)
A framework to identify and control conflicting knowledge in language models using mechanistic interpretability methods.
- Mechanistic Interpretability of Cognitive Complexity in LLMs via Linear Probing using Bloom's Taxonomy (5.0)
Linear probing tools that read the cognitive complexity level of LLM tasks, using Bloom's Taxonomy as the label scheme.
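Linear probing itself is simple to sketch: fit a linear classifier on frozen hidden states to predict a label. The snippet below uses synthetic data with a planted linear signal standing in for LLM hidden states and Bloom's-taxonomy labels; everything here is an illustrative assumption, not the paper's setup.

```python
import numpy as np

# Minimal linear-probe sketch. `states` stands in for frozen LLM hidden
# states; a linear signal is planted so the probe has something to find.
rng = np.random.default_rng(1)
n, dim = 200, 16
labels = rng.integers(0, 2, size=n)            # e.g. "remember" vs "analyze"
direction = rng.normal(size=dim)               # planted label direction
states = rng.normal(size=(n, dim)) + np.outer(labels, direction)

# Least-squares linear probe (bias term via a column of ones).
X = np.hstack([states, np.ones((n, 1))])
w, *_ = np.linalg.lstsq(X, labels.astype(float), rcond=None)
preds = (X @ w > 0.5).astype(int)
accuracy = (preds == labels).mean()
print(f"probe accuracy: {accuracy:.2f}")
```

Probe accuracy well above the 0.5 chance level is the usual evidence that the representation linearly encodes the property; with real hidden states one would also compare against a shuffled-label control.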