Papers
Research Paper·Jan 23, 2026
Jacobian Scopes: token-level causal attributions in LLMs
Large language models (LLMs) make next-token predictions based on clues present in their context, such as semantic descriptions and in-context examples. Yet, elucidating which prior tokens most strong...
7.0 viability
Research Paper·Jan 14, 2026
Where Knowledge Collides: A Mechanistic Study of Intra-Memory Knowledge Conflict in Language Models
In language models (LMs), intra-memory knowledge conflict largely arises when inconsistent information about the same event is encoded within the model's parametric knowledge. While prior work has pri...
5.0 viability
Research Paper·Feb 19, 2026
Mechanistic Interpretability of Cognitive Complexity in LLMs via Linear Probing using Bloom's Taxonomy
The black-box nature of Large Language Models necessitates novel evaluation frameworks that transcend surface-level performance metrics. This study investigates the internal neural representations of ...
5.0 viability