Papers
Research Paper·Feb 12, 2026
From Atoms to Trees: Building a Structured Feature Forest with Hierarchical Sparse Autoencoders
Sparse autoencoders (SAEs) have proven effective for extracting monosemantic features from large language models (LLMs), yet these features are typically identified in isolation. However, broad eviden...
4.0 viability
Research Paper·Jan 29, 2026
Do Reasoning Models Enhance Embedding Models?
State-of-the-art embedding models are increasingly derived from decoder-only Large Language Model (LLM) backbones adapted via contrastive learning. Given the emergence of reasoning models trained via ...
2.0 viability
Research Paper·Feb 12, 2026
Cross-Architecture Model Diffing with Crosscoders: Unsupervised Discovery of Differences Between LLMs
Model diffing, the process of comparing models' internal representations to identify their differences, is a promising approach for uncovering safety-critical behaviors in new models. However, its app...
2.0 viability