State of LLM Safety

4 papers · avg viability 3.8

Top papers

DeepContext: Stateful Real-Time Detection of Multi-Turn Adversarial Intent Drift in LLMs (6.0)
In-the-Wild Model Organisms: Mitigating Undesirable Emergent Behaviors in Production LLM Post-Training via Data Attribution (5.0)
Hiding in Plain Text: Detecting Concealed Jailbreaks via Activation Disentanglement (2.0)
Beyond Refusal: Probing the Limits of Agentic Self-Correction for Semantic Sensitive Information (2.0)