Safety in AI

Trending

3 papers

5.3 viability

+100% 30d

Papers

Research Paper · Mar 12, 2026

OrthoEraser: Coupled-Neuron Orthogonal Projection for Concept Erasure

Text-to-image (T2I) models face significant safety risks from adversarial induction, yet current concept erasure methods often cause collateral damage to benign attributes when suppressing selected ne...

7.0 viability

Research Paper · Mar 16, 2026

Two Birds, One Projection: Harmonizing Safety and Utility in LVLMs via Inference-time Feature Projection

Existing jailbreak defence frameworks for Large Vision-Language Models often suffer from a safety utility tradeoff, where strengthening safety inadvertently degrades performance on general visual-grou...

7.0 viability

Research Paper · Mar 16, 2026

Beyond Creed: A Non-Identity Safety Condition A Strong Empirical Alternative to Identity Framing in Low-Data LoRA Fine-Tuning

How safety supervision is written may matter more than the explicit identity content it contains. We study low-data LoRA safety fine-tuning with four supervision formats built from the same core safet...

2.0 viability
