Papers
Research Paper·Mar 12, 2026
OrthoEraser: Coupled-Neuron Orthogonal Projection for Concept Erasure
Text-to-image (T2I) models face significant safety risks from adversarial induction, yet current concept erasure methods often cause collateral damage to benign attributes when suppressing selected ne...
7.0 viability
Research Paper·Mar 16, 2026
Two Birds, One Projection: Harmonizing Safety and Utility in LVLMs via Inference-time Feature Projection
Existing jailbreak defence frameworks for Large Vision-Language Models often suffer from a safety-utility tradeoff, where strengthening safety inadvertently degrades performance on general visual-grou...
7.0 viability
Research Paper·Mar 16, 2026
Beyond Creed: A Non-Identity Safety Condition A Strong Empirical Alternative to Identity Framing in Low-Data LoRA Fine-Tuning
How safety supervision is written may matter more than the explicit identity content it contains. We study low-data LoRA safety fine-tuning with four supervision formats built from the same core safet...
2.0 viability