AI Safety and Security Comparison Hub
3 papers - avg viability 7.7
Top Papers
- Building Production-Ready Probes For Gemini (8.0)
Deploy cost-effective AI misuse detection systems using flexible activation probes for context adaptation.
- Reasoning-Oriented Programming: Chaining Semantic Gadgets to Jailbreak Large Vision Language Models (8.0)
A framework that chains semantic gadgets to exploit vulnerabilities in large vision-language models and bypass their safety alignment.
- RiskAtlas: Exposing Domain-Specific Risks in LLMs through Knowledge-Graph-Guided Harmful Prompt Generation (7.0)
A framework for exposing domain-specific risks in LLMs via knowledge-graph-guided harmful prompt generation, targeting safety evaluation in high-stakes sectors such as finance and healthcare.