Current research in AI security increasingly targets vulnerabilities in generative models and large language models (LLMs) as they become integral to production applications. Recent work on latent-space watermarking embeds watermarks directly into generative models, improving robustness and speed over traditional pixel-based techniques. Meanwhile, advances in detecting hubness poisoning in retrieval-augmented generation (RAG) systems reveal critical flaws that attackers could exploit to manipulate retrieved content and degrade performance. Frameworks like Jailbreak Foundry streamline security evaluation by automatically translating published jailbreak techniques into executable attack modules, keeping benchmarks relevant as attack strategies rapidly evolve. Tools such as AgenticSCR for secure code review and SpecularNet for phishing detection mark a shift toward proactive measures that use AI to identify and mitigate risks before they manifest. Collectively, these efforts indicate a concerted push toward fortifying AI systems against an expanding array of security threats.
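Of these, hubness poisoning is the most mechanically concrete: in high-dimensional embedding spaces, a crafted document can become a "hub" that lands in the top-k results for many unrelated queries. The sketch below shows the standard k-occurrence screening idea, assuming cosine-similarity retrieval over precomputed embeddings; the parameter names (`k`, `z_threshold`) and the z-score cutoff are illustrative assumptions, not the HubScan algorithm from the paper.

```python
# Minimal sketch of hubness screening for a RAG corpus, assuming
# cosine-similarity retrieval. Illustrative only, not HubScan itself.
import numpy as np

def hub_scores(doc_embs: np.ndarray, query_embs: np.ndarray, k: int = 10) -> np.ndarray:
    """Count how often each document appears in the top-k results
    across a set of probe queries (its k-occurrence)."""
    # Normalize rows so dot products are cosine similarities.
    docs = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    queries = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    sims = queries @ docs.T                   # (n_queries, n_docs)
    topk = np.argsort(-sims, axis=1)[:, :k]   # top-k doc indices per query
    return np.bincount(topk.ravel(), minlength=docs.shape[0])

def flag_hubs(counts: np.ndarray, z_threshold: float = 4.0) -> np.ndarray:
    """Flag documents whose k-occurrence is an extreme outlier;
    a poisoned 'hub' document surfaces for many unrelated queries."""
    z = (counts - counts.mean()) / (counts.std() + 1e-9)
    return np.where(z > z_threshold)[0]

# Usage: suspects = flag_hubs(hub_scores(doc_embs, probe_query_embs))
```

In a clean corpus the k-occurrence distribution is merely skewed; a poisoned hub typically sits far out in the tail, which is why a simple outlier test on this statistic is a reasonable first-pass screen.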
Top papers
- Jailbreak Foundry: From Papers to Runnable Attacks for Reproducible Benchmarking (9.0)
- Learning to Watermark in the Latent Space of Generative Models (9.0)
- HubScan: Detecting Hubness Poisoning in Retrieval-Augmented Generation Systems (9.0)
- AgenticSCR: An Autonomous Agentic Secure Code Review for Immature Vulnerabilities Detection (8.0)
- AegisUI: Behavioral Anomaly Detection for Structured User Interface Protocols in AI Agent Systems (8.0)
- Phishing the Phishers with SpecularNet: Hierarchical Graph Autoencoding for Reference-Free Web Phishing Detection (8.0)
- Pushing the Frontier of Black-Box LVLM Attacks via Fine-Grained Detail Targeting (8.0)
- Defense Against Indirect Prompt Injection via Tool Result Parsing (7.0)
- Automated Vulnerability Detection in Source Code Using Deep Representation Learning (7.0)
- AgentGuardian: Learning Access Control Policies to Govern AI Agent Behavior (7.0)
- The PBSAI Governance Ecosystem: A Multi-Agent AI Reference Architecture for Securing Enterprise AI Estates (7.0)
- Spider-Sense: Intrinsic Risk Sensing for Efficient Agent Defense with Hierarchical Adaptive Screening (7.0)
- HardSecBench: Benchmarking the Security Awareness of LLMs for Hardware Code Generation (7.0)
- ICON: Indirect Prompt Injection Defense for Agents based on Inference-Time Correction (7.0)
- BackdoorAgent: A Unified Framework for Backdoor Attacks on LLM-based Agents (7.0)
- Know Thy Enemy: Securing LLMs Against Prompt Injection via Diverse Data Synthesis and Instruction-Level Chain-of-Thought Learning (7.0)
- From Similarity to Vulnerability: Key Collision Attack on LLM Semantic Caching (6.0)
- CaMeLs Can Use Computers Too: System-level Security for Computer Use Agents (6.0)
- Don't believe everything you read: Understanding and Measuring MCP Behavior under Misleading Tool Descriptions (6.0)
- Diffusion-Driven Deceptive Patches: Adversarial Manipulation and Forensic Detection in Facial Identity Verification (6.0)
- Hide and Seek in Embedding Space: Geometry-based Steganography and Detection in Large Language Models (5.0)
- Query-Efficient Agentic Graph Extraction Attacks on GraphRAG Systems (5.0)
- RvB: Automating AI System Hardening via Iterative Red-Blue Games (5.0)
- The Promptware Kill Chain: How Prompt Injections Gradually Evolved Into a Multi-Step Malware (5.0)
- Consistency of Large Reasoning Models Under Multi-Turn Attacks (5.0)
- Bypassing AI Control Protocols via Agent-as-a-Proxy Attacks (5.0)
- Eliciting Harmful Capabilities by Fine-Tuning On Safeguarded Outputs (4.0)
- Gaming the Judge: Unfaithful Chain-of-Thought Can Undermine Agent Evaluation (4.0)
- Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale (4.0)
- Recursive language models for jailbreak detection: a procedural defense for tool-augmented agents (3.0)
- Protecting Language Models Against Unauthorized Distillation through Trace Rewriting (3.0)
- Beyond Visual Safety: Jailbreaking Multimodal Large Language Models for Harmful Image Generation via Semantic-Agnostic Inputs (3.0)
- Whispers of Wealth: Red-Teaming Google's Agent Payments Protocol via Prompt Injection (3.0)
- TensorCommitments: A Lightweight Verifiable Inference for Language Models (2.0)
- The Trigger in the Haystack: Extracting and Reconstructing LLM Backdoor Triggers (2.0)
- What Breaks Embodied AI Security: LLM Vulnerabilities, CPS Flaws, or Something Else? (2.0)
- Introducing the Generative Application Firewall (GAF) (2.0)
- Stealthy Poisoning Attacks Bypass Defenses in Regression Settings (1.0)
- Zombie Agents: Persistent Control of Self-Evolving LLM Agents via Self-Reinforcing Injections (1.0)