AI Security

38papers
5.3viability
-35%30d

State of the Field

Current research in AI security is increasingly focused on addressing vulnerabilities in generative models and large language models (LLMs) as they become integral to various applications. Recent work on latent space watermarking has introduced efficient methods for embedding watermarks directly into generative models, significantly enhancing robustness and speed compared to traditional pixel-based techniques. Meanwhile, advancements in detecting hubness poisoning in retrieval-augmented generation systems reveal critical flaws that could be exploited to manipulate content and degrade performance. Additionally, frameworks like Jailbreak Foundry are streamlining the evaluation of security threats by automating the translation of jailbreak techniques into executable modules, ensuring that benchmarks remain relevant amid rapidly evolving attack strategies. The introduction of tools like AgenticSCR for secure code review and SpecularNet for phishing detection highlights a shift towards proactive measures that leverage AI to identify and mitigate risks before they manifest. Collectively, these efforts indicate a concerted push towards fortifying AI systems against an expanding array of security threats.

Last updated Mar 4, 2026

Papers

1–10 of 38
Research Paper·Feb 25, 2026

HubScan: Detecting Hubness Poisoning in Retrieval-Augmented Generation Systems

Retrieval-Augmented Generation (RAG) systems are essential to contemporary AI applications, allowing large language models to obtain external knowledge via vector similarity search. Nevertheless, thes...

9.0 viability
Research Paper·Feb 27, 2026

Jailbreak Foundry: From Papers to Runnable Attacks for Reproducible Benchmarking

Jailbreak techniques for large language models (LLMs) evolve faster than benchmarks, making robustness estimates stale and difficult to compare across papers due to drift in datasets, harnesses, and j...

9.0 viability
Research Paper·Jan 22, 2026

Learning to Watermark in the Latent Space of Generative Models

Existing approaches for watermarking AI-generated images often rely on post-hoc methods applied in pixel space, introducing computational overhead and potential visual artifacts. In this work, we expl...

9.0 viability
Research Paper·Feb 19, 2026

Pushing the Frontier of Black-Box LVLM Attacks via Fine-Grained Detail Targeting

Black-box adversarial attacks on Large Vision-Language Models (LVLMs) are challenging due to missing gradients and complex multimodal boundaries. While prior state-of-the-art transfer-based approaches...

8.0 viability
Research Paper·Mar 2, 2026

Phishing the Phishers with SpecularNet: Hierarchical Graph Autoencoding for Reference-Free Web Phishing Detection

Phishing remains the most pervasive threat to the Web, enabling large-scale credential theft and financial fraud through deceptive webpages. While recent reference-based and generative-AI-driven phish...

8.0 viability
Research Paper·Jan 27, 2026

AgenticSCR: An Autonomous Agentic Secure Code Review for Immature Vulnerabilities Detection

Secure code review is critical at the pre-commit stage, where vulnerabilities must be caught early under tight latency and limited-context constraints. Existing SAST-based checks are noisy and often m...

8.0 viability
Research Paper·Feb 5, 2026·B2BFintech

Spider-Sense: Intrinsic Risk Sensing for Efficient Agent Defense with Hierarchical Adaptive Screening

As large language models (LLMs) evolve into autonomous agents, their real-world applicability has expanded significantly, accompanied by new security challenges. Most existing agent defense mechanisms...

7.0 viability
Research Paper·Feb 24, 2026

ICON: Indirect Prompt Injection Defense for Agents based on Inference-Time Correction

Large Language Model (LLM) agents are susceptible to Indirect Prompt Injection (IPI) attacks, where malicious instructions in retrieved content hijack the agent's execution. Existing defenses typicall...

7.0 viability
Research Paper·Jan 8, 2026

BackdoorAgent: A Unified Framework for Backdoor Attacks on LLM-based Agents

Large language model (LLM) agents execute tasks through multi-step workflows that combine planning, memory, and tool use. While this design enables autonomy, it also expands the attack surface for bac...

7.0 viability
Research Paper·Jan 15, 2026

AgentGuardian: Learning Access Control Policies to Govern AI Agent Behavior

Artificial intelligence (AI) agents are increasingly used in a variety of domains to automate tasks, interact with users, and make decisions based on data inputs. Ensuring that AI agents perform only ...

7.0 viability
Page 1 of 4