A Survey on Agentic Security: Applications, Threats and Defenses
Authors: Asif Shahriar*, Md Nafiu Rahman*, Sadif Ahmed*, Farig Sadeque, Md Rizwan Parvez
BRAC University, Qatar Computing Research Institute (QCRI)
Feel free to open an issue/PR or e-mail asif.asr11@gmail.com if you find any missing areas, papers, or benchmarks. We will keep updating this list and survey.
The move from passive language models to autonomous LLM-agents marks a major shift in cybersecurity. These agents can act on their own. They can help with both attacking and defending systems, yet they also create new risks because they make decisions and interact with the world in more complex ways. This gives attackers more paths to mislead or control them.
This repository gathers the resources, taxonomy, and papers linked to the survey “A Survey on Agentic Security: Applications, Threats and Defenses.” The survey offers a clear overview of the security issues that come with autonomous agents. It centers on three questions. What agents can do for security. How agents can be attacked. How they can be protected.
In the survey. we explain the agentic security landscape through three main points:
-
How agents help with offensive and defensive security tasks.
-
Where agents are vulnerable, such as hidden prompts in outside content or harmful data written into memory.
-
How defenders can protect agents so they act safely and reliably.
This overview aims to support more research on agentic security. The goal is to use the strengths of autonomous agents while reducing the risks they bring. This will guide the development of systems that deliver real benefits while keeping harm low.
- Survey Introduction
- Table of Contents
- Related Surveys
- Papers
- Adverserial Benchmarks
-
Security of AI Agents: "Security of AI Agents". Yifeng He et al. arXiv 2024. [Paper]
-
Trustworthy Agents: "A Survey on Trustworthy LLM Agents: Threats and Countermeasures". Miao Yu et al. KDD 2025. [Paper]
-
TRISM: "TRISM for Agentic AI: A Review of Trust, Risk, and Security Management in LLM-Based Agentic Multi-Agent Systems". Shaina Raza et al. arXiv 2025. [Paper]
-
Agents Under Threat: "AI Agents Under Threat: A Survey of Key Security Challenges and Future Pathways". Zehang Deng et al. arXiv 2024. [Paper]
-
Safety at Scale: "Safety at Scale: A Comprehensive Survey of Large Model and Agent Safety". Xingjun Ma et al. Foundations and Trends in Privacy and Security 2025. [Paper]
-
Multi-Agent Challenges: "Open Challenges in Multi-Agent Security: Towards Secure Systems of Interacting AI Agents". Christian Schroeder de Witt. arXiv 2025. [Paper]
-
Commercial Vulnerabilities: "Commercial LLM Agents Are Already Vulnerable to Simple Yet Dangerous Attacks". Ang Li et al. arXiv 2025. [Paper]
-
Agent Communication: "A Survey of LLM-Driven AI Agent Communication: Protocols, Security Risks, and Defense Countermeasures". Dezhang Kong et al. arXiv 2025. [Paper]
-
Full Stack Safety: "A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment". Kun Wang et al. arXiv 2025. [Paper]
-
PentestGPT: "PentestGPT: An LLM-empowered Automatic Penetration Testing Tool". Gelei Deng et al. USENIX Security 2024. [Paper][GitHub]
-
PentestAgent: "PentestAgent: Incorporating LLM Agents to Automated Penetration Testing". Xiangmin Shen et al. ASIA CCS 2025. [Paper][GitHub]
-
Vulnbot: "VulnBot: Autonomous Penetration Testing for A Multi-Agent Collaborative Framework". He Kong et al. arXiv 2025. [Paper][GitHub]
-
Aracne: "Aracne: An LLM-based Autonomous Shell Pentesting Agent". Tomas Nieponice et al. arXiv 2025. [Paper][Github]
-
Cochise: "Can LLMs Hack Enterprise Networks? Autonomous Assumed Breach Penetration-Testing Active Directory Networks". Andreas Happe and Jürgen Cito. ACM Trans. Softw. Eng. Methodol. 2025. [Paper][GitHub]
-
HackSynth: "Hacksynth: LLM Agent and Evaluation Framework for Autonomous Penetration Testing". Lajos Muzsai et al. arXiv 2024. [Paper][Github]
-
AutoPentest: "AutoPentest: Enhancing Vulnerability Management with Autonomous LLM Agents". Julius Henke. arXiv 2025. [Paper][Github]
-
LLM-Pentest-Efficacy: "On the Surprising Efficacy of LLMs for Penetration-Testing". Andreas Happe and Jürgen Cito. arXiv 2025. [Paper]
-
Incalmo: "Incalmo: An Autonomous LLM-assisted System for Red Teaming Multi-Host Networks". Brian Singer et al. arXiv 2025. [Paper][Github]
-
xOffense: "xOffense: An AI-Driven Autonomous Penetration Testing Framework with Offensive Knowledge-Enhanced LLMs and Multi Agent Systems". Phung Duc Luong et al. arXiv 2025. [Paper]
-
AutoPenBench: "AutoPenBench: Benchmarking Generative Agents for Penetration Testing". Luca Gioacchini et al. arXiv 2024. [Paper][Github]
-
AI-Pentest-Benchmark: "Towards automated penetration testing: Introducing LLM Benchmark, Analysis, and Improvements". Isamu Isozaki et al. arXiv 2024. [Paper][Github]
-
Locus: "Locus: Agentic Predicate Synthesis for Directed Fuzzing". Jie Zhu et al. arXiv 2025. [Paper]
-
ChatAFL: "Large Language Model Guided Protocol Fuzzing". Ruijie Meng et al. NDSS 2024. [Paper][Github]
-
One-Day-Exploits: "LLM Agents Can Autonomously Exploit One-Day Vulnerabilities". Richard Fang et al. arXiv 2024. [Paper]
-
Zero-Day-Exploits: "Teams of LLM Agents Can Exploit Zero-Day Vulnerabilities". Yuxuan Zhu et al. arXiv 2025. [Paper]
-
A2: "Agentic Discovery and Validation of Android App Vulnerabilities". Ziyue Wang and Liyi Zhou. arXiv 2025. [Paper]
-
Sec-Bench: "Sec-bench: Automated Benchmarking of LLM Agents on Real-World Software Security Tasks". Hwiwon Lee et al. arXiv 2025. [Paper][Github]
-
CVE-Bench (Web): "CVE-Bench: A Benchmark for AI Agents' Ability to Exploit Real-World Web Application Vulnerabilities". Yuxuan Zhu et al. arXiv 2025. [Paper][Github]
-
CVE-bench (Repair): "CVE-bench: Benchmarking LLM-based Software Engineering Agent's Ability to Repair Real-World CVE Vulnerabilities". Peiran Wang et al. NAACL 2025. [Paper]
-
ExCyTInBench: "Excytin-bench: Evaluating LLM Agents on Cyber Threat Investigation". Yiran Wu et al. arXiv 2025. [Paper][Github]
-
LLM-Fuzzer: "LLM-Fuzzer: Scaling Assessment of Large Language Model Jailbreaks". Jiahao Yu et al. USENIX Security 2024. [Paper][Github]
-
TitanFuzz: "Large Language Models Are Zero-Shot Fuzzers: Fuzzing Deep-Learning Libraries via Large Language Models". Yinlin Deng et al. ISSTA 2023. [Paper][Github]
-
FuzzGPT: "Large Language Models Are Edge-Case Generators: Crafting Unusual Programs for Fuzzing Deep Learning Libraries". Yinlin Deng et al. ICSE 2024. [Paper][Github]
-
Dark-Side-Agents: "The Dark Side of LLMs: Agent-Based Attacks for Complete Computer Takeover". Matteo Lupinacci et al. arXiv 2025. [Paper]
-
MalGen: "MalGen: A Generative Agent Framework for Modeling Malicious Software in Cybersecurity". Bikash Saha and Sandeep Kumar Shukla. arXiv 2025. [Paper]
-
AiTM: "Red-Teaming LLM Multi-Agent Systems via Communication Attacks". Pengfei He et al. ACL 2025. [Paper][Github]
-
CVE-Genie: "From CVE Entries to Verifiable Exploits: An Automated Multi-Agent Framework for Reproducing CVEs". Saad Ullah et al. arXiv 2025. [Paper][Github]
-
LLM4CVE: "LLM4CVE: Enabling Iterative Automated Vulnerability Repair with Large Language Models". Mohamad Fakih et al. arXiv 2025. [Paper][Link]
-
RAG-Incident-Response: "Advancing Autonomous Incident Response: Leveraging LLMs and Cyber Threat Intelligence". Amine Tellache et al. arXiv 2025. [Paper]
-
IRCopilot: "RCopilot: Automated Incident Response with Large Language Models". Xihuan Lin et al. arXiv 2025. [Paper][Code]
-
CORTEX: "Cortex: Collaborative LLM Agents for High-Stakes Alert Triage". Bowen Wei et al. arXiv 2025. [Paper]
-
AutoBnB: "AutoBnB: Multi-Agent Incident Response with Large Language Models". Zefang Liu. ISDFS 2025. [Paper][Github]
-
LLM-in-the-SOC-Empirical-Study: "LLMs in the SOC: An Empirical Study of Human-AI Collaboration in Security Operations Centres". Ronal Singh et al. arXiv 2025. [Paper][Github]
-
CyberSOCEval: "CyberSOCEval: Benchmarking LLMs Capabilities for Malware Analysis and Threat Intelligence Reasoning". Lauren Deason et al. arXiv 2025. [Paper][Data][Github]
-
Log-Analysis-Survey: "Automated Threat Detection and Response Using LLM Agents". Ramasankar Molleti et al. World Journal of Advanced Research and Reviews 2024. [Paper]
-
ProvSEEK: "LLM-Driven Provenance Forensics for Threat Investigation and Detection". Kunal Mukherjee and Murat Kantarcioglu. arXiv 2025. [Paper][Github]
-
CTI-Vulnerabilities: "Uncovering Vulnerabilities of LLM-Assisted Cyber Threat Intelligence". Yuqiao Meng et al. arXiv 2025. [Paper][Github]
-
LLMCloudHunter: "LLMCloudHunter: Harnessing LLMs for Automated Extraction of Detection Rules from Cloud-Based CTI". Yuval Schwartz et al. WWW 2025. [Paper][Github]
-
RepoAudit: "RepoAudit: An Autonomous LLM-Agent for Repository-Level Code Auditing". Jinyao Guo et al. ICML 2025. [Paper][Github]
-
LLM-Cloud-Forensics: "LLM-Powered Automated Cloud Forensics: From Log Analysis to Investigation". Dalal Alharthi and Rozhin Yasaei. IEEE CLOUD 2025 . [Paper]
-
CyberSleuth: "CyberSleuth: Autonomous Blue-Team LLM Agent for Web Attack Forensics". Stefano Fumero et al. arXiv 2025. [Paper][[Github](https://github.com/SmartData-Polito/LLM_Agent_ Cybersecurity_Forensic)]
-
GALA: "GALA: Can Graph-Augmented Large Language Model Agentic Workflows Elevate Root Cause Analysis?". Yifang Tian et al. arXiv 2025. [Paper]
-
MAST: "Why Do Multiagent Systems Fail?". Melissa Z Pan et al. ICLR Workshop 2025. [Paper][Github]
-
CIAF: "Cloud Investigation Automation Framework (CIAF): An AI-Driven Approach to Cloud Forensics". Dalal Alharthi and Ivan Roberto Kawaminami Garcia. arXiv 2025. [Paper]
-
CVE-Bench (Repair): "CVE-Bench: A Benchmark for AI Agents’ Ability to Exploit Real-World Web Application Vulnerabilities". Yuxuan Zhu et al. ICML 2025. [Paper][Github]
-
RepairAgent: "RepairAgent: An Autonomous, LLM-Based Agent for Program Repair". Islem Bouzenia et al. ICSE 2025. [Paper][Github]
-
Gemini-Patching: "AI-Powered Patching: The Future of Automated Vulnerability Fixes". Jan Keller and Jan Nowakowski. Technical Report 2024. [Link]
-
IaC-Remediation: "LLM Agentic Workflow for Automated Vulnerability Detection and Remediation in Infrastructure-as-Code".
Dheer Toprani and Vijay K. Madisetti. IEEE Access 2025. [Paper]
-
Cloud-Infrastructure-AI-Agent: "Cloud Infrastructure Management in the Age of AI Agents". Zhenning Yang et al. ACM SIGOPS Operating Systems Review 2025. [Paper]
-
KubeIntellect: "KubeIntellect: A Modular LLM-Orchestrated Agent Framework for End-to-End Kubernetes Management". Mohsen Seyedkazemi Ardebili and Andrea Bartolini. 2025. [Paper][Github]
-
LLMSecConfig: "LLMSecConfig: An LLM-Based Approach for Fixing Software Container Misconfigurations". Ziyang Ye et al. 2025. [Paper]
-
BARTPredict: "BARTPredict: Empowering IoT Security with LLM-Driven Cyber Threat Prediction". Alaeddine Diaf et al. 2025. [Paper]
-
IaC-Remediation: "LLM Agentic Workflow for Automated Vulnerability Detection and Remediation in Infrastructure-as-Code". Dheer Toprani and Vijay K. Madisetti. IEEE Access 2025. [Paper]
-
MAPTA: "Multi-Agent Penetration Testing AI for the Web". Isaac David and Arthur Gervais. 2025. [Paper][Github]
-
Browsing-Dangers: "The Hidden Dangers of Browsing AI Agents". Mykyta Mudryi et al. 2025. [Paper]
-
AIOS: "AIOS: LLM Agent Operating System". Kai Mei et al. Conference on Language Modeling 2025. [Paper][Github]
-
OS-Agents-Survey: "OS Agents: A Survey on MLLM-based Agents for Computer, Phone and Browser Use". Xueyu Hu et al. 2025. [Paper][Github]
-
Prompt-Flow-Integrity: "Prompt Flow Integrity to Prevent Privilege Escalation in LLM Agents". Juhee Kim et al. 2025. [Paper][Github]
-
Progent: "Progent: Programmable Privilege Control for LLM Agents". Tianneng Shi et al. 2025. [Paper][Github]
-
LISA: "LISA Technical Report: An Agentic Framework for Smart Contract Auditing". Izaiah Sun et al. 2025. [Paper][Github]
-
SmartLLM: "SmartLLM: Smart Contract Auditing Using Custom Generative AI". Jun Kevin and Pujianto Yugopuspito. 2025. [Paper]
-
FineTuning-Auditing: "Combining Fine-Tuning and LLM-Based Agents for Intuitive Smart Contract Auditing with Justifications". Wei Ma et al. 2024. [Paper]
-
AuditGPT: "AuditGPT: Auditing Smart Contracts with ChatGPT". Shihao Xia et al. 2024. [Paper][Github]
-
HIPAA-Agent: "Towards a HIPAA Compliant Agentic AI System in Healthcare". Subash Neupane et al. 2025. [Paper]
-
LLM-Privacy-GuardRails: "Deploying Privacy Guardrails for LLMs: A Comparative Analysis of Real-World Applications". Shubhi Asthana et al. 2025. [Paper]
-
Embodied-AI-Security: "Towards Robust and Secure Embodied AI: A Survey on Vulnerabilities and Attacks". Wenpeng Xing et al. 2025. [Paper]
-
Polymorphic Prompt: "To Protect the LLM Agent Against the Prompt Injection Attack with Polymorphic Prompt". Zhilong Wang et al. arXiv 2025. [Paper][Github]
-
AgentDojo: "AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents". Edoardo Debenedetti et al. NeurIPS 2024. [Paper][Github]
-
App-Injection: "Prompt Injection Attack Against LLM-Integrated Applications". Yi Liu et al. arXiv 2024. [Paper][Github]
-
InjecAgent: "InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents". Qiusi Zhan et al. ACL 2024. [Paper][Github]
-
Commercial-Agent-Vulnerability: "Commercial LLM Agents Are Already Vulnerable to Simple Yet Dangerous Attacks". Ang Li et al. arXiv 2025. [Paper]
-
BIPIA: "Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models". Jingwei Yi et al. KDD 2025. [Paper][Github]
-
Prompt Infection: "Prompt Infection: LLM-to-LLM Prompt Injection Within Multi-Agent Systems". Donghyun Lee and Mo Tiwari. arXiv 2024. [Paper]
-
MINJA: "A Practical Memory Injection Attack Against LLM Agents". Shen Dong et al. arXiv 2025. [Paper]
-
Data Leakage: "Simple Prompt Injection Attacks Can Leak Personal Data Observed by LLM Agents During Task Execution". Meysam Alizadeh et al. arXiv 2025. [Paper]
-
AgentVigil: "AgentVigil: Generic Black-Box Red-Teaming for Indirect Prompt Injection Against LLM Agents". Zhun Wang et al. arXiv 2025. [Paper]
-
ASB: "Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-Based Agents". Hanrong Zhang et al. ICLR 2025. [Paper][Github]
-
AgentHarm: "AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents". Maksym Andriushchenko et al. ICLR 2025. [Paper][Hugging Face]
-
Adaptive Attacks: "Adaptive Attacks Break Defenses Against Indirect Prompt Injection Attacks on LLM Agents". Qiusi Zhan et al. NAACL 2025. [Paper][Github]
-
Poisoning Attacks Review: "A Systematic Review of Poisoning Attacks Against Large Language Models". Neil Fendley et al. arXiv 2025. [Paper]
-
AgentPoison: "AgentPoison: Red-Teaming LLM Agents via Poisoning Memory or Knowledge Bases". Zhaorun Chen et al. NeurIPS 2024. [Paper][Github]
-
PoisonBench: "PoisonBench: Assessing Large Language Model Vulnerability to Poisoned Preference Data". Tingchen Fu et al. ICML 2025. [Paper][Github]
-
Poisoning Trends: "Scaling Trends for Data Poisoning in LLMs". Dillon Bowen et al. AAAI 2025. [Paper]
-
Advertisement Embedding Attacks: "Attacking LLMs and AI Agents: Advertisement Embedding Attacks Against Large Language Models". Qiming Guo et al. arXiv 2025. [Paper]
-
LLMs are not Aligned Browser Agents: "Aligned LLMs Are Not Aligned Browser Agents". Priyanshu Kumar et al. ICLR 2025. [Paper]
-
Web Vulnerability: "Why Are Web AI Agents More Vulnerable Than Standalone LLMs? A Security Analysis". Jeffrey Yang Fan Chiang et al. ICLR Workshop 2025. [Paper]
-
JAWS-BENCH: "Breaking the Code: Security Assessment of AI Code Agents Through Systematic Jailbreaking Attacks". Shoumik Saha et al. arXiv 2025. [Paper]
-
LLM-Fuzzer: "LLM-Fuzzer: Scaling Assessment of Large Language Model Jailbreaks". Jiahao Yu et al. USENIX Security 2024. [Paper][Github]
-
Many-Shot Jailbreak: "Many-Shot Jailbreaking". Cem Anil et al. NeurIPS 2024. [Paper]
-
Robot Jailbreak: "Jailbreaking LLM-Controlled Robots". Alexander Robey et al. arXiv 2024. [Paper][Github]
-
PromptInject: "Ignore Previous Prompt: Attack Techniques for Language Models". Fábio Perez and Ian Ribeiro. arXiv 2022. [Paper][Github]
-
CAIN: "CAIN: Hijacking LLM-Humans Conversations via Malicious System Prompts". Viet Pham and Thai Le. arXiv 2025. [Paper][Github]
-
PseudoConversation Injection: "Pseudo-Conversation Injection for LLM Goal Hijacking". Zheng Chen and Buhui Yao. arXiv 2024. [Paper]
-
AI²: "Towards Action Hijacking of Large Language Model-Based Agent". Yuyang Zhang et al. arXiv 2025. [Paper]
-
InfoRM: "InfoRM: Mitigating Reward Hacking in RLHF via Information-Theoretic Reward Modeling". Yuchun Miao et al. 2024. [Paper][Github]
-
Preference as Reward(PAR): "Reward Shaping to Mitigate Reward Hacking in RLHF". Jiayi Fu et al. arXiv 2025. [Paper][Github]
-
Spec Gaming: "Demonstrating Specification Gaming in Reasoning Models". Alexander Bondarenko et al. arXiv 2025. [Paper][Github]
-
Bayesian Adversarial Robust Dec-POMDP (BARDec-POMDP) framework: "Byzantine Robust Cooperative Multi-Agent Reinforcement Learning as a Bayesian Game". Simin Li et al. ICLR 2024. [Paper][Github]
-
Byzantine Coord: "Byzantine-Robust Decentralized Coordination of LLM Agents". Yongrae Jo and Chanik Park. arXiv 2025. [Paper]
-
Red Teaming LMs: "Red Teaming Language Models with Language Models". Ethan Perez et al. arXiv 2022. [Paper]
-
MART: "MART: Improving LLM Safety with Multi-Round Automatic Red-Teaming". Suyu Ge et al. arXiv 2023. [Paper]
-
Comm Attacks: "Red-Teaming LLM Multi-Agent Systems via Communication Attacks". Pengfei He et al. ACL 2025. [Paper]
-
AgentFuzz: "Make Agent Defeat Agent: Automatic Detection of Taint-Style Vulnerabilities in LLM-Based Agents". Fengyu Liu et al. USENIX Security 2025. [Paper][Code]
-
Privacy Risks: "Searching for Privacy Risks in LLM Agents via Simulation". Yanzhe Zhang and Diyi Yang. arXiv 2025. [Paper][Github]
-
ASB: "Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-Based Agents". Hanrong Zhang et al. ICLR 2025. [Paper][Github]
-
RAS-Eval: "RAS-Eval: A Comprehensive Benchmark for Security Evaluation of LLM Agents in Real-World Environments". Yuchuan Fu et al. arXiv 2025. [Paper][Github]
-
AgentDojo: "AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents". Edoardo Debenedetti et al. NeurIPS 2024. [Paper][Github]
-
AgentHarm: "AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents". Maksym Andriushchenko et al. ICLR 2025. [Paper][Hugging Face]
-
SafeArena: "SafeArena: Evaluating the Safety of Autonomous Web Agents". Ada Defne Tur et al. ICML 2025. [Paper][Github]
-
ST-WebAgentBench: "ST-WebAgentBench: A Benchmark for Evaluating Safety & Trustworthiness in Web Agents". Ido Levy et al. arXiv 2025. [Paper][Github]
-
JAWS-BENCH: "Breaking the Code: Security Assessment of AI Code Agents Through Systematic Jailbreaking Attacks". Shoumik Saha et al. arXiv 2025. [Paper][]
-
SandboxEval: "SandboxEval: Towards Securing Test Environment for Untrusted Code". Rafiqul Rabin et al. arXiv 2025. [Paper]
-
InjecAgent: "InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents". Qiusi Zhan et al. ACL 2024. [Paper][Github]
-
BrowserART: "Aligned LLMs Are Not Aligned Browser Agents". Priyanshu Kumar et al. ICLR 2025. [Paper][Github]
-
CVE-Bench: "CVE-Bench: A Benchmark for AI Agents Ability to Exploit Real-World Web Application Vulnerabilities". Yuxuan Zhu et al. ICML 2025. [Paper][Github]
-
DoomArena: "DoomArena: A Framework for Testing AI Agents Against Evolving Security Threats". Léo Boisvert et al. CoLM 2025. [Paper][Github]
-
WebArena: "WebArena: A Realistic Web Environment for Building Autonomous Agents". Shuyan Zhou et al. arXiv 2024. [Paper][Github]
-
ACE: "ACE: A Security Architecture for LLM-Integrated App Systems". Evan Li et al. arXiv 2025. [Paper][Github]
-
Resilient-Agents: "Architecting Resilient LLM Agents: A Guide to Secure Plan-Then-Execute Implementations". Ron F. Del Rosario et al. arXiv 2025. [Paper]
-
Task Shield: "The Task Shield: Enforcing Task Alignment to Defend Against Indirect Prompt Injection in LLM Agents". Feiran Jia et al. ACL 2025. [Paper]
-
CaMeL: "Defeating Prompt Injections by Design". Edoardo Debenedetti et al. arXiv 2025. [Paper][Github]
-
Polymorphic Prompt Assembling (PPA): "To Protect the LLM Agent Against the Prompt Injection Attack with Polymorphic Prompt". Zhilong Wang et al. arXiv 2025. [Paper][Github]
-
TRISM: "TRISM for Agentic AI: A Review of Trust, Risk, and Security Management in LLM-Based Agentic Multi-Agent Systems". Shaina Raza et al. arXiv 2025. [Paper]
-
Trustworthy-Agentic: "Trustworthy Agentic AI Systems: A Cross-Layer Review of Architectures, Threat Models, and Governance Strategies". Ibrahim Adabara et al. F1000Research 2025. [Paper]
-
ModelGuard: "ModelGuard: Information-Theoretic Defense Against Model Extraction Attacks". Minxue Tang et al. USENIX Security 2024. [Paper][Github]
-
Security-Of-AI-Agents: "Security of AI Agents". Yifeng He et al. arXiv 2024. [Paper][Github]
-
Threat-Model: "Securing Agentic AI: A Comprehensive Threat Model and Mitigation Framework for Generative AI Agents". Vineeth Sai Narajala and Om Narayan. arXiv 2025. [Paper]
-
D-CIPHER: "D-CIPHER: Dynamic Collaborative Intelligent Multi-Agent System with Planner and Heterogeneous Executors for Offensive Security". Meet Udeshi et al. arXiv 2025. [Paper][Github]
-
Secure-Multi-LLM: "Secure Multi-LLM Agentic AI and Agentification for Edge General Intelligence by Zero-Trust: A Survey". Yinqiu Liu et al. arXiv 2025. [Paper]
-
PhishDebate: "PhishDebate: An LLM-Based Multi-Agent Framework for Phishing Website Detection". Wenhao Li et al. arXiv 2025. [Paper]
-
Robustness-Smoothing: "Enhancing Robustness of LLM-Driven Multi-Agent Systems Through Randomized Smoothing". Jinwei Hu et al. Chinese Journal of Aeronautics 2025. [Paper]
-
Challenges-Multi-Agent: "LLM Multi-Agent Systems: Challenges and Open Problems". Shanshan Han et al. arXiv 2025. [Paper]
-
Cross-Domain-Challenges: "Seven Security Challenges That Must Be Solved in Cross-Domain Multi-Agent LLM Systems". Ronny Ko et al. arXiv 2025. [Paper]
-
R^2-Guard: "R^2-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning". Mintong Kang and Bo Li. ICLR 2025. [Paper]
-
AgentSpec: "AgentSpec: Customizable Runtime Enforcement for Safe and Reliable LLM Agents". Haoyu Wang et al. arXiv 2025. [Paper][Github]
-
SentinelAgent: "SentinelAgent: Graph-Based Anomaly Detection in Multi-Agent Systems". Xu He et al. arXiv 2025. [Paper]
-
Confront-Insider: "Confront Insider Threat: Precise Anomaly Detection in Behavior Logs Based on LLM Fine-Tuning". Shuang Song et al. COLING 2025. [Paper]
-
GuardAgent: "GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning". Zhen Xiang et al. arXiv 2025. [Paper][Github]
-
AgentGuard: "AgentGuard: Repurposing Agentic Orchestrator for Safety Evaluation of Tool Orchestration". Jizhou Chen and Samuel Lee Cong. arXiv 2025. [Paper]
-
AGrail: "AGrail: A Lifelong Agent Guardrail with Effective and Adaptive Safety Detection". Weidi Luo et al. arXiv 2025. [Paper][Github]
-
PSG-Agent: "PSG-Agent: Personality-Aware Safety Guardrail for LLM-Based Agents". Yaozu Wu et al. arXiv 2025. [Paper]
-
Bedrock-Security: "Securing Amazon Bedrock Agents: Safeguarding Against Indirect Prompt Injections". Amazon Web Services. AWS White Paper 2024. [Link]
-
IRIS: "IRIS: LLM-Assisted Static Analysis for Detecting Security Vulnerabilities". Ziyang Li et al. ICLR 2025. [Paper][Github]
-
Chain-of-Agents: "Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL". Weizhen Li et al. arXiv 2025. [Paper][Github]
-
RepoAudit: "RepoAudit: An Autonomous LLM-Agent for Repository-Level Code Auditing". Jinyao Guo et al. ICML 2025. [Paper][Github]
-
Knighter: "Knighter: Transforming Static Analysis with LLM-Synthesized Checkers". Chenyuan Yang et al. SOSP 2025. [Paper]
-
VeriPlan: "VeriPlan: Integrating Formal Verification and LLMs into End-User Planning". Christine P. Lee et al. CHI 2025. [Paper]
-
MCMAS-OP: "Formal Verification of Open Multi-Agent Systems". Panagiotis Kouvaros et al. AAMAS 2019. [Paper]
-
Specifying-Behavior: "Formally Specifying the High-Level Behavior of LLM-Based Agents". Maxwell Crouse et al. arXiv 2024. [Paper]
-
AgentDojo: "AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents". Edoardo Debenedetti et al. NeurIPS 2024. [Paper][Github]
-
SafeArena: "SafeArena: Evaluating the Safety of Autonomous Web Agents". Ada Defne Tur et al. ICML 2025. [Paper][Github]
-
RAS-Eval: "RAS-Eval: A Comprehensive Benchmark for Security Evaluation of LLM Agents in Real-World Environments". Yuchuan Fu et al. arXiv 2025. [Paper][Github]
-
ASB: "Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-Based Agents". Hanrong Zhang et al. ICLR 2025. [Paper][Github]
-
AgentHarm: "AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents". Maksym Andriushchenko et al. ICLR 2025. [Paper][Hugging Face]
-
DoomArena: "DoomArena: A Framework for Testing AI Agents Against Evolving Security Threats". Léo Boisvert et al. CoLM 2025. [Paper][Github]
-
ToolFuzz: "ToolFuzz - Automated Agent Tool Testing". Ivan Milev et al. arXiv 2025. [Paper]
-
aiXamine: "aiXamine: Simplified LLM Safety and Security". Fatih Deniz et al. arXiv 2025. [Paper]
-
TurkingBench: "TurkingBench: A Challenge Benchmark for Web Agents". Kevin Xu et al. NAACL 2025. [Paper]''[Github]
-
τ-Bench: "τ-Bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains". Shunyu Yao et al. 2024. [Paper][Github]
-
WebArena: "WebArena: A Realistic Web Environment for Building Autonomous Agents". Shuyan Zhou et al. arXiv 2024. [Paper][Github]
-
Adaptive-Attacks: "Adaptive Attacks Break Defenses Against Indirect Prompt Injection Attacks on LLM Agents". Qiusi Zhan et al. NAACL 2025. [Paper][Github]
-
Open-Challenges: "Open Challenges in Multi-Agent Security: Towards Secure Systems of Interacting AI Agents". Christian Schroeder de Witt. arXiv 2025. [Paper]
-
Trustworthy-Survey: "A Survey on Trustworthy LLM Agents: Threats and Countermeasures". Miao Yu et al. KDD 2025. [Paper]
-
Risk-Navigating: "Navigating the Risks: A Survey of Security, Privacy, and Ethics Threats in LLM-Based Agents". Yuyou Gan et al. arXiv 2024. [Paper]
-
Safety-At-Scale: "Safety at Scale: A Comprehensive Survey of Large Model and Agent Safety". Xingjun Ma et al. Foundations and Trends in Privacy and Security 2025. [Paper]
-
Full-Stack-Safety: "A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment". Kun Wang et al. arXiv 2025. [Paper]
-
Agents-Under-Threat: "AI Agents Under Threat: A Survey of Key Security Challenges and Future Pathways". Zehang Deng et al. arXiv 2024. [Paper]
-
Agentic-AI Healthcare: "Agentic-AI Healthcare: Multilingual, Privacy-First Framework with MCP Agents". Mohammad A. Shehab arXiv 2025. [Paper]
-
LAW: "LAW: A Legal Agent Framework with Tool Use and Safety Mechanisms". William Watson et al. arXiv 2025. [Paper]
-
PrivacyChecker: "Privacy in Action: Towards Realistic Privacy Mitigation and Evaluation for LLM-Powered Agents". Shouju Wang et al. arXiv 2025. [Paper][Github]
-
Contract Security: "AI-Powered Contract Security: Managing Expiry, Compliance, and Risk Mitigation Through Deep Learning and LLMs". Dilshad Ahmad Mhia-Alddin et al. arXiv 2025. [Paper]
| Benchmark | Environment | Attacks / Threat | Findings | Model Insights |
|---|---|---|---|---|
| ASB (Zhang et al., 2025a) | Multi-domain agent tasks with 400+ tools; 10 scenarios; standardized evaluation harness. | Prompt injection (primary), memory attacks, data poisoning, unauthorized tool invocation, privilege escalation; 27 attack/defense classes. | Existing agents highly vulnerable; many fail even simple attack tasks; reports refusal rate and a unified resilience metric. | Standardized, reproducible testbed spanning both offensive and defensive evaluation; clear taxonomy centered on prompt-injection surfaces. |
| RAS-Eval (Fu et al., 2025c) | Real-world domains (finance, healthcare); 80 scenarios / 3,802 tasks; simulation and real tool use. | 11 CWE categories; broad adversarial stress. | Task completion drops by ∼36.8% on average (up to 85.7%) under attack. | Maps agent failures to CWE; couples domain realism with measurable robustness deltas. |
| AgentDojo (Debenedetti et al., 2024) | Dynamic, stateful environment; 97 realistic multi-turn tool tasks (e.g., email, banking) with formal, deterministic checks. | Prompt injection via untrusted data/tools; security–utility trade-off analysis. | Defenses reduce attack success but degrade task utility; SOTA LLMs struggle on realistic pipelines. | Makes the security–utility trade-off explicit; judge is environment-state-based (no LLM-as-judge). |
| AgentHarm (Andriushchenko et al., 2025) | Agent tasks spanning 110 harmful tasks across 11 harm categories. | Jailbreaks, direct injections, self-compromising actions, unsafe code execution. | Significant gaps in compliance and contextual safety across agents. | Introduces robustness, refusal accuracy, and ethical consistency metrics focused on harm reduction. |
| SafeArena (Tur et al., 2025) | Web agents across multiple websites; 250 benign vs. 250 harmful tasks. | Malicious requests: misinformation, illegal actions, malware-related behaviors. | SOTA (e.g., GPT-4o) completes 34.7% of malicious requests. | Demonstrates real web-workflow risks; quantifies unsafe completions under realistic browsing. |
| ST-WebAgentBench (Levy et al., 2025) | Enterprise-like web tasks: 222 tasks with 646 policy instances. | Policy compliance (consent, data boundaries); defines CuP, pCuP, and Risk Ratio. | Policy-compliant success is ≈38% lower than standard completion. | Shifts evaluation beyond raw success to trust/safety-constrained success. |
| JAWS-BENCH (Saha et al., 2025) | Code agents with executable-aware judging across JAWS-0/1/M (empty, single-file, multi-file). | Systematic jailbreaking to elicit harmful, executable code; tests compliance, attack success, compile, run. | Up to 75% attack success in multi-file codebases. | Execution-grounded judging prevents false safety from mere textual refusals; highlights multi-file risks. |
| SandboxEval (Rabin et al., 2025) | Code-execution testbeds; 51 hand-crafted sandbox test cases (applied to Dyff). | Dangerous behaviors: FS tampering, data exfiltration, network access, etc. | Naive sandbox configurations can be compromised by malicious code. | Security must include runtime isolation posture, not only agent policy. |
| BrowserART (Kumar et al., 2025) | Browser-agent red-teaming toolkit across synthetic & real sites (100 harmful behaviors). | Jailbreaks against browser agents; transfer of chatbot jailbreaks with human rewrites. | Backbone LLM refusal does not transfer: GPT-4o pursued 98/100, o1-preview 63/100 harmful behaviors. | Agentic, tool-using context weakens safety adherence even without exotic attacks. |
| InjecAgent (Zhan et al., 2024) | Tool-integrated agents; 1,054 test cases across 17 user tools and 62 attacker tools. | Indirect prompt injections via external content, API outputs, chained tools; path-augmented categorization. | Well-aligned agents frequently execute compromised instructions under indirect injections. | Provides fine-grained, propagation-path metrics; standardizes indirect-injection stress for tool-augmented agents. |
If you find this survey or repository useful, please cite our work:
@article{shahriar2025agentic,
title={A Survey on Agentic Security: Applications, Threats and Defenses},
author={Shahriar, Asif and Rahman, Md Nafiu and Ahmed, Sadif and Sadeque, Farig and Parvez, Md Rizwan},
journal={arXiv preprint arXiv:2510.06445},
year={2025}
}