Security Considerations for Artificial Intelligence Agents
TL;DR Highlight
Perplexity's NIST submission: a comprehensive breakdown of AI agent security threats and a defense-in-depth strategy guide.
Who Should Read
Security engineers building or auditing AI agent systems, and teams deploying autonomous agents in production who need a threat modeling framework.
Core Mechanics
- Submitted to NIST as a formal analysis of AI agent security threats from Perplexity's perspective
- Identifies and categorizes major threat vectors: prompt injection, tool misuse, memory poisoning, agent hijacking, and data exfiltration
- Proposes a defense-in-depth strategy with multiple independent layers rather than relying on any single control
- Highlights that agent systems introduce new attack surfaces not present in traditional LLM deployments
- Tool use and multi-agent communication are particularly high-risk attack surfaces
- Provides a risk assessment framework for evaluating security posture of agent deployments
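The defense-in-depth idea above can be sketched as a pipeline of independent veto layers: each layer can block an action on its own, and passing one layer never bypasses the others. This is a minimal illustration, not the submission's implementation; the layer names and blocked phrases are hypothetical.

```python
from typing import Callable

# A layer takes a proposed action string and returns (allowed, reason).
Layer = Callable[[str], tuple[bool, str]]

def input_filter(action: str) -> tuple[bool, str]:
    # Illustrative heuristic: block obvious injection markers in the raw request.
    banned = ("ignore previous instructions", "system prompt")
    if any(phrase in action.lower() for phrase in banned):
        return False, "input_filter: injection marker detected"
    return True, ""

def policy_check(action: str) -> tuple[bool, str]:
    # Deterministic policy: destructive verbs require explicit approval.
    if action.split()[0] in {"delete", "transfer"}:
        return False, "policy_check: high-consequence action needs approval"
    return True, ""

def run_layers(action: str, layers: list[Layer]) -> tuple[bool, str]:
    # Any single layer can veto; all layers must pass independently.
    for layer in layers:
        ok, reason = layer(action)
        if not ok:
            return False, reason
    return True, "allowed"

LAYERS: list[Layer] = [input_filter, policy_check]
```

The point of the structure is that a bypass of one control (say, a novel injection phrasing that evades `input_filter`) still has to defeat every remaining layer.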
Evidence
- Formal response to NIST/CAISI Request for Information 2025-0035, part of the NIST AI security standards process
- Based on Perplexity's operational experience with production AI agent systems
- Threat categories derived from real observed attack patterns and red team findings
- Defense recommendations validated against known attack scenarios
How to Apply
- Use this as a threat modeling checklist when designing or auditing your AI agent system
- For each tool your agent can call, apply the tool misuse threat model and implement appropriate authorization checks
- Implement input/output filtering at every layer (user input, tool results, inter-agent messages) rather than trusting any single point
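One way to make the every-layer filtering concrete is a single scanner applied uniformly at each trust boundary (user input, tool results, inter-agent messages). The patterns below are illustrative heuristics, not a complete defense, and the function names are this sketch's own:

```python
import re

# Hypothetical injection heuristics; a real deployment would use a
# maintained detection model or ruleset, not three regexes.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all|any|previous) instructions", re.I),
    re.compile(r"you are now", re.I),
    re.compile(r"reveal (the|your) system prompt", re.I),
]

def scan_untrusted_text(text: str, source: str) -> list[str]:
    """Return findings for any matched pattern; empty list means clean."""
    return [
        f"{source}: matched {pattern.pattern!r}"
        for pattern in INJECTION_PATTERNS
        if pattern.search(text)
    ]

def filter_at_boundary(text: str, source: str) -> str:
    # Call this with source="user_input", "tool_result", or
    # "agent_message" so no single boundary is implicitly trusted.
    findings = scan_untrusted_text(text, source)
    if findings:
        # Quarantine rather than silently passing the text through.
        raise ValueError(f"Blocked at {source}: {findings}")
    return text
```

Running the same scanner at every boundary is what prevents, for example, a poisoned tool result from carrying an injection that the user-input filter alone would never see.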
Code Example
# Deterministic last-line-of-defense example: allowlist + schema validation before tool calls
import re
from typing import Any

ALLOWED_TOOLS = {"web_search", "read_file", "send_email"}
SENSITIVE_TOOLS = {"delete_file", "financial_transfer", "execute_code"}
RATE_LIMIT = {"financial_transfer": 3, "delete_file": 5}  # calls per hour

class SecurityError(Exception):
    """Raised when a tool call is blocked by policy."""

def validate_tool_call(tool_name: str, args: dict[str, Any], call_counts: dict) -> tuple[bool, str]:
    # 1. Allowlist check
    if tool_name not in ALLOWED_TOOLS and tool_name not in SENSITIVE_TOOLS:
        return False, f"Tool '{tool_name}' not in allowlist"
    # 2. Apply rate limit for sensitive tools
    if tool_name in SENSITIVE_TOOLS:
        if call_counts.get(tool_name, 0) >= RATE_LIMIT.get(tool_name, 1):
            return False, f"Rate limit exceeded for '{tool_name}'"
    # 3. Argument schema validation (e.g., email recipient domain validation)
    if tool_name == "send_email":
        recipient = args.get("to", "")
        if not re.match(r'^[\w.-]+@(company\.com|trusted-domain\.com)$', recipient):
            return False, f"Email recipient '{recipient}' not in approved domains"
    return True, "OK"

# Used in the agent loop; request_human_confirmation and execute_tool
# are assumed to be provided by the surrounding agent framework.
def agent_execute_tool(tool_name: str, args: dict, call_counts: dict):
    ok, reason = validate_tool_call(tool_name, args, call_counts)
    if not ok:
        raise SecurityError(f"Tool call blocked: {reason}")
    # High-risk actions require human-in-the-loop confirmation
    if tool_name in SENSITIVE_TOOLS:
        if not request_human_confirmation(tool_name, args):
            raise SecurityError("User rejected sensitive action")
    return execute_tool(tool_name, args)
Terminology
Related Resources
Original Abstract
This article, a lightly adapted version of Perplexity's response to NIST/CAISI Request for Information 2025-0035, details our observations and recommendations concerning the security of frontier AI agents. These insights are informed by Perplexity's experience operating general-purpose agentic systems used by millions of users and thousands of enterprises in both controlled and open-world environments. Agent architectures change core assumptions around code-data separation, authority boundaries, and execution predictability, creating new confidentiality, integrity, and availability failure modes. We map principal attack surfaces across tools, connectors, hosting boundaries, and multi-agent coordination, with particular emphasis on indirect prompt injection, confused-deputy behavior, and cascading failures in long-running workflows. We then assess current defenses as a layered stack: input-level and model-level mitigations, sandboxed execution, and deterministic policy enforcement for high-consequence actions. Finally, we identify standards and research gaps, including adaptive security benchmarks, policy models for delegation and privilege control, and guidance for secure multi-agent system design aligned with NIST risk management principles.