Security Considerations for Artificial Intelligence Agents
TL;DR Highlight
Perplexity's NIST submission: a comprehensive breakdown of AI agent security threats and a defense-in-depth strategy guide.
Who Should Read
Security engineers building or auditing AI agent systems, and teams deploying autonomous agents in production who need a threat modeling framework.
Core Mechanics
- Submitted to NIST as a formal analysis of AI agent security threats from Perplexity's perspective
- Identifies and categorizes major threat vectors: prompt injection, tool misuse, memory poisoning, agent hijacking, and data exfiltration
- Proposes a defense-in-depth strategy with multiple independent layers rather than relying on any single control
- Highlights that agent systems introduce new attack surfaces not present in traditional LLM deployments
- Tool use and multi-agent communication are particularly high-risk attack surfaces
- Provides a risk assessment framework for evaluating security posture of agent deployments
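The defense-in-depth idea above can be sketched as a pipeline of independent veto layers: each layer can block an action on its own, and passing one layer never bypasses the others. This is a minimal illustration, not the submission's implementation; the layer names and blocked phrases are hypothetical.

```python
from typing import Callable

# A layer takes a proposed action string and returns (allowed, reason).
Layer = Callable[[str], tuple[bool, str]]

def input_filter(action: str) -> tuple[bool, str]:
    # Illustrative heuristic: block obvious injection markers in the raw request.
    banned = ("ignore previous instructions", "system prompt")
    if any(phrase in action.lower() for phrase in banned):
        return False, "input_filter: injection marker detected"
    return True, ""

def policy_check(action: str) -> tuple[bool, str]:
    # Deterministic policy: destructive verbs require explicit approval.
    if action.split()[0] in {"delete", "transfer"}:
        return False, "policy_check: high-consequence action needs approval"
    return True, ""

def run_layers(action: str, layers: list[Layer]) -> tuple[bool, str]:
    # Any single layer can veto; all layers must pass independently.
    for layer in layers:
        ok, reason = layer(action)
        if not ok:
            return False, reason
    return True, "allowed"

LAYERS: list[Layer] = [input_filter, policy_check]
```

The point of the structure is that a bypass of one control (say, a novel injection phrasing that evades `input_filter`) still has to defeat every remaining layer.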
Evidence
- Formal response to NIST/CAISI Request for Information 2025-0035, part of the NIST AI security standards process
- Based on Perplexity's operational experience with production AI agent systems
- Threat categories derived from real observed attack patterns and red team findings
- Defense recommendations validated against known attack scenarios
How to Apply
- Use this as a threat modeling checklist when designing or auditing your AI agent system
- For each tool your agent can call, apply the tool misuse threat model and implement appropriate authorization checks
- Implement input/output filtering at every layer (user input, tool results, inter-agent messages) rather than trusting any single point
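One way to make the every-layer filtering concrete is a single scanner applied uniformly at each trust boundary (user input, tool results, inter-agent messages). The patterns below are illustrative heuristics, not a complete defense, and the function names are this sketch's own:

```python
import re

# Hypothetical injection heuristics; a real deployment would use a
# maintained detection model or ruleset, not three regexes.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all|any|previous) instructions", re.I),
    re.compile(r"you are now", re.I),
    re.compile(r"reveal (the|your) system prompt", re.I),
]

def scan_untrusted_text(text: str, source: str) -> list[str]:
    """Return findings for any matched pattern; empty list means clean."""
    return [
        f"{source}: matched {pattern.pattern!r}"
        for pattern in INJECTION_PATTERNS
        if pattern.search(text)
    ]

def filter_at_boundary(text: str, source: str) -> str:
    # Call this with source="user_input", "tool_result", or
    # "agent_message" so no single boundary is implicitly trusted.
    findings = scan_untrusted_text(text, source)
    if findings:
        # Quarantine rather than silently passing the text through.
        raise ValueError(f"Blocked at {source}: {findings}")
    return text
```

Running the same scanner at every boundary is what prevents, for example, a poisoned tool result from carrying an injection that the user-input filter alone would never see.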
Code Example
# Deterministic last-line-of-defense example: allowlist + schema validation before tool calls
import re
from typing import Any

ALLOWED_TOOLS = {"web_search", "read_file", "send_email"}
SENSITIVE_TOOLS = {"delete_file", "financial_transfer", "execute_code"}
RATE_LIMIT = {"financial_transfer": 3, "delete_file": 5}  # calls per hour

class SecurityError(Exception):
    """Raised when a tool call is blocked by policy."""

def validate_tool_call(tool_name: str, args: dict[str, Any], call_counts: dict) -> tuple[bool, str]:
    # 1. Allowlist check
    if tool_name not in ALLOWED_TOOLS and tool_name not in SENSITIVE_TOOLS:
        return False, f"Tool '{tool_name}' not in allowlist"
    # 2. Apply rate limit for sensitive tools
    if tool_name in SENSITIVE_TOOLS:
        if call_counts.get(tool_name, 0) >= RATE_LIMIT.get(tool_name, 1):
            return False, f"Rate limit exceeded for '{tool_name}'"
    # 3. Argument schema validation (e.g., email recipient domain validation)
    if tool_name == "send_email":
        recipient = args.get("to", "")
        if not re.match(r'^[\w.-]+@(company\.com|trusted-domain\.com)$', recipient):
            return False, f"Email recipient '{recipient}' not in approved domains"
    return True, "OK"

# Used in the agent loop; request_human_confirmation and execute_tool
# are assumed to be provided by the surrounding agent framework.
def agent_execute_tool(tool_name: str, args: dict, call_counts: dict):
    ok, reason = validate_tool_call(tool_name, args, call_counts)
    if not ok:
        raise SecurityError(f"Tool call blocked: {reason}")
    # High-risk actions require human-in-the-loop confirmation
    if tool_name in SENSITIVE_TOOLS:
        if not request_human_confirmation(tool_name, args):
            raise SecurityError("User rejected sensitive action")
    return execute_tool(tool_name, args)
Terminology
Related Resources
Original Abstract
This article, a lightly adapted version of Perplexity's response to NIST/CAISI Request for Information 2025-0035, details our observations and recommendations concerning the security of frontier AI agents. These insights are informed by Perplexity's experience operating general-purpose agentic systems used by millions of users and thousands of enterprises in both controlled and open-world environments. Agent architectures change core assumptions around code-data separation, authority boundaries, and execution predictability, creating new confidentiality, integrity, and availability failure modes. We map principal attack surfaces across tools, connectors, hosting boundaries, and multi-agent coordination, with particular emphasis on indirect prompt injection, confused-deputy behavior, and cascading failures in long-running workflows. We then assess current defenses as a layered stack: input-level and model-level mitigations, sandboxed execution, and deterministic policy enforcement for high-consequence actions. Finally, we identify standards and research gaps, including adaptive security benchmarks, policy models for delegation and privilege control, and guidance for secure multi-agent system design aligned with NIST risk management principles.