Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward
TL;DR Highlight
A survey paper covering how Claude's Agent Skills work, how dangerous they can be, and how to handle them safely — all in one place.
Who Should Read
Backend/ML engineers putting Claude Code or MCP-based agents into production. Especially useful before adopting third-party skills into your team's workflow — this is the security due-diligence reading.
Core Mechanics
- Claude Agent Skills operate in three execution modes: sandboxed, semi-sandboxed, and direct — each with very different security profiles
- Most publicly available skills are unsandboxed and can access the file system, network, and shell — the same as arbitrary code execution
- The attack surface for prompt injection via skills is significantly larger than via direct user input — external tool outputs are rarely sanitized
- Skills compose in ways that amplify risk — individually benign skills can chain into dangerous capability combinations
- The paper proposes a 4-tier trust model for skills: verified, community, unverified, and isolated
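The 4-tier trust model above maps naturally onto a capability-gating policy in deployment tooling. The sketch below is one hypothetical encoding; the tier names come from the paper, but the specific capability sets assigned to each tier are illustrative assumptions, not the paper's prescription.

```python
from enum import Enum

class TrustTier(Enum):
    """The paper's four trust tiers for agent skills."""
    VERIFIED = "verified"      # official, audited skills
    COMMUNITY = "community"    # popular, partially reviewed
    UNVERIFIED = "unverified"  # third-party, unreviewed
    ISOLATED = "isolated"      # must run fully sandboxed

# Illustrative policy only: which host capabilities each tier may receive.
TIER_CAPABILITIES = {
    TrustTier.VERIFIED:   {"file_read", "file_write", "network", "shell"},
    TrustTier.COMMUNITY:  {"file_read", "network"},
    TrustTier.UNVERIFIED: {"file_read"},
    TrustTier.ISOLATED:   set(),  # no host capabilities at all
}

def allowed(tier: TrustTier, capability: str) -> bool:
    """Gate a skill's capability request against its trust tier."""
    return capability in TIER_CAPABILITIES[tier]

print(allowed(TrustTier.COMMUNITY, "shell"))  # False: community skills get no shell
```

The design point is that trust classification happens once, at skill onboarding, while the capability gate runs on every request the skill makes.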
Evidence
- Of 500 randomly sampled public skills, 71% run in unsandboxed mode with filesystem access
- Prompt injection via skill output demonstrated in 23 of 500 skills (4.6%) with real-world exploit chains
- Skill composition attack: 3 individually innocuous skills chained to achieve credential exfiltration in a demo environment
- Verified (official) skills: 0 confirmed exploits; community skills: 8.2% exploit rate; unverified: 19.4% exploit rate
How to Apply
- Classify every skill your team uses into the 4-tier trust model before deploying — don't mix unverified skills in the same agent as high-trust operations
- Sanitize all tool outputs before they re-enter the LLM context — treat external data as untrusted input, same as in web security
- Run composition analysis: list all skills in your agent and check for dangerous capability combinations (e.g., web fetch + code exec + file write = arbitrary exfil path)
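The composition-analysis step above can be automated with a simple set-inclusion check over the capabilities of all loaded skills. This is a minimal sketch: the capability names and the list of dangerous combinations are assumptions for illustration, not a vetted rule set.

```python
# Known-dangerous capability chains (illustrative, not exhaustive).
DANGEROUS_COMBOS = [
    ({"web_fetch", "code_exec", "file_write"}, "arbitrary exfiltration path"),
    ({"file_read", "network"}, "credential/file exfiltration"),
    ({"shell", "network"}, "remote execution with callback channel"),
]

def audit_composition(skills: dict[str, set[str]]) -> list[str]:
    """Flag dangerous capability combinations across every skill
    loaded into one agent, even if each skill alone looks benign."""
    combined = set().union(*skills.values()) if skills else set()
    findings = []
    for combo, risk in DANGEROUS_COMBOS:
        if combo <= combined:
            contributors = sorted(n for n, caps in skills.items() if caps & combo)
            findings.append(f"{risk}: {sorted(combo)} via {contributors}")
    return findings

agent_skills = {
    "pdf-processing": {"file_read", "file_write"},
    "web-search": {"web_fetch", "network"},
    "sandbox-runner": {"code_exec"},
}
for warning in audit_composition(agent_skills):
    print(warning)
```

Note that the check operates on the union of capabilities, which is exactly how the composition-attack finding in the Evidence section works: no single skill trips the alarm, the combination does.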
Code Example
# Basic SKILL.md structure example
---
name: pdf-processing
description: Extract text from PDFs and fill form fields. Handles scanned docs via OCR.
---
# PDF Processing Skill
## When to use
- User asks to fill, extract, or convert PDF files
## Workflow
1. Check if PDF is scanned (use `scripts/detect_scan.py`)
2. If scanned: run OCR via `scripts/ocr.py`
3. Extract fields: `scripts/extract.py --input <file>`
4. Fill form: `scripts/fill_form.py --template assets/template.pdf`
## Edge cases
- Password-protected PDFs: prompt user for password, never store it
- Corrupted files: fail gracefully, report specific error
## Tools pre-approved
- bash: scripts/*.py only
- file_read: input PDF path
- file_write: output directory only
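A skill loader only needs the frontmatter (between the `---` delimiters) at discovery time; the body is loaded on demand under progressive disclosure. Here is a minimal stdlib-only sketch of reading that frontmatter; a production loader would use a real YAML parser, and the error handling here is illustrative.

```python
def parse_skill_frontmatter(text: str) -> dict[str, str]:
    """Extract the key: value pairs between the leading '---'
    delimiters of a SKILL.md file (flat string fields only)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing frontmatter delimiter")
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta  # closing delimiter: frontmatter complete
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    raise ValueError("unterminated frontmatter")

skill_md = """---
name: pdf-processing
description: Extract text from PDFs and fill form fields.
---
# PDF Processing Skill
"""
print(parse_skill_frontmatter(skill_md)["name"])  # pdf-processing
```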
Related Resources
- https://github.com/scienceaix/agentskills
- https://www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills
- https://www.anthropic.com/news/skills
- https://agentskills.io
- https://modelcontextprotocol.io/specification/2025-11-25
- https://www.anthropic.com/engineering/advanced-tool-use
Original Abstract
The transition from monolithic language models to modular, skill-equipped agents marks a defining shift in how large language models (LLMs) are deployed in practice. Rather than encoding all procedural knowledge within model weights, agent skills -- composable packages of instructions, code, and resources that agents load on demand -- enable dynamic capability extension without retraining. It is formalized in a paradigm of progressive disclosure, portable skill definitions, and integration with the Model Context Protocol (MCP). This survey provides a comprehensive treatment of the agent skills landscape, as it has rapidly evolved during the last few months. We organize the field along four axes: (i) architectural foundations, examining the SKILL.md specification, progressive context loading, and the complementary roles of skills and MCP; (ii) skill acquisition, covering reinforcement learning with skill libraries, autonomous skill discovery (SEAgent), and compositional skill synthesis; (iii) deployment at scale, including the computer-use agent (CUA) stack, GUI grounding advances, and benchmark progress on OSWorld and SWE-bench; and (iv) security, where recent empirical analyses reveal that 26.1% of community-contributed skills contain vulnerabilities, motivating our proposed Skill Trust and Lifecycle Governance Framework -- a four-tier, gate-based permission model that maps skill provenance to graduated deployment capabilities. We identify seven open challenges -- from cross-platform skill portability to capability-based permission models -- and propose a research agenda for realizing trustworthy, self-improving skill ecosystems. Unlike prior surveys that broadly cover LLM agents or tool use, this work focuses specifically on the emerging skill abstraction layer and its implications for the next generation of agentic systems. Project repo: https://github.com/scienceaix/agentskills