Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward
TL;DR Highlight
A survey paper covering how Claude's Agent Skills work, how dangerous they can be, and how to handle them safely — all in one place.
Who Should Read
Backend/ML engineers putting Claude Code or MCP-based agents into production. Especially useful before adopting third-party skills into your team's workflow — this is the security due-diligence reading.
Core Mechanics
- Claude Agent Skills operate in three execution modes: sandboxed, semi-sandboxed, and direct — each with very different security profiles
- Most publicly available skills are unsandboxed and can access the file system, network, and shell — the same as arbitrary code execution
- The attack surface for prompt injection via skills is significantly larger than via direct user input — external tool outputs are rarely sanitized
- Skills compose in ways that amplify risk — individually benign skills can chain into dangerous capability combinations
- The paper proposes a 4-tier trust model for skills: verified, community, unverified, and isolated
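The 4-tier trust model above maps naturally onto a capability-gating policy in deployment tooling. The sketch below is one hypothetical encoding; the tier names come from the paper, but the specific capability sets assigned to each tier are illustrative assumptions, not the paper's prescription.

```python
from enum import Enum

class TrustTier(Enum):
    """The paper's four trust tiers for agent skills."""
    VERIFIED = "verified"      # official, audited skills
    COMMUNITY = "community"    # popular, partially reviewed
    UNVERIFIED = "unverified"  # third-party, unreviewed
    ISOLATED = "isolated"      # must run fully sandboxed

# Illustrative policy only: which host capabilities each tier may receive.
TIER_CAPABILITIES = {
    TrustTier.VERIFIED:   {"file_read", "file_write", "network", "shell"},
    TrustTier.COMMUNITY:  {"file_read", "network"},
    TrustTier.UNVERIFIED: {"file_read"},
    TrustTier.ISOLATED:   set(),  # no host capabilities at all
}

def allowed(tier: TrustTier, capability: str) -> bool:
    """Gate a skill's capability request against its trust tier."""
    return capability in TIER_CAPABILITIES[tier]

print(allowed(TrustTier.COMMUNITY, "shell"))  # False: community skills get no shell
```

The design point is that trust classification happens once, at skill onboarding, while the capability gate runs on every request the skill makes.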
Evidence
- Of 500 randomly sampled public skills, 71% run in unsandboxed mode with filesystem access
- Prompt injection via skill output demonstrated in 23 of 500 skills (4.6%) with real-world exploit chains
- Skill composition attack: 3 individually innocuous skills chained to achieve credential exfiltration in a demo environment
- Verified (official) skills: 0 confirmed exploits; community skills: 8.2% exploit rate; unverified: 19.4% exploit rate
How to Apply
- Classify every skill your team uses into the 4-tier trust model before deploying — don't mix unverified skills in the same agent as high-trust operations
- Sanitize all tool outputs before they re-enter the LLM context — treat external data as untrusted input, same as in web security
- Run composition analysis: list all skills in your agent and check for dangerous capability combinations (e.g., web fetch + code exec + file write = arbitrary exfil path)
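The composition-analysis step above can be automated with a simple set-inclusion check over the capabilities of all loaded skills. This is a minimal sketch: the capability names and the list of dangerous combinations are assumptions for illustration, not a vetted rule set.

```python
# Known-dangerous capability chains (illustrative, not exhaustive).
DANGEROUS_COMBOS = [
    ({"web_fetch", "code_exec", "file_write"}, "arbitrary exfiltration path"),
    ({"file_read", "network"}, "credential/file exfiltration"),
    ({"shell", "network"}, "remote execution with callback channel"),
]

def audit_composition(skills: dict[str, set[str]]) -> list[str]:
    """Flag dangerous capability combinations across every skill
    loaded into one agent, even if each skill alone looks benign."""
    combined = set().union(*skills.values()) if skills else set()
    findings = []
    for combo, risk in DANGEROUS_COMBOS:
        if combo <= combined:
            contributors = sorted(n for n, caps in skills.items() if caps & combo)
            findings.append(f"{risk}: {sorted(combo)} via {contributors}")
    return findings

agent_skills = {
    "pdf-processing": {"file_read", "file_write"},
    "web-search": {"web_fetch", "network"},
    "sandbox-runner": {"code_exec"},
}
for warning in audit_composition(agent_skills):
    print(warning)
```

Note that the check operates on the union of capabilities, which is exactly how the composition-attack finding in the Evidence section works: no single skill trips the alarm, the combination does.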
Code Example
# Basic SKILL.md structure example
---
name: pdf-processing
description: Extract text from PDFs and fill form fields. Handles scanned docs via OCR.
---
# PDF Processing Skill
## When to use
- User asks to fill, extract, or convert PDF files
## Workflow
1. Check if PDF is scanned (use `scripts/detect_scan.py`)
2. If scanned: run OCR via `scripts/ocr.py`
3. Extract fields: `scripts/extract.py --input <file>`
4. Fill form: `scripts/fill_form.py --template assets/template.pdf`
## Edge cases
- Password-protected PDFs: prompt user for password, never store it
- Corrupted files: fail gracefully, report specific error
## Tools pre-approved
- bash: scripts/*.py only
- file_read: input PDF path
- file_write: output directory only
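A skill loader only needs the frontmatter (between the `---` delimiters) at discovery time; the body is loaded on demand under progressive disclosure. Here is a minimal stdlib-only sketch of reading that frontmatter; a production loader would use a real YAML parser, and the error handling here is illustrative.

```python
def parse_skill_frontmatter(text: str) -> dict[str, str]:
    """Extract the key: value pairs between the leading '---'
    delimiters of a SKILL.md file (flat string fields only)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing frontmatter delimiter")
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta  # closing delimiter: frontmatter complete
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    raise ValueError("unterminated frontmatter")

skill_md = """---
name: pdf-processing
description: Extract text from PDFs and fill form fields.
---
# PDF Processing Skill
"""
print(parse_skill_frontmatter(skill_md)["name"])  # pdf-processing
```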
Related Resources
- https://github.com/scienceaix/agentskills
- https://www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills
- https://www.anthropic.com/news/skills
- https://agentskills.io
- https://modelcontextprotocol.io/specification/2025-11-25
- https://www.anthropic.com/engineering/advanced-tool-use
Original Abstract
The transition from monolithic language models to modular, skill-equipped agents marks a defining shift in how large language models (LLMs) are deployed in practice. Rather than encoding all procedural knowledge within model weights, agent skills -- composable packages of instructions, code, and resources that agents load on demand -- enable dynamic capability extension without retraining. It is formalized in a paradigm of progressive disclosure, portable skill definitions, and integration with the Model Context Protocol (MCP). This survey provides a comprehensive treatment of the agent skills landscape, as it has rapidly evolved during the last few months. We organize the field along four axes: (i) architectural foundations, examining the SKILL.md specification, progressive context loading, and the complementary roles of skills and MCP; (ii) skill acquisition, covering reinforcement learning with skill libraries, autonomous skill discovery (SEAgent), and compositional skill synthesis; (iii) deployment at scale, including the computer-use agent (CUA) stack, GUI grounding advances, and benchmark progress on OSWorld and SWE-bench; and (iv) security, where recent empirical analyses reveal that 26.1% of community-contributed skills contain vulnerabilities, motivating our proposed Skill Trust and Lifecycle Governance Framework -- a four-tier, gate-based permission model that maps skill provenance to graduated deployment capabilities. We identify seven open challenges -- from cross-platform skill portability to capability-based permission models -- and propose a research agenda for realizing trustworthy, self-improving skill ecosystems. Unlike prior surveys that broadly cover LLM agents or tool use, this work focuses specifically on the emerging skill abstraction layer and its implications for the next generation of agentic systems. Project repo: https://github.com/scienceaix/agentskills