Agent Skills: A Large-Scale Data Analysis of the Claude Skills Ecosystem

Agent Skills: A Data-Driven Analysis of Claude Skills for Extending Large Language Model Functionality

Feb 8, 2026 • G. Ling, Shan Zhong, R. Huang

TL;DR Highlight

A study that analyzes 40,285 Claude Skills from a public marketplace and quantifies which skills exist, which ones get used, and where the risks lie.

Who Should Read

Backend and DevOps developers who want to build automated workflows by attaching Skills to Claude Code or other AI agents, and teams that need to assess agent security risk.

Core Mechanics

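Agent Skills are reusable modules distributed as a SKILL.md file that combines YAML metadata, a Markdown procedure, and tool references.
Between January 16 and February 5, 2026 (20 days), the number of listed skills grew 18.5x, from 2,179 to 40,285, with up to 8,857 skills registered in a single day.
Nearly half of all skills (46.3%) are duplicates by name: skill-creator (251 copies), mcp-builder (162 copies), and similar entries repackage the same intent many times over.
Supply is dominated by Software Engineering (54.7% of skills), yet the category that ranks first in actual installs is Web Search (1,268 installs on average), a pronounced supply-demand imbalance.
Risk classification (automated audit with Qwen2.5-32B-Instruct): L0 (safe) 54%, L2 (write/action) 30%, L3 (critical) 9%. Roughly 4 in 10 skills can access sensitive data or change state.
Examples of high-risk L3 skills: executing arbitrary shell commands, generating and deploying SSH keys, transferring financial assets, and dropping database tables. Many listed skills could do real damage to a system.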
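
The SKILL.md layout described above (metadata first, procedure after) can be sketched as a small parser. This is a minimal illustration, not the marketplace's loader: it assumes the metadata sits in a front-matter block delimited by `---` lines and contains only flat `key: value` pairs, and the `pdf-summarizer` skill and the `split_skill_md` helper are hypothetical names invented for the example.

```python
from typing import Dict, Tuple


def split_skill_md(text: str) -> Tuple[Dict[str, str], str]:
    """Split a SKILL.md string into (metadata, markdown_body).

    Assumption: the file opens with a front-matter block delimited by '---'
    lines, and the metadata is flat 'key: value' pairs (no nested YAML).
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no front matter: treat everything as the body

    meta: Dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Closing delimiter found: the rest of the file is the procedure.
            body = "\n".join(lines[i + 1:]).strip()
            return meta, body
        if ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    return {}, text  # opening delimiter never closed: treat as plain Markdown


# Hypothetical skill used only for illustration.
EXAMPLE_SKILL = """---
name: pdf-summarizer
description: Summarize a PDF when the user asks for a digest.
---
# Steps
1. Extract the text.
2. Write a five-bullet summary.
"""

meta, body = split_skill_md(EXAMPLE_SKILL)
```

A real loader would likely use a full YAML parser; the hand-rolled split here only keeps the sketch dependency-free.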

Evidence

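Of the 40,285 skills, 46.3% are registered under a normalized name shared with other skills (2x groups: 18.7%; 10x to 49x groups: 8.8%).
Web Search averages 1,268 installs versus 42 for Local File Control, a demand gap of up to 30x between categories.
Skill token distribution: median 1,414 tokens, top 1% above 9,253 tokens, maximum 116,239 tokens.
9% of all skills are L3 (critical risk); Software Engineering has the highest L3 share at 14%.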
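
To make the name-based duplication figure concrete, here is a rough sketch of how a duplicate share could be computed from a list of skill names. The normalization rule (lowercase, collapse punctuation and whitespace into hyphens) and the choice to count every member of a duplicate group are assumptions for illustration; the paper's exact definition may differ.

```python
import re
from collections import Counter
from typing import Iterable


def normalize_name(name: str) -> str:
    """Lowercase and collapse any run of non-alphanumerics into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def duplicate_share(names: Iterable[str]) -> float:
    """Fraction of skills whose normalized name is shared with another skill."""
    counts = Counter(normalize_name(n) for n in names)
    total = sum(counts.values())
    # Count every member of a group that has more than one entry.
    duplicated = sum(c for c in counts.values() if c > 1)
    return duplicated / total if total else 0.0


# Toy sample: three spellings of one name plus two unique names.
sample = ["skill-creator", "Skill Creator", "skill_creator", "mcp-builder", "web-search"]
share = duplicate_share(sample)  # 3 of the 5 names collapse to one group
```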

How to Apply

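Before loading a SKILL.md from a skill marketplace into an agent's system prompt, build an automated L0 to L3 risk audit pipeline using an LLM such as Qwen2.5-32B-Instruct. The prompt in the paper's Appendix E can be used as is.
When developing your own Skills, separate the metadata (name, description) from the procedure (instructions): expose only the metadata at initial load, then load the full SKILL.md once a skill is selected. This lazy loading pattern reduces prompt cost.
Rather than using public marketplace skills as is, build and publish high-quality canonical skills in categories where demand is high and supply is thin, such as Web Search and Content Creation; these can be expected to attract high install counts.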
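
The lazy loading pattern from the second item might look like the following sketch. `Skill` and `SkillRegistry` are hypothetical names invented here, not part of any SDK; the idea is only that the system prompt carries a cheap catalog while full procedures are fetched on selection.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Skill:
    name: str
    description: str
    body: str  # full Markdown procedure, only sent to the model on demand


class SkillRegistry:
    """Hypothetical registry that separates the catalog from full bodies."""

    def __init__(self, skills: List[Skill]) -> None:
        self._skills: Dict[str, Skill] = {s.name: s for s in skills}

    def catalog(self) -> str:
        """Cheap listing for the system prompt: one line per skill, no bodies."""
        return "\n".join(f"- {s.name}: {s.description}" for s in self._skills.values())

    def load(self, name: str) -> str:
        """Return the full procedure for one selected skill."""
        return self._skills[name].body


registry = SkillRegistry([
    Skill("web-search", "Search the web for fresh facts.", "# Steps\n1. Build a query.\n2. Cite sources."),
    Skill("changelog", "Draft a changelog from commits.", "# Steps\n1. Read the git log."),
])
system_prompt_part = registry.catalog()   # metadata only
selected_body = registry.load("web-search")  # loaded after the agent picks it
```

With a median skill length of 1,414 tokens, keeping bodies out of the initial prompt matters most when many skills are installed at once.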

Code Example

# Skill security audit prompt (based on Appendix E)

SECURITY_AUDIT_PROMPT = """
You are an expert AI Security Auditor.
Classify this agent skill into exactly one risk level:

L0: Read-only, public data (e.g. get_weather, search_wikipedia)
L1: Read sensitive/private data (e.g. read_emails, get_calendar)
L2: Write/action with limited scope (e.g. send_draft, add_event)
L3: Destructive/high-impact (e.g. delete_db, exec_shell, transfer_money)

Skill Name: {skill_name}
Skill Description: {skill_description}
Full Skill Document:
{skill_markdown}

Return ONLY valid JSON:
{{"skill_name": "{skill_name}", "risk_level": "L0|L1|L2|L3", "reasoning": "one sentence"}}
"""

# Usage example (Python + Anthropic SDK; the paper itself ran the audit with Qwen2.5-32B-Instruct)
import anthropic, json

client = anthropic.Anthropic()

def audit_skill(skill_name: str, skill_desc: str, skill_md: str) -> dict:
    prompt = SECURITY_AUDIT_PROMPT.format(
        skill_name=skill_name,
        skill_description=skill_desc,
        skill_markdown=skill_md
    )
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=256,
        messages=[{"role": "user", "content": prompt}]
    )
    return json.loads(response.content[0].text)

# Example result
# {"skill_name": "mcp-builder", "risk_level": "L2", "reasoning": "Creates and modifies MCP config files on local filesystem."}
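
Model replies do not always arrive as bare JSON, so the `json.loads` call in the snippet above can fail when the reply is wrapped in a code fence or extra text. The sketch below is one hedged way to tolerate that and to gate loading on the result. `parse_audit_reply`, `is_auto_loadable`, and the "auto-load up to L1" policy are hypothetical choices for illustration, not something the paper prescribes.

```python
import json
import re
from typing import Dict

VALID_LEVELS = {"L0", "L1", "L2", "L3"}


def parse_audit_reply(text: str) -> Dict[str, str]:
    """Pull the JSON object out of a model reply and validate risk_level.

    The greedy pattern spans from the first '{' to the last '}', which
    tolerates surrounding prose or a Markdown code fence.
    """
    match = re.search(r"\{.*\}", text, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in reply")
    result = json.loads(match.group(0))
    if result.get("risk_level") not in VALID_LEVELS:
        raise ValueError(f"unexpected risk_level: {result.get('risk_level')!r}")
    return result


def is_auto_loadable(result: Dict[str, str], max_level: str = "L1") -> bool:
    """Hypothetical policy: auto-load only skills at or below max_level.

    Plain string comparison works because the labels L0..L3 sort in risk order.
    """
    return result["risk_level"] <= max_level


# A reply wrapped in a code fence, as models sometimes produce.
reply = '```json\n{"skill_name": "mcp-builder", "risk_level": "L2", "reasoning": "Writes config files."}\n```'
parsed = parse_audit_reply(reply)
needs_review = not is_auto_loadable(parsed)
```

Skills that fail the gate would then go to a human reviewer or a sandboxed run rather than straight into the system prompt.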

Terminology

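Agent Skills: Recipe cards that tell an AI agent how to carry out a specific task. Each one states when to use it (the trigger) and how to do it (the procedure), so the agent can pick and run the right one for the situation.

SKILL.md: The actual skill file. A single file holding the name and description in YAML, the procedure in Markdown, and references to external tools. Once a developer pushes it to GitHub, it gets listed on the marketplace.

Qwen2.5-32B-Instruct: An open-source large language model from Alibaba. The paper uses it to automate classification and security auditing of the roughly 40,000 skills.

heavy-tailed distribution: A distribution in which most values are small and a tiny minority are extremely large. Skill token lengths follow this shape: 90% are at or under 4,000 tokens, but a few reach about 116,000 tokens.

supply-demand imbalance: The mismatch between the number of skills created (supply) and the number of actual installs (demand). Software engineering skills are abundant, yet the ones used most are information retrieval skills such as web search.

L3 risk: The highest risk tier, covering skills that can cause irreversible damage such as arbitrary code execution, database deletion, or financial transfers. 9% of all skills fall into it.

t-SNE: A technique that projects thousands of data points onto a 2D plane so that similar items cluster together. Used here to show how much skills overlap by function.

BAAI/bge-m3: An embedding model that converts text into numeric vectors. Used to vectorize skill names and descriptions and find semantically similar duplicate skills.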

Original Abstract

Agent skills extend large language model (LLM) agents with reusable, program-like modules that define triggering conditions, procedural logic, and tool interactions. As these skills proliferate in public marketplaces, it is unclear what types are available, how users adopt them, and what risks they pose. To answer these questions, we conduct a large-scale, data-driven analysis of 40,285 publicly listed skills from a major marketplace. Our results show that skill publication tends to occur in short bursts that track shifts in community attention. We also find that skill content is highly concentrated in software engineering workflows, while information retrieval and content creation account for a substantial share of adoption. Beyond content trends, we uncover a pronounced supply-demand imbalance across categories, and we show that most skills remain within typical prompt budgets despite a heavy-tailed length distribution. Finally, we observe strong ecosystem homogeneity, with widespread intent-level redundancy, and we identify non-trivial safety risks, including skills that enable state-changing or system-level actions. Overall, our findings provide a quantitative snapshot of agent skills as an emerging infrastructure layer for agents and inform future work on skill reuse, standardization, and safety-aware design.