HarnessAPI: A Skill-First Framework That Unifies Streaming APIs and MCP Tools

TL;DR Highlight

A Python framework that automatically generates both a FastAPI HTTP endpoint and an MCP tool from a single folder.

Who Should Read

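Backend developers who need to provide tools to AI agents such as Claude and Cursor while also maintaining a REST API. Teams tired of the double maintenance of running a FastAPI HTTP server and an MCP server separately.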

Core Mechanics

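Previously, the same business logic had to be written in two places: a FastAPI HTTP endpoint and a FastMCP tool registration. HarnessAPI generates both automatically from a single skill folder containing handler.py and models.py.
SSE (Server-Sent Events, a real-time streaming protocol) streaming and JSON responses are served from the same endpoint. If the client sends an Accept: application/json header it receives JSON; otherwise the response is streamed as SSE.
To work around a bug in how FastMCP resolves type annotations, the MCP wrapper function is compiled dynamically with exec(). The Pydantic model is placed directly into the function's __globals__ so that the annotations resolve.
A single is_mcp = false flag hides a skill from the MCP layer while keeping its HTTP endpoint. This is useful for keeping internal admin APIs out of an agent's tool list.
The harnessapi init --function f.py command parses an existing Python function via its AST and generates the Pydantic models and skill folder automatically, which lowers the barrier to migrating an existing codebase.
The enable_edit_endpoints=True option lets an AI coding agent hot-swap handler code without restarting the server. It works only on localhost; exposing the server externally raises a RuntimeError.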
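
The exec()-based workaround can be illustrated with a minimal sketch. This is not HarnessAPI's actual code: make_wrapper, InputModel, and _handler are hypothetical names, and a plain dataclass stands in for the Pydantic model so the example runs on the standard library alone. The point it shows is that a function compiled with exec() uses the dictionary passed to exec() as its __globals__, so an annotation that names the model resolves to the real class when a library inspects the signature.

```python
import typing
from dataclasses import dataclass


@dataclass
class Input:
    """Stand-in for a Pydantic input model (assumption for this sketch)."""
    text: str
    max_length: int = 100


def handle(input: Input) -> str:
    """Toy handler: truncate the text to max_length characters."""
    return input.text[: input.max_length]


def make_wrapper(name, handler, model):
    """Compile a wrapper whose annotations resolve through its own __globals__.

    `name` must be a valid Python identifier because it is spliced into source.
    """
    source = (
        f"def {name}(input: InputModel) -> str:\n"
        f"    return _handler(input)\n"
    )
    # This dict becomes the compiled function's __globals__, so the name
    # `InputModel` in the annotation points at the real model class.
    namespace = {"InputModel": model, "_handler": handler}
    exec(source, namespace)
    return namespace[name]


wrapper = make_wrapper("summarize", handle, Input)
hints = typing.get_type_hints(wrapper)
print(hints["input"] is Input)
print(wrapper(Input(text="hello world", max_length=5)))
```

Because the model lives in the wrapper's own globals rather than in a closure, signature inspection tools that evaluate annotations against __globals__ can find it.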

Evidence

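For six skills, framework boilerplate drops from 170 lines in a manual dual-stack setup (103 lines of FastAPI + 67 lines of FastMCP) to 44 lines with HarnessAPI, a 74% reduction.
The manual approach grows as O(n) with the number of skills, while HarnessAPI stays fixed at O(1). At 10 skills the manual approach needs about 283 lines, while HarnessAPI remains at roughly 44.
Importing 12 skill folders in the agentskills.io format with harnessapi init --skills-dir succeeded for all of them without source changes. The metadata precedence between SKILL.md and skill.toml was also applied correctly.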
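
The line counts above can be checked with a few lines of arithmetic. The sketch assumes the manual boilerplate scales in direct proportion to the number of skills and that HarnessAPI stays at 44 lines; the 10-skill reduction it computes is derived from those assumptions, not a figure reported in the source.

```python
# Reported figures for six skills.
MANUAL_LINES_6 = 103 + 67  # FastAPI + FastMCP boilerplate = 170
HARNESS_LINES = 44         # HarnessAPI boilerplate

# Reduction at six skills.
reduction_6 = 1 - HARNESS_LINES / MANUAL_LINES_6

# Assumption: manual boilerplate grows linearly, about 28.3 lines per skill.
per_skill = MANUAL_LINES_6 / 6
manual_10 = per_skill * 10

# Derived (not reported): reduction at ten skills if HarnessAPI stays at 44.
reduction_10 = 1 - HARNESS_LINES / manual_10

print(f"6 skills:  {MANUAL_LINES_6} -> {HARNESS_LINES} lines ({reduction_6:.0%} fewer)")
print(f"10 skills: {manual_10:.0f} -> {HARNESS_LINES} lines ({reduction_10:.0%} fewer)")
```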

How to Apply

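If you currently run FastAPI and FastMCP separately, refactor to the skill folder structure (handler.py + models.py), then pip install harnessapi and consolidate into a single process. The burden of managing two ports and two health checks goes away.
If you need to register tools with Claude Desktop or Cursor while also serving an HTTP API for a web dashboard, setting is_mcp = true/false in skill.toml is enough to control MCP exposure independently of HTTP.
When an AI coding agent (e.g., Cursor Agent) needs to modify handler logic repeatedly during local development, start the server with enable_edit_endpoints=True and let the agent replace code via POST /skills/{name}/edit to iterate quickly.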
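
Because edit endpoints replace running code, the localhost-only restriction matters in this workflow. The following is a sketch of what such a startup guard might look like, not HarnessAPI's implementation: is_loopback_host and check_edit_endpoints are hypothetical helpers, and the real framework may detect external exposure differently.

```python
import ipaddress


def is_loopback_host(host: str) -> bool:
    """Return True for 'localhost' or any loopback IP literal."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (e.g. a public hostname): treat as non-local.
        return False


def check_edit_endpoints(host: str, enable_edit_endpoints: bool) -> None:
    """Hypothetical guard: refuse edit endpoints on a non-loopback bind."""
    if enable_edit_endpoints and not is_loopback_host(host):
        raise RuntimeError(
            f"edit endpoints are only allowed on localhost, got host={host!r}"
        )


check_edit_endpoints("127.0.0.1", enable_edit_endpoints=True)  # passes silently
check_edit_endpoints("0.0.0.0", enable_edit_endpoints=False)   # passes: feature off
```

Failing at startup rather than per request means a misconfigured deployment never serves the edit route at all.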

Code Example

# Install
pip install harnessapi

# Create the skill folder structure
harnessapi init my-project

# skills/summarize/models.py
from harnessapi import SkillInput, SkillOutput

class Input(SkillInput):
    text: str
    max_length: int = 100

class Output(SkillOutput):
    summary: str

# skills/summarize/handler.py
from .models import Input, Output

async def handle(input: Input) -> Output:
    return Output(summary=input.text[:input.max_length])

# skills/summarize/skill.toml
[skill]
description = "Summarizes text"
is_mcp = true
timeout_secs = 30

# main.py
from harnessapi import HarnessAPI

app = HarnessAPI(skills_dir="./skills")

# Run: uvicorn main:app --reload
# → POST /skills/summarize (HTTP + SSE)
# → /mcp (MCP tools registered automatically)
# → /docs (Swagger UI generated automatically)
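
Whether POST /skills/summarize answers with JSON or an SSE stream depends on the request's Accept header. A minimal sketch of that decision follows, assuming a hypothetical helper named response_mode (not part of HarnessAPI) and ignoring q-value weighting for simplicity:

```python
from typing import Optional


def response_mode(accept_header: Optional[str]) -> str:
    """Return 'json' if the client accepts application/json, else 'sse'."""
    if not accept_header:
        return "sse"
    # Split "type/subtype;q=0.9, other/type" into bare media types.
    media_types = [
        part.split(";")[0].strip().lower()
        for part in accept_header.split(",")
    ]
    return "json" if "application/json" in media_types else "sse"


print(response_mode("application/json"))
print(response_mode("text/event-stream"))
print(response_mode(None))
```

Defaulting to SSE when the header is absent matches the rule described earlier: JSON only when the client asks for it.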

Terminology

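MCP: Model Context Protocol. A standard protocol through which AI agents call external tools. Tools such as Claude, Cursor, and GitHub Copilot use it to discover and execute external functions.

SSE: Server-Sent Events. A one-way streaming method in which the server pushes data to the client in real time. Commonly used to stream LLM responses as if they were being typed.

FastMCP: A library that makes it easy to build MCP tools in Python. Like FastAPI, it uses decorators to register functions as MCP tools.

Pydantic: A Python library for validating data types. Once a function's input and output schemas are defined, it handles validation and JSON serialization automatically.

ASGI: Asynchronous Server Gateway Interface, the standard interface for asynchronous Python web servers. FastAPI and Uvicorn follow it. It is designed to handle concurrent requests efficiently.

skill-first: A design pattern centered on the "skill" (the capability itself) rather than on HTTP routes or MCP tool registrations. Multiple protocol representations are derived automatically from a single skill definition.

hot-swap: A technique for replacing code in a running server without restarting it. Used for fast iteration during development, but it carries security risks and should be used only locally.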

Original Abstract

Every Python function deployed as an LLM tool must today exist in two forms: an HTTP endpoint for human-facing clients and CI pipelines, and an MCP tool registration for agent runtimes such as Claude and Cursor. These representations share business logic yet diverge in all the surrounding machinery (routing, validation, serialisation, streaming, and schema maintenance), and they drift apart as the underlying code evolves. We present HarnessAPI, a Python framework that eliminates this duplication by treating a typed skill folder as the single source of truth. From one handler.py plus Pydantic schemas, the framework automatically derives a streaming HTTP endpoint with Server-Sent Events, an interactive OpenAPI/Swagger UI, and a zero-configuration MCP tool, all served from a single process. Dual-mode content negotiation lets the same handler serve SSE-streaming and JSON-returning clients with no handler changes. A dynamic code-generation mechanism ensures Pydantic type annotations propagate correctly to FastMCP's inspection layer, resolving a technical limitation that prevents naive closure-based registration. Measured across six representative skills using cloc, HarnessAPI reduces framework-facing boilerplate by 74% compared with a manually maintained dual-stack implementation (FastAPI server + FastMCP server). HarnessAPI subclasses FastAPI, inheriting its full middleware, dependency-injection, and deployment ecosystem. It is available at https://github.com/edwinjosechittilappilly/harnessapi and on PyPI (pip install harnessapi)