Apideck CLI – An AI-agent interface with much lower context consumption than MCP | AI Paper Digest

TL;DR Highlight

MCP tool definitions alone can consume 55,000+ tokens of context before any work begins. As an alternative, Apideck proposes a CLI-based agent interface whose description takes only ~80 tokens.

Who Should Read

Backend and fullstack developers building AI agents or LLM-based automation systems who are experiencing or worried about context window exhaustion when integrating MCP servers.

Core Mechanics

A standard MCP server with many tools can consume 55,000+ tokens just for tool definitions — a significant chunk of most LLMs' context windows before any actual work begins.
Apideck's CLI-based approach encodes tool availability as a compact command-line interface description (~80 tokens) rather than full JSON schemas, letting the agent 'discover' what it needs on demand.
This lazy-loading approach means the agent fetches full tool details only when it decides to use a specific tool, keeping baseline context consumption near zero.
The tradeoff: the agent needs an extra round-trip to look up tool details before calling them, adding latency. But for long sessions with many available tools, the context savings far outweigh the latency cost.
The post argues that MCP's current design — front-loading all tool definitions — is fundamentally mismatched with context window economics and needs rethinking for large-scale tool ecosystems.
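
The context economics above can be made concrete with a small cost model. This is an illustrative sketch only: the 55,000 and ~80 token figures come from the post, while `DETAIL_TOKENS_PER_TOOL` (the cost of fetching one tool's details on demand) is a hypothetical value picked for the example, not a measurement.

```python
# Figures quoted in the post.
FRONT_LOADED_TOKENS = 55_000  # all MCP tool definitions loaded up front
CLI_INDEX_TOKENS = 80         # compact CLI-style interface description

# Hypothetical: tokens spent fetching one tool's details on demand.
DETAIL_TOKENS_PER_TOOL = 500


def lazy_cost(tools_used: int) -> int:
    """Context spent on tool definitions when details are fetched on demand."""
    return CLI_INDEX_TOKENS + DETAIL_TOKENS_PER_TOOL * tools_used


def break_even_tools() -> int:
    """Largest number of distinct tools for which lazy loading is
    no more expensive than front-loading every definition."""
    return (FRONT_LOADED_TOKENS - CLI_INDEX_TOKENS) // DETAIL_TOKENS_PER_TOOL


# Ratio of baseline context cost before any tool is actually used.
baseline_ratio = FRONT_LOADED_TOKENS / CLI_INDEX_TOKENS

if __name__ == "__main__":
    print(f"baseline ratio: {baseline_ratio:.1f}x")
    for n in (0, 5, 20):
        print(f"{n} tools used -> {lazy_cost(n)} tokens")
    print(f"break-even at {break_even_tools()} distinct tools")
```

Under these assumed numbers the baseline ratio works out to 687.5, in line with the ~680x figure attributed to the Apideck team, and lazy loading stays cheaper until a session touches roughly a hundred distinct tools. A different per-tool detail cost shifts the break-even point proportionally.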

Evidence

Commenters verified the 55K token figure by measuring real MCP servers — one person checked an enterprise CRM MCP and found it exceeded 80K tokens for tool definitions alone.
Several developers noted they'd hit this problem in practice and resorted to workarounds like splitting tools across multiple MCP servers or selectively disabling tools.
The Apideck team shared benchmark data showing response quality was comparable between the full-schema and CLI approaches for common API tasks, with the CLI approach using ~680x fewer tokens for tool definitions.
Some skeptics argued that the CLI approach sacrifices type safety and discoverability — the LLM has less precise information about parameter formats, potentially increasing errors.

How to Apply

Audit your current MCP setup: run a token counter on all tool definitions. If you exceed 10K tokens just for definitions, you have a context bloat problem worth solving.
Group related tools and load only the relevant group for each task context. For example, 'database tools' vs 'API tools' vs 'file tools' as separate MCP servers.
Consider implementing a tool registry pattern: expose a 'list_tools' meta-tool that returns brief descriptions, then 'get_tool_schema' for details only when needed.
For the highest-frequency tools (the 20% you use 80% of the time), keep full schemas in context. For the long tail, use lazy-loading.
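
The steps above can be sketched as a minimal in-memory tool registry. Everything here is an assumption made for illustration: the tool names and schemas are invented, `CHARS_PER_TOKEN = 4` is a rough heuristic standing in for a real tokenizer, and `list_tools`, `get_tool_schema`, `audit`, and `initial_context` are hypothetical helpers, not part of MCP or the Apideck CLI.

```python
import json

CHARS_PER_TOKEN = 4       # rough heuristic; swap in a real tokenizer for accuracy
BLOAT_THRESHOLD = 10_000  # the 10K-token audit threshold suggested above

# Invented example tools; a real setup would load these from its MCP servers.
TOOLS = {
    "create_invoice": {
        "description": "Create an invoice for a customer.",
        "parameters": {"customer_id": "string", "amount": "number"},
    },
    "list_contacts": {
        "description": "List CRM contacts.",
        "parameters": {"limit": "integer"},
    },
    "delete_file": {
        "description": "Delete a file by path.",
        "parameters": {"path": "string"},
    },
}


def estimate_tokens(obj) -> int:
    """Approximate token count from the serialized JSON length."""
    return len(json.dumps(obj)) // CHARS_PER_TOKEN


def audit(tools: dict) -> dict:
    """Estimate definition cost per tool and flag a context bloat problem."""
    per_tool = {name: estimate_tokens(spec) for name, spec in tools.items()}
    total = sum(per_tool.values())
    return {"total": total, "bloated": total > BLOAT_THRESHOLD, "per_tool": per_tool}


def list_tools(tools: dict) -> list:
    """Meta-tool: brief descriptions only, cheap enough to keep in context."""
    return [{"name": n, "description": s["description"]} for n, s in tools.items()]


def get_tool_schema(tools: dict, name: str) -> dict:
    """Meta-tool: full schema, fetched only when the agent picks a tool."""
    if name not in tools:
        raise KeyError(f"unknown tool: {name}")
    return tools[name]


def initial_context(tools: dict, pinned: set) -> dict:
    """Full schemas for high-frequency tools, a brief index for the long tail."""
    return {
        "pinned_schemas": {n: tools[n] for n in tools if n in pinned},
        "index": [e for e in list_tools(tools) if e["name"] not in pinned],
    }


if __name__ == "__main__":
    print(audit(TOOLS))
    print(initial_context(TOOLS, pinned={"create_invoice"}))
    print(get_tool_schema(TOOLS, "list_contacts"))
```

Which tools to pin is a judgment call. Usage counts from past sessions are one reasonable input, but the sketch leaves that choice to the caller.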

Terminology

Context Bloat: The phenomenon where non-essential information (like tool definitions) consumes a large portion of the context window before the actual task even starts.

MCP (Model Context Protocol): Anthropic's open protocol for standardized connections between AI assistants and external tools/data sources.

Lazy Loading: A pattern where resource details are fetched only when actually needed, rather than loading everything upfront.

Context Window: The maximum number of tokens an LLM can process in a single request, typically 128K to 1M tokens for modern models.
