Taught Claude to talk like a caveman to use 75% less tokens.
TL;DR Highlight
This post describes a prompt technique that drastically compresses Claude's response style, reportedly cutting token usage by up to 75%, which is relevant to developers looking to reduce API costs.
Who Should Read
Developers who want to cut token costs when using the Claude API, or who operate chatbots and automation pipelines that need tightly controlled response lengths.
Core Mechanics
- The title claims that you can reduce the number of tokens in Claude's response by up to 75% by instructing it to 'speak like a caveman'.
- Access to the original page is blocked, so the specific prompt cannot be confirmed, but the technique presumably works by stripping filler modifiers, full sentence structure, and politeness phrases from responses.
- Token reduction translates directly into API cost savings, so the gains can be substantial in production environments that handle a large volume of requests (a rough cost model follows this list).
- Because the original content cannot be confirmed, the exact prompt, experimental conditions, and measurement method remain unknown.
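The cost claim above is easy to sanity-check with arithmetic. The sketch below is a hypothetical back-of-the-envelope model: the price, traffic, and average response length are assumptions for illustration, not figures from the original post.

// Hypothetical cost model: all constants are assumed, not measured.
const OUTPUT_PRICE_PER_MTOK = 15;     // assumed $ per 1M output tokens
const REQUESTS_PER_MONTH = 1_000_000; // assumed traffic volume
const AVG_OUTPUT_TOKENS = 400;        // assumed verbose-baseline response length

function monthlyCost(avgTokens) {
  return (REQUESTS_PER_MONTH * avgTokens / 1_000_000) * OUTPUT_PRICE_PER_MTOK;
}

const baseline = monthlyCost(AVG_OUTPUT_TOKENS);          // $6,000/month
const compressed = monthlyCost(AVG_OUTPUT_TOKENS * 0.25); // $1,500/month at a 75% cut
console.log(`saved per month: $${baseline - compressed}`); // $4,500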
Evidence
- "(No comment information)"
How to Apply
- Add an instruction such as 'Respond as briefly and concisely as possible, focusing only on keywords' to the system prompt in your Claude API pipeline, then measure the change in both response quality and token count (a measurement sketch follows this list).
- Pipelines whose output is never read directly by humans, such as internal tools and automation scripts, tolerate drastic style compression, so run token-reduction experiments there first.
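To measure the effect concretely, the sketch below sends the same question twice, with and without a compression system prompt, and compares the output token counts reported by the API. It assumes the official @anthropic-ai/sdk package, an ANTHROPIC_API_KEY in the environment, and a placeholder model id; a fuller compression prompt appears in the Code Example section below.

// Token-usage A/B sketch (model id is an assumption; substitute your own).
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // picks up ANTHROPIC_API_KEY

const compressedStyle =
  "Respond in minimal words. No full sentences. No pleasantries. Keywords only.";

async function outputTokens(system, question) {
  const msg = await client.messages.create({
    model: "claude-sonnet-4-20250514", // placeholder model id
    max_tokens: 1024,
    ...(system ? { system } : {}),
    messages: [{ role: "user", content: question }],
  });
  return msg.usage.output_tokens; // count reported in the API response
}

const question = "Should BFS use a queue or a stack, and why?";
const verbose = await outputTokens(undefined, question);
const terse = await outputTokens(compressedStyle, question);
console.log(`verbose: ${verbose} tokens, terse: ${terse} tokens`);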
Code Example
// System prompt example (compression instruction for token reduction)
const systemPrompt = `
Respond in minimal words. No full sentences. No pleasantries.
Keywords only. Like caveman speech.
Example: Instead of 'The answer to your question is yes, you should use X',
just say: 'yes. use X.'
`;
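Note that the prompt above does not rely on the word 'caveman' alone: the inline before/after pair is a one-shot demonstration of the target style, which generally anchors terse output more reliably than a bare adjective. Since the compression is lossy, keep it behind the machine-consumed pipelines recommended under How to Apply.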
Related Papers
Using Claude Code: The unreasonable effectiveness of HTML
An article summarizing why the Claude Code team began preferring HTML over Markdown as an LLM output format and its practical advantages; it directly affects workflows for building documents, specs, and dashboards with AI.
When to Vote, When to Rewrite: Disagreement-Guided Strategy Routing for Test-Time Scaling
Disagreement-guided routing boosts LLM accuracy on math and code by 3-7% with adaptive problem solving.
Less Is More: Engineering Challenges of On-Device Small Language Model Integration in a Mobile Application
Five failure modes and eight practical solutions emerged after five days of running on-device SLMs (Gemma 3n E2B, Qwen3 0.6B) with Wordle.
Dynamic Context Evolution for Scalable Synthetic Data Generation
A framework that completely eliminates duplication and repetition in large-scale synthetic data generation with LLMs using three mechanisms (VTS + Semantic Memory + Adaptive Prompt).
90%+ fewer tokens per session by reading a pre-compiled wiki instead of exploring files cold. Built from Karpathy's workflow.
A workflow-sharing post on how pre-compiling the codebase into a wiki, rather than having Claude explore the files cold each session, can cut per-session token usage by more than 90%.