Taught Claude to talk like a caveman to use 75% less tokens.
TL;DR Highlight
According to its title, this post describes a prompt technique that compresses Claude's response style to cut token usage by 75%. The original page could not be accessed, so the figure is unverified, but the idea may be useful for developers interested in reducing API costs.
Who Should Read
Developers who want to reduce token costs when using the Claude API, or who operate chatbots or automation pipelines where response length needs to be kept short.
Core Mechanics
- The title claims that instructing Claude to 'speak like a caveman' reduces the number of tokens in its responses by 75%.
- The original page could not be accessed, so the specific prompt and methodology cannot be confirmed. The technique presumably strips unnecessary modifiers, complete sentence structures, and polite expressions.
- Output tokens are billed, so shorter responses translate directly into lower API costs. If the reduction holds, the savings would matter most in production environments that handle a large volume of requests.
- Details such as the exact prompt, experimental conditions, and measurement method are therefore unknown.
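The link between token reduction and cost can be made concrete with a little arithmetic. The sketch below is illustrative only: estimateCost is a hypothetical helper, and the request volume, token counts, and per-million-token prices are made-up assumptions, not real Claude pricing or figures from the original post. It shows that a 75% cut in output tokens does not mean a 75% cut in the bill, because input tokens are still charged.

```javascript
// Hypothetical helper: estimates spend for a workload from token counts.
// Prices are in dollars per million tokens and are illustrative assumptions.
function estimateCost({ requests, inputTokens, outputTokens, inputPrice, outputPrice }) {
  const inputCost = (requests * inputTokens * inputPrice) / 1e6;
  const outputCost = (requests * outputTokens * outputPrice) / 1e6;
  return { inputCost, outputCost, total: inputCost + outputCost };
}

// Assumed workload: 1M requests, 500 input tokens each, made-up prices.
const workload = { requests: 1e6, inputTokens: 500, inputPrice: 3, outputPrice: 15 };

// Baseline: 400 output tokens per response (assumed).
const baseline = estimateCost({ ...workload, outputTokens: 400 });

// Same workload with 75% fewer output tokens (400 -> 100).
const compressed = estimateCost({ ...workload, outputTokens: 100 });

// Fraction of the total bill saved; the input cost is unchanged.
const savings = (baseline.total - compressed.total) / baseline.total;
console.log(baseline.total, compressed.total, savings);
```

Under these assumed numbers the total drops from 7500 to 3000, a 60% saving. The sketch also ignores the extra input tokens that the compression instruction itself adds to every request, which would reduce the saving slightly further.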
Evidence
- "(No comment information)"
How to Apply
- Add the instruction 'Respond as briefly and concisely as possible, focusing only on keywords' to the system prompt in your Claude API pipeline, then compare response quality and token count against the same requests without the instruction.
- Pipelines whose output is not read directly by humans, such as internal tools or automation scripts, can tolerate a drastically compressed response style, so run token reduction experiments in these environments first.
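One way to set up that comparison is sketched below. It assumes the request and response shapes of the Anthropic Messages API (a top-level system field on the request, a usage.output_tokens count on the response). buildRequest and outputTokenReduction are hypothetical helper names, the model name is a placeholder, and the two sample responses are stand-ins with made-up token counts, not real measurements.

```javascript
// Builds a Messages API style request body.
// Hypothetical helper; the model name is a placeholder, not a real model ID.
function buildRequest(userMessage, systemPrompt) {
  const body = {
    model: "your-model-name",
    max_tokens: 1024,
    messages: [{ role: "user", content: userMessage }],
  };
  // The compression instruction, if any, goes in the system prompt.
  if (systemPrompt) body.system = systemPrompt;
  return body;
}

// Fraction by which output tokens shrank between two responses.
// Assumes each response carries usage.output_tokens.
function outputTokenReduction(baselineResponse, compressedResponse) {
  const before = baselineResponse.usage.output_tokens;
  const after = compressedResponse.usage.output_tokens;
  if (!(before > 0)) throw new RangeError("baseline response has no output tokens");
  return (before - after) / before;
}

// The same question, with and without the compression instruction.
const question = "Should I use X for this task?";
const baselineRequest = buildRequest(question);
const compressedRequest = buildRequest(question, "Respond in minimal words. Keywords only.");

// Stand-in responses with made-up token counts, not real measurements.
const baselineResponse = { usage: { input_tokens: 20, output_tokens: 120 } };
const compressedResponse = { usage: { input_tokens: 45, output_tokens: 45 } };

const reduction = outputTokenReduction(baselineResponse, compressedResponse);
console.log(`output tokens reduced by ${(reduction * 100).toFixed(1)}%`);
```

In a real experiment the two request bodies would be sent to the API and the reduction averaged over many prompts, since a single pair says little. Token count covers only half of the measurement; response quality still has to be checked separately.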
Code Example
// System prompt example (compression instruction for token reduction)
const systemPrompt = `
Respond in minimal words. No full sentences. No pleasantries.
Keywords only. Like caveman speech.
Example: Instead of 'The answer to your question is yes, you should use X',
just say: 'yes. use X.'
`;
Terminology
Related Papers
Claude-real-video - any LLM can watch a video
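An open-source tool that extracts only the key frames from a YouTube URL or local video file based on scene changes, transcribes the audio, and passes both to an LLM. It works around three limitations: Claude cannot accept video files, ChatGPT reads only the subtitles, and Gemini samples at a fixed 1 fps.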
ReContext: Recursive Evidence Replay as LLM Harness for Long-Context Reasoning
In a 128K-token context, extracting only the key evidence using the model's internal attention signals and re-injecting it raises reasoning accuracy by 24.6%.
Single and Multi Truth Data Fusion using Large Language Models
Merging conflicting data from multiple sources with GPT-4o-mini prompts yields consistently higher F1 scores than existing unsupervised methods.
Multilingual Reasoning Cascades Need More Context
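In a translation cascade pipeline, keeping the original question through to the final stage greatly improves multilingual performance without additional training.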
Less Back-and-Forth: A Comparative Study of Structured Prompting
Structuring prompts as a checklist improves LLM answer quality and also uses fewer tokens.
Training-Free Cultural Alignment of Large Language Models via Persona Disagreement
Proposes DISCA, an inference-time technique that adjusts LLM output to match each country's moral values without retraining.
Using Claude Code: The unreasonable effectiveness of HTML