This new technique saves 60% of my token expenses
TL;DR Highlight
You can cut LLM response tokens by roughly 60% with a telegraphic style that keeps only nouns, verbs, and key modifiers, dropping articles, conjunctions, and copulas.
Who Should Read
Backend developers concerned about API costs and token optimization, especially those using GPT-4-level models for simple tasks such as summarization, classification, and data extraction.
Core Mechanics
- A typical response of several hundred tokens can be compressed to around 40 by forcing a 'caveman' style, conveying the same meaning with far fewer tokens.
- Key prompt pattern: 'Drop articles, conjunctions, filler words, copulas. Keep nouns, verbs, key modifiers only.' Explicitly instruct the model to drop articles (a, the), conjunctions (and, but), and copulas (is, are).
- The approach resembles American Sign Language (ASL) or telegram style: raise meaning density by stripping padding words.
- However, this technique is only valid for pipelines where 'readable responses' are not required. It's not suitable for responses exposed to end-users.
- The source also notes that roughly 80% of prompts can be handled without expensive models (GPT-4, Claude Opus); model downgrading (routing) may be a more fundamental cost reduction than any compression style.
- Combining a routing strategy toward smaller models (GPT-4o mini, Haiku, etc.) with the compression style compounds the savings.
Evidence
- "Reported a 60% reduction in token count compared to normal responses. Presented a case where a hundreds-of-tokens response was compressed to around 40 tokens. Since costs are calculated based on the sum of input and output tokens, reducing output tokens by 60% proportionally reduces API costs. The effect is greater when the output proportion is large."
How to Apply
- For internal pipelines where responses are not read directly by humans (classification, extraction, summarization, etc.), add a telegraphic-style instruction to the system prompt. Example: 'Respond in compressed telegraphic style. Drop articles, conjunctions, filler words, copulas. Keep nouns, verbs, key modifiers only.'
- Create a router that first gauges task complexity, sending simple classification/summarization to GPT-4o mini or Claude Haiku and reserving expensive models for complex reasoning; stacking the compression style on top yields double savings (see the router sketch after this list).
- If the response must be parsed, combine the telegraphic style with JSON mode or structured output so the response stays machine-readable, reducing tokens without parsing errors (see the structured-output sketch after the code example).
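A minimal sketch of such a router, assuming the openai Python SDK; the complexity heuristic, keyword list, and model choices are placeholders to adapt to your own workload:
from openai import OpenAI

client = OpenAI()

COMPRESS = (
    "Respond in compressed telegraphic style. Drop articles, conjunctions, "
    "filler words, copulas. Keep nouns, verbs, key modifiers only."
)

def looks_simple(task: str) -> bool:
    # Placeholder heuristic: short classify/extract/summarize prompts go cheap.
    # Production routers often use a small classifier model instead.
    keywords = ("classify", "extract", "summarize")
    return len(task) < 500 and any(kw in task.lower() for kw in keywords)

def route(task: str) -> str:
    # Cheap model for simple tasks, expensive model only for complex reasoning;
    # the compression prompt applies in both cases for double savings.
    model = "gpt-4o-mini" if looks_simple(task) else "gpt-4o"
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": COMPRESS},
            {"role": "user", "content": task},
        ],
    )
    return response.choices[0].message.content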
Code Example
system_prompt = """
Respond in compressed telegraphic style.
Drop articles, conjunctions, filler words, copulas.
Keep nouns, verbs, key modifiers only.
Meaning density over readability.
Write like a telegram costs per word.
"""
# Example input
user_message = "What are the main causes of climate change?"
# Normal response example (~80 tokens)
# "Climate change is primarily caused by the burning of fossil fuels, which releases greenhouse gases..."
# Telegraphic response example (~20 tokens)
# "Fossil fuel burning → CO2 rise → heat trap. Also: deforestation, agriculture, industry emissions."Terminology
Related Papers
Using Claude Code: The unreasonable effectiveness of HTML
An article summarizing why the Claude Code team began preferring HTML over Markdown as an LLM output format and its practical advantages; it directly informs workflows for building documents, specs, and dashboards with AI.
When to Vote, When to Rewrite: Disagreement-Guided Strategy Routing for Test-Time Scaling
Disagreement-guided routing boosts LLM accuracy on math and code by 3-7% with adaptive problem solving.
Less Is More: Engineering Challenges of On-Device Small Language Model Integration in a Mobile Application
Five failure modes and eight practical solutions emerged after five days of running on-device SLMs (Gemma 4 E2B, Qwen3 0.6B) with Wordle.
Dynamic Context Evolution for Scalable Synthetic Data Generation
A framework that completely eliminates duplication and repetition in large-scale synthetic data generation with LLMs using three mechanisms (VTS + Semantic Memory + Adaptive Prompt).
90%+ fewer tokens per session by reading a pre-compiled wiki instead of exploring files cold. Built from Karpathy's workflow.
A workflow-sharing post on how pre-organizing a codebase as a wiki, instead of exploring it from scratch each session, can cut token usage per Claude session by more than 90%.