Universal Claude.md – cut Claude output tokens
TL;DR Highlight
A project claiming that adding a single CLAUDE.md file to your project root can strip unnecessary verbosity (sycophancy, filler openers and closers, unsolicited suggestions) from Claude and cut output tokens by up to 63%. The community, however, has raised strong doubts about the benchmark's reliability and real-world effectiveness.
Who Should Read
Backend/AI developers using Claude Code at scale in automated pipelines or agentic loops, who are seeing increased token costs or parsing difficulties due to Claude's verbose responses.
Core Mechanics
- Placing a CLAUDE.md file in the project root causes Claude Code to automatically read it and adjust its response behavior—no code changes required, takes effect immediately.
- By default, Claude outputs filler openers like 'Sure!', 'Great question!', 'Absolutely!', closing remarks like 'I hope this helps!', Unicode characters such as em dashes (—) and smart quotes that break parsers, question restatements, and unsolicited suggestions. This project instructs Claude to suppress these patterns.
- The author claims this file reduces output tokens by approximately 63%, but also explicitly states in the README that the majority of actual Claude costs come from input tokens, not output tokens—meaning the overall cost savings are limited.
- Situations where this file is effective: high-volume automation pipelines (resume bots, agentic loops, code generation), structured tasks repeated hundreds of times, team environments requiring consistent and parseable output.
- Situations where this file may backfire: short single queries (the file itself is loaded into context each time, resulting in a net token increase), conversations with low output volume, agentic coding tasks requiring complex reasoning.
- Key rule examples in the file include: 'Answer is always line 1, reasoning comes after', 'Do not repeat information already confirmed in the session', 'Never invent file paths, function names, or API signatures', and 'If the user states an incorrect fact, accept it as ground truth for the session'.
- The benchmark only measured output token count for a single prompt and did not measure response accuracy or quality. It also contains no data on agentic loops or large codebase tasks.
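The trade-off in the bullets above (net savings on long outputs, net increase on short queries) can be sanity-checked with back-of-envelope arithmetic. The file size below is an illustrative assumption, since the repo does not publish the file's token count; only the 63% reduction figure comes from the project's claim:

```python
# Back-of-envelope check of when a CLAUDE.md pays for itself.
# CLAUDE_MD_TOKENS is an assumed size for the rules file, which is loaded
# as input context on every call; 0.63 is the repo's claimed output cut.

CLAUDE_MD_TOKENS = 400
OUTPUT_REDUCTION = 0.63

def net_token_change(baseline_output_tokens: int) -> float:
    """Negative = the file saves tokens overall; positive = it costs more."""
    saved = OUTPUT_REDUCTION * baseline_output_tokens
    return CLAUDE_MD_TOKENS - saved

# A short query that would have produced 150 output tokens: net increase (~305 tokens).
print(net_token_change(150))
# A long generation of 5,000 output tokens: clear net saving (~2,750 tokens).
print(net_token_change(5000))
```

Under these assumptions the break-even point sits around 635 baseline output tokens per call; anything shorter makes the file a net cost.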
Evidence
- The benchmark's reliability drew strong criticism: one commenter noted that a single prompt like "Always answer with one word" could beat its numbers, and a user measuring in the repo's Issues found that responding with no instructions at all yielded the highest token efficiency.
- A technical critique argued the design ignores the autoregressive nature of LLMs: because an LLM predicts each token from the tokens it has already generated, forcing the answer onto line 1 turns all subsequent reasoning into confirmation bias justifying that answer, making the rule meaningless unless thinking mode is enabled.
- The rule "accept anything the user states as ground truth for the entire session" was flagged as dangerous: if a user accidentally states a false premise in a prompt, Claude will treat it as fact throughout and completely lose the ability to challenge incorrect information.
- A comment citing real OpenRouter data showed that in the programming category input tokens account for 93.4% of usage, reasoning tokens 2.5%, and output tokens only 4.0%, making output reduction largely insignificant to overall cost, something the author themselves acknowledges in the README.
- Alternative token-saving tools were also mentioned: Headroom (a localhost proxy that compresses API context by ~34%), RTK (a Rust CLI proxy that compresses CLI output like git/npm/build logs by 60-90%), and MemStack (a tool that gives Claude Code persistent memory so it doesn't re-read the codebase each time). These tools target input tokens rather than output tokens and may offer more meaningful cost savings.
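The OpenRouter split cited above puts a hard ceiling on what output reduction can save. The sketch below just multiplies the claimed 63% cut by the 4.0% output share:

```python
# Cost-share arithmetic from the cited OpenRouter figures
# (programming category: input 93.4%, reasoning 2.5%, output 4.0%).

INPUT_SHARE, REASONING_SHARE, OUTPUT_SHARE = 0.934, 0.025, 0.040
OUTPUT_REDUCTION = 0.63  # the repo's claimed output-token cut

# A 63% cut applied only to the 4% output slice of total tokens:
total_savings = OUTPUT_REDUCTION * OUTPUT_SHARE
print(f"{total_savings:.1%} of total tokens")  # ~2.5% overall
```

In other words, even taking the 63% benchmark number at face value, the whole-bill impact is about 2.5%, which is why the critics point at input-side tools instead.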
How to Apply
- If you run automated pipelines with Claude Code (e.g., automated code review in CI, repetitive document generation) and have trouble parsing the output, try adding a CLAUDE.md to the project root that bans em dashes, smart quotes, and filler openers. Be sure to also monitor for any degradation in accuracy.
- If you want meaningful cost savings beyond output-token reduction, target input tokens, which account for a far larger share of costs. Evaluate input-reduction tools like Headroom (a context compression proxy) or RTK (CLI output compression) first, as a higher priority.
- If you use Claude Code for complex agentic coding tasks (large codebase refactoring, multi-file edits, etc.), apply this file with caution. The community has noted that Claude's verbose intermediate explanations may help the model stay on track in long contexts, so compare task completion quality before and after applying the file.
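As a deterministic complement to the prompt rules, parser-breaking characters can also be normalized after the model responds, so a stray em dash never reaches your pipeline even if Claude ignores the instruction. The helper below is a hypothetical post-processing step, not part of the claude-token-efficient repo:

```python
# Normalize parser-breaking Unicode in Claude's output instead of
# (or in addition to) asking the model not to emit it.

REPLACEMENTS = {
    "\u2014": "--",  # em dash
    "\u2013": "-",   # en dash
    "\u2018": "'",   # left single smart quote
    "\u2019": "'",   # right single smart quote
    "\u201c": '"',   # left double smart quote
    "\u201d": '"',   # right double smart quote
}

def normalize(text: str) -> str:
    """Replace smart punctuation with plain ASCII equivalents."""
    for bad, good in REPLACEMENTS.items():
        text = text.replace(bad, good)
    return text

print(normalize("\u201cGreat question!\u201d \u2014 here\u2019s the fix"))
# "Great question!" -- here's the fix
```

A post-processor like this is cheap insurance for CI pipelines: it cannot drift the way prompt compliance can, though it obviously does nothing for token counts.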
Code Example
# Project structure
your-project/
└── CLAUDE.md # Just add this one file
# CLAUDE.md key rule examples
## Communication Rules
- Answer is always line 1. Reasoning comes after, never before.
- No redundant context. Do not repeat information already established in the session.
- No sycophantic openers: never start with Sure, Absolutely, Great question, etc.
- No closing remarks: never end with I hope this helps or Let me know if you need anything.
- No em dashes (--), smart quotes, or Unicode characters that break parsers.
## Code Output Rules
- Never invent file paths, function names, or API signatures.
- Do not add abstractions beyond what was explicitly requested.
- Do not restate the question before answering.
Related Papers
Using Claude Code: The unreasonable effectiveness of HTML
An article on why the Claude Code team began preferring HTML over Markdown as an LLM output format and its practical advantages; it directly affects workflows for building documents, specs, and dashboards with AI.
When to Vote, When to Rewrite: Disagreement-Guided Strategy Routing for Test-Time Scaling
Disagreement-guided routing boosts LLM accuracy on math and code by 3-7% with adaptive problem solving.
Less Is More: Engineering Challenges of On-Device Small Language Model Integration in a Mobile Application
Five failure modes and eight practical solutions emerged after five days of running on-device SLMs (Gemma 4 E2B, Qwen3 0.6B) with Wordle.
Dynamic Context Evolution for Scalable Synthetic Data Generation
A framework that completely eliminates duplication and repetition in large-scale synthetic data generation with LLMs using three mechanisms (VTS + Semantic Memory + Adaptive Prompt).
90%+ fewer tokens per session by reading a pre-compiled wiki instead of exploring files cold. Built from Karpathy's workflow.
A workflow-sharing post on how pre-compiling the codebase into a wiki, instead of having Claude explore it cold each session, can cut token usage per Claude session by more than 90%.
Related Resources
- https://github.com/drona23/claude-token-efficient
- https://github.com/drona23/claude-token-efficient/blob/main/BENCHMARK.md
- https://github.com/drona23/claude-token-efficient/issues/1
- https://aifoc.us/the-token-salary/
- https://github.com/chopratejas/headroom
- https://github.com/rtk-ai/rtk
- https://github.com/cwinvestments/memstack
- https://github.com/thedotmack/claude-mem
- https://github.com/ory/lumen