I reverse-engineered why Claude Code burns through your usage so fast. 7 bugs that stack on top of each other — and the worst one activates when Extra Usage kicks in
TL;DR Highlight
A Max 20x subscriber reverse-engineered the Claude Code CLI source and discovered 7 bugs that drain usage abnormally fast. The core issue is a 'death spiral' where switching to Extra Usage demotes cache TTL from 1 hour to 5 minutes, causing costs to spike 2.8x.
Who Should Read
Developers using Claude Code CLI, especially Max plan or Extra Usage subscribers. Essential reading for anyone who has noticed their usage draining faster than usual.
Core Mechanics
- Most severe bug: an internal function in cli.js detects Extra Usage status and automatically demotes the cache TTL that the client requests from the server from 1 hour to 5 minutes. After any pause longer than 5 minutes the cache has expired and must be rebuilt, which raises the per-turn cost from $0.22 to $0.61 (2.8x) on a 220K context.
- This demotion happens client-side only. The server accepts 1-hour TTL requests normally, and patching the function in cli.js to always return true causes the server to grant 1 hour.
- The 'death spiral' structure: other bugs rapidly exhaust plan usage → Extra Usage activates → cache demoted to 5 minutes → one bathroom break triggers a full rebuild → Extra Usage drains instantly → locked out until the 5-hour reset.
- The native installer binary bundles a custom Bun runtime that corrupts the cache prefix on every request. Installing via npm (`npm install -g @anthropic-ai/claude-code`) resolves this; `file $(which claude)` should report a symbolic link, not an ELF binary.
- Between v2.1.69 and v2.1.90 (28 days, 20 versions), certain attachment types were missing on session resume, causing a cache miss every time. Fixed in v2.1.91.
- The Autocompact feature had no circuit breaker, causing failed compactions to retry infinitely. Internal source comments recorded 50+ consecutive failures across 1,279 sessions. Fixed in v2.1.89.
- The client generates fake rate limit errors without making actual API calls when transcripts grow long. Responses show `model: synthetic` with 0 tokens. Not yet fixed.
- Server-side compaction silently removes tool results mid-session, breaking the cache. Cannot be patched client-side. Not yet fixed.
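The cost impact of the TTL demotion can be checked with simple arithmetic. The sketch below uses only the two per-turn figures quoted above ($0.22 warm, $0.61 after a rebuild, 220K context). The session model is an assumption made for illustration: every turn is treated as either fully warm or fully cold. The function names `cost_ratio` and `session_cost` are hypothetical helpers, not part of Claude Code.

```shell
#!/usr/bin/env bash
# Per-turn figures quoted in the post for a 220K-token context.
WARM_COST=0.22   # USD per turn while the cache is still valid
COLD_COST=0.61   # USD per turn when the cache expired and is rebuilt

# cost_ratio WARM COLD -> cold/warm multiplier, one decimal place
cost_ratio() {
  LC_ALL=C awk -v warm="$1" -v cold="$2" 'BEGIN { printf "%.1f\n", cold / warm }'
}

# session_cost TURNS COLD_TURNS WARM COLD -> total USD for a session in which
# COLD_TURNS of the TURNS follow a pause longer than the cache TTL.
# (Assumption: each turn is either fully warm or fully cold.)
session_cost() {
  LC_ALL=C awk -v n="$1" -v c="$2" -v warm="$3" -v cold="$4" \
    'BEGIN { printf "%.2f\n", (n - c) * warm + c * cold }'
}

cost_ratio "$WARM_COST" "$COLD_COST"          # prints 2.8
session_cost 20 0 "$WARM_COST" "$COLD_COST"   # prints 4.40
session_cost 20 5 "$WARM_COST" "$COLD_COST"   # prints 6.35
```

Under these assumptions, five pauses longer than the TTL in a 20-turn session add about $1.95. With a 1-hour TTL most of those pauses would not expire the cache at all.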
Evidence
- A Max 20x user on WSL reported that their usage had recently been draining noticeably faster, and that switching from the native install to the npm version brought the drain rate back to normal.
- As a counterpoint, some heavy Max plan users claimed their usage had not changed at all compared to months ago and asked why it wasn't happening to everyone. The OP explained that bugs 1+3+5 must all occur simultaneously for the worst case of exhausting a weekly quota within 2 hours.
- Another commenter had tried the native installer when it first launched, ran into issues, and switched back to the npm version. They reported experiencing none of the recent issues since.
- A link to a cache analysis tool was shared in the comments: `https://github.com/abhiyan-maitri/claude-usage-report`. It reports cache usage per prompt.
- Many commenters criticized the post for having been written with Claude, with cynical remarks that 'this sub has become a place where bots post for bots.' Others argued that the Claude Code team no longer carefully reviews Claude-generated code, which is why these bugs went unaddressed for 20 releases.
- Some took a charitable view, noting that Anthropic is already subsidizing costs as much as possible, so fixing these bugs would benefit Anthropic as well.
How to Apply
- If you installed Claude Code via the native installer, reinstall with npm right now. If `file $(which claude)` returns an ELF binary, you have the native version. Switch with `npm install -g @anthropic-ai/claude-code` to resolve the cache prefix corruption bug (#1).
- If you are on a version below v2.1.91, update immediately. The session-resume cache miss bug (#2) and the Autocompact infinite loop bug (#3) were fixed in v2.1.91 and v2.1.89 respectively.
- If you are concerned about cost spikes from Extra Usage, you can patch the cache TTL decision function in cli.js to always return true, so that the client keeps requesting the 1-hour TTL. Updates overwrite the patch, so you will need to reapply it after each update. Technical details are in GitHub issue anthropics/claude-code#43566.
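The version ranges above can be turned into a quick check. This is a minimal sketch, assuming a `sort` that supports `-V` (version sort, as in GNU coreutils). `version_lt` and `affected_bugs` are hypothetical helper names. The post gives no starting version for the Autocompact bug, so the sketch flags every version below v2.1.89, which may over-report on very old releases.

```shell
#!/usr/bin/env bash
# version_lt A B -> succeeds if version A sorts strictly before version B
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# affected_bugs VERSION -> prints the version-dependent bugs that apply
affected_bugs() {
  local v="${1#v}"   # accept both "2.1.90" and "v2.1.90"
  # Session-resume cache miss: v2.1.69 through v2.1.90, fixed in v2.1.91
  if ! version_lt "$v" 2.1.69 && version_lt "$v" 2.1.91; then
    echo "session-resume cache miss"
  fi
  # Autocompact retry loop: fixed in v2.1.89 (start version not given; assumption)
  if version_lt "$v" 2.1.89; then
    echo "autocompact infinite retry"
  fi
}

affected_bugs 2.1.85
```

To check a real install, pass in the installed version, for example `affected_bugs "$(claude --version | awk '{print $1}')"`, assuming the version number is the first field of that output.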
Code Example
# Check the current installation method
file "$(which claude)"
# Result 'ELF 64-bit...' → native binary (affected by the cache prefix bug)
# Result 'symbolic link...' → npm version (not affected)
# Switch to npm
npm install -g @anthropic-ai/claude-code
# Check cache settings
grep -A5 'cachedGrowthBookFeatures' ~/.claude.json
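The fake rate limit errors described under Core Mechanics show up as responses with `model: synthetic` and 0 tokens, so one way to look for them is to count such entries in saved session transcripts. The sketch below is hedged on several points: it assumes transcripts are JSONL files, that the field appears as `"model":"synthetic"` or `"model":"<synthetic>"`, and the sample lines are made up for the demo. `count_synthetic` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# count_synthetic FILE... -> number of lines whose "model" field is synthetic
# (Assumption: one JSON object per line, model written as shown in the demo.)
count_synthetic() {
  cat -- "$@" |
    grep -Ec '"model"[[:space:]]*:[[:space:]]*"<?synthetic>?"' || true
}

# Demo on a throwaway transcript with made-up entries.
demo=$(mktemp)
cat > "$demo" <<'EOF'
{"type":"assistant","message":{"model":"claude-example","usage":{"output_tokens":120}}}
{"type":"assistant","message":{"model":"<synthetic>","usage":{"output_tokens":0}}}
{"type":"assistant","message":{"model":"synthetic","usage":{"output_tokens":0}}}
EOF
count_synthetic "$demo"   # prints 2
rm -f "$demo"
```

Point the function at wherever your transcripts are stored. A non-zero count suggests some "rate limit" messages in that session were generated client-side without an API call.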