I cancelled Claude: Token issues, declining quality, and poor support
TL;DR Highlight
A developer reports that Anthropic’s Claude Code Pro declined in speed, token allowance, and support quality over three weeks, and the account sparked a community discussion among developers.
Who Should Read
Developers currently paying for and using AI coding tools like Claude Code, Copilot, and Codex in production environments, particularly those considering alternatives due to recent changes in Claude’s performance or token limits.
Core Mechanics
- The author initially found Claude Code Pro satisfactory in terms of speed, token allowance, and quality, but experienced a rapid deterioration over the following three weeks.
- A sudden spike to 100% token usage occurred after just two simple queries to Claude Haiku following a 10-hour break, with no clear explanation for the consumption.
- Customer support responded first with generic answers from an AI bot, then with a copy-pasted reply from a human agent, and finally closed the ticket with a disclaimer that further replies might not be monitored.
- The author went from working on three projects in parallel to exhausting the token limit after about two hours of work on a single project.
- When asked to refactor a project, Claude Opus proposed a workaround: adding a generic initializer to ui-events.js that injects value displays into all range inputs. The author considered this a low-quality solution that even a junior developer would avoid.
- Opus consumed approximately 50% of the token allowance in five hours while implementing this workaround, wasting tokens before producing a usable result.
- Conversation cache issues were also present, requiring the model to reload the codebase from scratch after periods of inactivity, effectively doubling the cost of initial loading.
- The author is also comparing Claude Code to GitHub Copilot, OpenAI Codex, and locally-run Qwen3.5-9B models using OMLX and Continue.
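The cache point above comes down to simple arithmetic, sketched here in Python. This is a deliberately simplified model, not Anthropic's billing logic: it assumes every cache reset forces a full reload billed at the same rate as the first load. The function name and all figures are hypothetical illustrations, not measurements from the article.

```python
def session_token_cost(codebase_tokens: int, work_tokens: int, cache_resets: int) -> int:
    """Estimate total tokens for a session.

    Assumption (not from the source): each cache reset forces one full
    reload of the codebase, billed like the initial load.
    """
    if min(codebase_tokens, work_tokens, cache_resets) < 0:
        raise ValueError("all inputs must be non-negative")
    loads = 1 + cache_resets  # initial load plus one reload per reset
    return codebase_tokens * loads + work_tokens


# Hypothetical figures: a 40k-token codebase and 20k tokens of actual work.
no_break = session_token_cost(40_000, 20_000, cache_resets=0)
one_break = session_token_cost(40_000, 20_000, cache_resets=1)
print(no_break, one_break)
```

Under these assumptions a single idle period doubles the loading cost (40k to 80k tokens) while the useful work stays the same.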
Evidence
- One user reported receiving code from Claude Sonnet with missing requirements, duplicate code, unnecessary data mapping, and fake tests written to pass rather than to validate functionality. The user stated that coding was easier before AI and that verifying AI-generated code is more time-consuming.
- Conversely, a user employing Claude Opus as a ‘copilot’, with narrowly scoped prompts and thorough review, experienced no token limit issues and achieved 9/9 one-shot bug fixes in an old Unity C# project.
- Multiple colleagues reported a noticeable decline in Claude’s performance over the past two months, with Claude 4.6 exhibiting forgetfulness and poor decision-making, and 4.7 offering little improvement.
- Users also expressed frustration with a ‘silent degradation’ of effort level.
- Reports suggest Claude’s performance varies significantly by time of day. A graph tracking Claude Code performance is available at marginlab.ai/trackers/claude-code, and commenters speculated that frontier models use a ‘quality dial’ that adjusts quantization levels between peak and off-peak hours.
- A user who switched to OpenAI Codex (GPT 5.4/5.5) reported that their Claude Max subscription has been largely unused since April, citing Opus’s tendency to forget details or introduce technical debt, whereas GPT 5.4+ considers edge cases and reduces subsequent errors.
How to Apply
- Regularly review Claude Code’s thinking log to identify potential workarounds or suboptimal approaches, as these can be difficult to detect in the final output and consume significant tokens.
- Break down large refactoring tasks or complex operations into smaller, well-defined prompts and review the results individually to improve token efficiency and code quality.
- Account for conversation cache resets when planning long work sessions, either by completing tasks within the token window or by budgeting for the cost of reloading the codebase.
- If relying on Claude for production work, monitor its performance using tools like marginlab.ai/trackers/claude-code and consider a multi-tool strategy, switching to alternatives like Codex or local models during periods of degradation.
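The advice on splitting work and budgeting for reloads can be sketched as a small planning helper in Python. It is a minimal illustration, not a feature of Claude Code: the function name, task names, per-task token estimates, budget, and reload cost are all hypothetical, and it assumes each session pays the codebase reload cost once.

```python
from typing import List, Tuple


def plan_sessions(tasks: List[Tuple[str, int]], budget: int, reload_cost: int) -> List[List[str]]:
    """Group tasks, in order, into sessions that each fit within `budget`.

    Assumption (not from the source): every session starts by paying
    `reload_cost` tokens once to load the codebase.
    """
    sessions: List[List[str]] = []
    current: List[str] = []
    used = reload_cost
    for name, cost in tasks:
        if reload_cost + cost > budget:
            raise ValueError(f"task {name!r} cannot fit in one session; split it further")
        if used + cost > budget:
            # Close the current session and start a new one with a fresh reload.
            sessions.append(current)
            current, used = [], reload_cost
        current.append(name)
        used += cost
    if current:
        sessions.append(current)
    return sessions


# Hypothetical estimates for a refactor split into three small prompts.
tasks = [
    ("extract slider component", 30_000),
    ("update event wiring", 25_000),
    ("add tests", 40_000),
]
plan = plan_sessions(tasks, budget=100_000, reload_cost=40_000)
print(plan)
```

With these figures the first two tasks share one session and the third starts a new one, so the reload cost is paid twice. A task that cannot fit in a single session even on its own raises an error, which signals that it should be split further.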
Code Example
# Claude Code’s maximum output token setting (environment variable mentioned in the comments)
export CLAUDE_CODE_MAX_OUTPUT_TOKENS=8000
# Local inference alternative (stack used by the author)
# OMLX + Continue extension + Qwen3.5-9B model combination
# Prompting the model directly through the llama_cpp web UI gives
# fast one-shot processing without the Claude Code agent layer