Issue: Claude Code is unusable for complex engineering tasks with Feb updates
TL;DR Highlight
According to the report author's analysis of their own session logs, Anthropic has been quietly reducing the depth of Claude's thinking since February while shipping features that obscure the change. By this account, the performance degradation felt by subscription plan users is not imagined but the result of actual system changes.
Who Should Read
Developers who regularly use Claude Code (or the Claude API) for complex engineering tasks. Specifically, those who have noticed a decline in Claude's response quality in recent months.
Core Mechanics
- The report author analyzed 17,871 thinking blocks and 234,760 tool calls from 6,852 Claude Code sessions, quantitatively demonstrating a quality degradation in complex engineering tasks starting in February 2026.
- Anthropic introduced a beta header called `redact-thinking-2026-02-12` on February 12, 2026, which hides the thinking content in the UI. Anthropic explained this as a UI-only change that does not affect the thinking itself.
- However, the report author argues that this redaction was used as a device to intentionally hide the reduction in thinking. In fact, their log analysis shows that the depth of thinking decreased by ~67% starting in late February, which they attribute to thinking being reduced variably based on load in subscription plans.
- With the release of Opus 4.6 on February 9th, an 'adaptive thinking' approach, where the model decides how long to think for itself, was applied as the default instead of the 'fixed thinking budget' approach. Anthropic explained that this approach is generally more effective.
- On March 3rd, the default effort value for Opus 4.6 was changed to 85 (medium). Anthropic stated that this is the optimal point for reducing cost and latency, but it resulted in a shallower depth of thinking for complex tasks.
- The symptoms users report are specific: the phrase 'simplest fix' followed by incorrect code, a surge in early landing phrases such as 'used too many tokens' and 'let's wrap it up here', instructions being ignored or contradicted, and completion being falsely reported.
- Boris from Anthropic (Claude Code team) stated in an official comment that they are considering an option to change the default to high effort for Teams/Enterprise users, and currently it can be set directly via `/effort high` or settings.json.
- Using the `stop-phrase-guard.sh` script shared by the report author, you can detect signs of early landing in Claude's session logs. Several users audited their session logs (one across 80 sessions) and confirmed that this pattern has indeed increased.
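The actual `stop-phrase-guard.sh` is not reproduced in this report; as a rough sketch of the same idea, assuming session logs are stored as `*.jsonl` files (Claude Code keeps them under `~/.claude/projects` by default), a shell function like the following counts early-landing phrases per session file. The phrase list here is illustrative and the real script may differ:

```shell
#!/usr/bin/env bash
# Rough sketch approximating the idea behind stop-phrase-guard.sh
# (the real script may differ). Counts how often early-landing
# phrases appear in each session log under the given directory.
audit_stop_phrases() {
  local log_dir="$1"
  # Illustrative phrase list; extend with patterns you see in your logs.
  local pattern='simplest fix|used too many tokens|wrap it up here|leave this for later'
  find "$log_dir" -name '*.jsonl' -print0 |
    while IFS= read -r -d '' f; do
      hits=$(grep -Eio "$pattern" "$f" | wc -l)
      [ "$hits" -gt 0 ] && printf '%5d  %s\n' "$hits" "$f"
    done | sort -rn   # sessions with the most early-landing phrases first
}

# Usage: audit_stop_phrases ~/.claude/projects
```

Sorting descending puts the worst-affected sessions at the top, which makes it easy to spot whether the pattern clusters in sessions after a given date.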
Evidence
- Anthropic's Boris explained that `redact-thinking-2026-02-12` only hides thinking in the UI and that thinking itself still runs. The report author countered, "Then why did the depth of thinking in my logs decrease by 67%?", arguing the header is a device to hide load-based reduction of thinking on subscription plans.
- The report author (benvanik) shared the `stop-phrase-guard.sh` script, noting that phrases like 'simplest fix' or 'used too many tokens' are strong indicators of shallow thinking. Several users audited their session logs with it and confirmed the pattern has indeed increased.
- One user described having agents autonomously research, design, and implement app ideas on a Claude Max subscription from late January to early February; a month later, the same task was refused with "Why phase 2 when phase 1 hasn't even been validated yet." They said it felt like a regression to Sonnet-level output.
- "An unpredictable model is worse than a bad model" resonated with many: if no output can be trusted, everything must be carefully reviewed, which is more tiring. Conversely, some argued there were no problems when tasks were broken into very small, concrete pieces and managed in commit units.
- Multiple users reported Claude Code frequently emitting messages like "I'm running out of time, let's leave this for later" or "Let's stop here today." One user thought this behavior started after Claude learned their deadline, but later realized it was an early landing pattern and confirmed it directly in the session logs.
How to Apply
- If you use Claude Code for long, complex sessions and suspect that thinking quality has recently deteriorated, you can increase thinking depth with `"effort": "high"` in settings.json, the `/effort high` command within a session, or by adding the `ULTRATHINK` keyword to the prompt.
- To verify that thinking is actually working, add `"showThinkingSummaries": true` to settings.json to see the thinking content in the UI. When the `redact-thinking` header is applied, thinking content is not saved in local logs, so keep this setting enabled if you need log-based analysis.
- To preserve logs before they are automatically deleted, add `"cleanupPeriodDays": 365` to settings.json to keep session logs for a year instead of the default 20 days. This is essential for post-hoc analysis of performance anomalies.
- If early landing phrases like 'simplest fix', 'used too many tokens', or 'Let's stop here today' appear frequently in Claude's responses, audit existing logs with the `stop-phrase-guard.sh` script shared by benvanik (https://gist.github.com/benvanik/ee00bd1b6c9154d6545c63e06a3...) to check the frequency of shallow thinking.
Code Example
# Settings to add to settings.json (strict JSON, so no inline comments):
#   "effort": increase thinking depth (default: 85)
#   "showThinkingSummaries": show thinking content in the UI
#   "cleanupPeriodDays": keep session logs for 1 year (default: 20 days)
{
  "effort": "high",
  "showThinkingSummaries": true,
  "cleanupPeriodDays": 365
}
# Session command
/effort high # Apply high effort to the entire session
/effort max # Apply maximum effort
# Or add ULTRATHINK keyword to the prompt (single turn)
# Disable adaptive thinking (environment variable)
export CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1
Terminology
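To check the report's thinking-depth claim against your own logs, a crude sketch along these lines can total the serialized thinking content per session file. The assumption that thinking blocks appear in the session JSONL as `"thinking":"..."` fields, and the log location, may not match your Claude Code version:

```shell
#!/usr/bin/env bash
# Crude thinking-depth estimate: total characters of serialized thinking
# content per session file. Assumes thinking blocks are stored in the
# session JSONL as "thinking":"..." fields; adjust the pattern if your
# logs differ (escaped quotes inside the text will truncate matches).
thinking_depth() {
  local log_dir="$1"
  find "$log_dir" -name '*.jsonl' -print0 |
    while IFS= read -r -d '' f; do
      chars=$(grep -Eo '"thinking":"[^"]*"' "$f" | wc -c)
      printf '%8d  %s\n' "$chars" "$f"
    done | sort -n
}

# Usage: thinking_depth ~/.claude/projects
```

Comparing totals for sessions before and after late February is a rough way to see whether the ~67% drop the author measured shows up in your own data.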
Related Papers
Training an LLM in Swift, Part 1: Taking matrix mult from Gflop/s to Tflop/s
A detailed walkthrough of implementing matrix multiplication kernels from scratch in Swift on Apple Silicon, optimizing step by step across CPU, SIMD, AMX, and GPU (Metal) to push performance from Gflop/s to Tflop/s. A rare resource for developers who want to implement the core operations of LLM training from the ground up without frameworks and get a feel for Apple Silicon's performance limits.
Removing fsync from our local storage engine
FractalBits shares the design of an SSD-only KV storage engine implemented without fsync, achieving roughly 65% higher write performance under identical conditions. The core idea is a structure combining preallocation, O_DIRECT, and a journal aligned to the SSD's atomic write unit to avoid fsync's metadata overhead.
Google Chrome silently installs a 4 GB AI model on your device without consent
Google Chrome was found to automatically download the 4 GB Gemini Nano model file without user consent, re-downloading it even after deletion. Concerns have been raised about a possible GDPR violation and the environmental cost when this is applied across billions of devices.
How OpenAI delivers low-latency voice AI at scale
OpenAI redesigned its WebRTC stack to serve real-time voice AI to over 900 million users, detailing the design decisions and trade-offs of a relay + transceiver split architecture.
Efficient Test-Time Inference via Deterministic Exploration of Truncated Decoding Trees
Deterministic Leaf Enumeration (DLE) cuts self-consistency’s redundant sampling by deterministically exploring a tree of possible sequences, simultaneously improving math/code reasoning performance and speed.