Issue: Claude Code is unusable for complex engineering tasks with Feb updates
TL;DR Highlight
A report backed by log analysis claims that Anthropic has been quietly reducing the depth of Claude's thinking since February while shipping a feature that hides the reduction. According to the report, the performance degradation that subscription plan users have noticed is not imagined but follows from actual system changes.
Who Should Read
Developers who regularly use Claude Code (or the Claude API) for complex engineering tasks, especially those who have noticed a decline in Claude's response quality in recent months.
Core Mechanics
- The report author analyzed 17,871 thinking blocks and 234,760 tool calls from 6,852 Claude Code sessions, quantitatively demonstrating a quality degradation in complex engineering tasks starting in February 2026.
- Anthropic introduced a beta header called `redact-thinking-2026-02-12` on February 12, 2026, which hides the thinking content in the UI. Anthropic explained this as a UI-only change that does not affect the thinking itself.
- However, the report author argues that this redaction was used as a device to intentionally hide the reduction in thinking. In fact, their log analysis shows that the depth of thinking decreased by ~67% starting in late February, which they attribute to thinking being reduced variably based on load in subscription plans.
- With the release of Opus 4.6 on February 9th, an 'adaptive thinking' approach, in which the model decides for itself how long to think, became the default in place of the 'fixed thinking budget' approach. Anthropic explained that this approach is generally more effective.
- On March 3rd, the default effort value for Opus 4.6 was changed to '85 (medium)'. Anthropic stated that this is the optimal point for reducing cost and latency, but according to the report it resulted in shallower thinking on complex tasks.
- The symptoms users report are specific: the phrase 'simplest fix' followed by incorrect code; a surge in 'early landing' phrases that wrap up work prematurely, such as 'used too many tokens' and 'let's wrap it up here'; instructions that are ignored or contradicted; and false reports of completion.
- Boris from Anthropic (Claude Code team) stated in an official comment that they are considering an option to change the default to high effort for Teams/Enterprise users. For now, users can set it themselves via `/effort high` or settings.json.
- Using the `stop-phrase-guard.sh` script shared by the report author, you can detect whether Claude is showing signs of early landing in the logs. Several users confirmed, after auditing 80 sessions of their own, that this pattern has in fact increased.
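The early-landing check described in the last bullet can be approximated without the original script. What follows is a minimal Python sketch, not a port of `stop-phrase-guard.sh`: the phrase list is taken only from the symptoms quoted in this summary, the function name `count_stop_phrases` is hypothetical, and it assumes the assistant output is already available as plain text lines.

```python
from collections import Counter

# Phrases quoted in this summary as early-landing signals. The real
# stop-phrase-guard.sh may use a different list; this one is an assumption.
STOP_PHRASES = (
    "simplest fix",
    "used too many tokens",
    "let's wrap it up here",
    "let's stop here today",
)


def count_stop_phrases(lines, phrases=STOP_PHRASES):
    """Count case-insensitive occurrences of each phrase across text lines."""
    counts = Counter({phrase: 0 for phrase in phrases})
    for line in lines:
        lowered = line.lower()
        for phrase in phrases:
            counts[phrase] += lowered.count(phrase)
    return counts


if __name__ == "__main__":
    # Hand-written sample lines for illustration, not real session output.
    sample = [
        "The simplest fix is to skip the failing test.",
        "I've used too many tokens, so let's stop here today.",
        "Refactoring the parser as requested.",
    ]
    for phrase, hits in count_stop_phrases(sample).items():
        print(f"{phrase}: {hits}")
```

Plain substring matching is used on purpose: it is easy to audit, at the cost of missing paraphrases of the same intent.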
Evidence
- Anthropic's Boris explained that `redact-thinking-2026-02-12` only hides thinking in the UI and that thinking itself still works. However, the report author countered with 'Then why did the depth of thinking decrease by 67% in my logs?' and argued that it was a device to hide the reduction of thinking based on load in subscription plans.
- The report author (benvanik) shared the `stop-phrase-guard.sh` script, stating that phrases like 'simplest fix' or 'used too many tokens' are strong indicators of shallow thinking. Several users audited their session logs with this script and confirmed that this pattern has actually increased.
- One user shared that, with a Claude Max subscription from late January to early February, they could have agents autonomously research, design, and implement app ideas, but a month later the same task was refused with the response 'Why phase 2 when phase 1 hasn't even been validated yet.' They said it felt as if the model had reverted to the Sonnet level.
- 'An unpredictable model is worse than a bad model' resonated with many. If you can't trust any output, you have to review everything carefully, which is more tiring. Conversely, some argued that there were no problems if tasks were broken down into very small, concrete pieces and managed in commit units.
- Multiple users reported Claude Code frequently outputting messages like 'I'm running out of time, let's leave this for later' or 'Let's stop here today.' One user thought this behavior started after Claude learned their deadline information, but later realized it was an early landing pattern and confirmed it directly in the session logs.
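A figure such as the 67% quoted above is a relative drop between two periods. The sketch below shows one way such a number could be computed, assuming the thinking length of each block has already been extracted from the logs. The record shape `(day, thinking_chars)`, the function name `relative_drop`, and the use of the median are assumptions for illustration; the report's actual method is not described in this summary.

```python
from datetime import date
from statistics import median


def relative_drop(records, cutoff):
    """Return the fractional drop in median thinking length after `cutoff`.

    `records` is an iterable of (day, thinking_chars) pairs; how these are
    extracted from real session logs is outside this sketch.
    """
    records = list(records)
    before = [n for day, n in records if day < cutoff]
    after = [n for day, n in records if day >= cutoff]
    if not before or not after:
        raise ValueError("need at least one record on each side of the cutoff")
    base = median(before)
    if base == 0:
        raise ValueError("median before the cutoff is zero")
    return (base - median(after)) / base


if __name__ == "__main__":
    # Synthetic values for illustration only, not taken from the report.
    records = [
        (date(2026, 2, 1), 1000),
        (date(2026, 2, 5), 1200),
        (date(2026, 3, 1), 500),
        (date(2026, 3, 5), 600),
    ]
    drop = relative_drop(records, cutoff=date(2026, 2, 20))
    print(f"median thinking length dropped by {drop:.0%}")
```

The median is less sensitive than the mean to a handful of unusually long thinking blocks, which is why it is used here.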
How to Apply
- If you use Claude Code for long, complex sessions and suspect that the thinking quality has recently deteriorated, you can increase the depth of thinking by setting `"effort": "high"` in settings.json, running the `/effort high` command within a session, or adding the `ULTRATHINK` keyword to the prompt.
- If you want to verify that thinking is actually working, add `"showThinkingSummaries": true` to settings.json to see the thinking content in the UI. If the `redact-thinking` header is applied, the thinking content will not be saved in local logs, so it is best to keep this setting enabled if you need log-based analysis.
- To preserve logs before they are automatically deleted, add `"cleanupPeriodDays": 365` to settings.json to keep session logs for a year instead of the default 20 days. This is essential for post-hoc analysis of performance anomalies.
- If early landing phrases like 'simplest fix', 'used too many tokens', or 'Let's stop here today' frequently appear in Claude's responses, you can audit existing logs with the `stop-phrase-guard.sh` script (https://gist.github.com/benvanik/ee00bd1b6c9154d6545c63e06a3...) shared by benvanik to check the frequency of shallow thinking.
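The three settings above can be merged into an existing settings.json without overwriting unrelated keys. The following Python sketch is one way to do that. The key names and values are the ones quoted in this summary and should be checked against the current Claude Code documentation; the function name `apply_settings` and the `theme` key in the demo are hypothetical.

```python
import json
from pathlib import Path

# Keys and values as quoted in this summary; verify them against the
# current Claude Code documentation before relying on them.
RECOMMENDED = {
    "effort": "high",
    "showThinkingSummaries": True,
    "cleanupPeriodDays": 365,
}


def apply_settings(path, updates=RECOMMENDED):
    """Merge `updates` into the JSON file at `path`, keeping other keys."""
    path = Path(path)
    current = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    current.update(updates)
    path.write_text(json.dumps(current, indent=2) + "\n", encoding="utf-8")
    return current


if __name__ == "__main__":
    import tempfile

    # Demonstrate on a temporary file rather than a real settings.json.
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "settings.json"
        target.write_text('{"theme": "dark"}', encoding="utf-8")  # hypothetical key
        print(apply_settings(target))
```

Pointing the function at a copy of the real file first makes it easy to diff the result before replacing the original.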
Code Example
# Settings to add to settings.json
# (standard JSON does not allow inline comments, so the notes are listed here)
#   "effort": "high"               increases thinking depth (default: 85, medium)
#   "showThinkingSummaries": true  shows thinking content in the UI
#   "cleanupPeriodDays": 365       keeps session logs for 1 year (default: 20 days)
{
  "effort": "high",
  "showThinkingSummaries": true,
  "cleanupPeriodDays": 365
}

# Session commands
/effort high    # Apply high effort to the entire session
/effort max     # Apply maximum effort
# Or add the ULTRATHINK keyword to the prompt (single turn)

# Disable adaptive thinking (environment variable)
export CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1