Get Shit Done: A Meta-Prompting, Context Engineering and Spec-Driven Dev System
TL;DR Highlight
A lightweight spec-driven development automation framework built to solve Claude Code's 'context rot' problem, orchestrating AI to generate real code with just a few commands — no complex planning needed.
Who Should Read
Developers already using AI coding tools like Claude Code or Gemini CLI who experience degrading output as context grows, or solo/small-team developers wanting to ship products fast.
Core Mechanics
- GSD is built around solving the 'context rot' problem, where Claude's output quality degrades as the context window fills up and earlier content gets referenced poorly.
- The user-facing interface is just a few commands, but internally runs XML prompt formatting, subagent orchestration, and state management. The core design philosophy is hiding complexity inside the system to keep the workflow simple.
- Supports multiple AI coding tools beyond Claude Code: OpenCode, Gemini CLI, Codex, Copilot, Antigravity. Installs with a single `npx get-shit-done-cc@latest` command on Mac, Windows, and Linux.
- Existing spec-driven tools like BMAD and Speckit require enterprise processes (sprints, story points, Jira workflows). GSD targets solo developers and small teams, stripping away that complexity to focus on core functionality.
- README recommends `--dangerously-skip-permissions` flag as the default workflow. Internally, subagents dynamically run `node gsd-tools.cjs`, `git checkout -b`, `eslint`, test runners, etc., so constant permission approvals would break autonomous mode.
- gsd-plan-checker validates requirements coverage and dependency graphs before execution, but doesn't verify what commands will actually run. gsd-verifier only checks goal achievement post-execution, not whether something went wrong during execution — a security gap.
- One user claimed to have written 250K lines of code in a month with GSD. Another 3-month user said GSD handled 95% of complex tasks with only 5% requiring manual testing.
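The XML prompt formatting mentioned above follows the tag-structured style Anthropic recommends for Claude. The sketch below is illustrative only, not GSD's actual internal template; the tag names, file path, and task text are invented for the example:

```xml
<task>
  <objective>Implement the password-reset endpoint from the spec</objective>
  <context>
    <!-- Only the files this subagent needs, keeping its window small -->
    <file path="src/auth/routes.ts">...</file>
  </context>
  <constraints>
    <item>Follow the project's existing error-handling conventions</item>
    <item>Do not modify files outside the listed context</item>
  </constraints>
</task>
```

Structuring prompts this way gives each subagent a bounded, explicitly delimited slice of context, which is the core mitigation for context rot.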
Evidence
- Multiple reports of massive token consumption. One user hit session limits (normally unreachable) in 30 minutes and burned through weekly limits by Tuesday. Another user quit GSD after concluding that Claude Code's Plan mode alone was sufficient: GSD used roughly 10x more tokens with no noticeable quality difference.
- Specific security concerns were raised. The default `--dangerously-skip-permissions` design means the plan-checker can't catch destructive commands from the planner. Cases of AI-generated code containing hardcoded credentials, API routes missing auth middleware, and debug endpoints deployed to production were shared.
- Some users were unsatisfied with results. The planning stage's 'rubber duck' role of asking good questions was useful, but actual implementation quality fell short. They concluded it was better to create plans with Claude Opus, save them to memory, and implement manually.
- Criticism about lack of validation in complex legacy codebases or production environments. Metrics like 'wrote 250K lines without reading them' look more like hype than real value. Real evidence should be 'deployed actual features to production in a 10-year-old large codebase.'
- Many comparison requests with similar tools like Superpowers and openspec. No clear answer on whether GSD produces better results despite using more tokens. openspec was noted for letting users customize workflows and progressively simplify toward their own approach.
How to Apply
- If you want to rapidly build a SaaS or side project solo with Claude Code, install via `npx get-shit-done-cc@latest` and use it just for the planning stage — answering GSD's questions to refine your spec. You can do actual generation directly in Claude Code.
- If you want autonomous mode without `--dangerously-skip-permissions`, follow the README's granular permissions guide to first set up a permission profile allowing only safe reads and git operations. Don't attach autonomous mode to production codebases without a security review.
- If you've generated large amounts of code via GSD autonomous mode, add separate scripts or lint rules to automatically check for common AI-generated code patterns: hardcoded credentials, API routes without auth, and debug endpoints in production.
- If token budget is a concern, try a hybrid approach: use GSD only for the Plan stage and implement directly in Claude Code. Per actual user experience, this approach was more efficient in terms of quality per token.
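For the no-skip-permissions setup above, Claude Code lets a project pre-approve a narrow command set via its settings file. Below is a sketch of a `.claude/settings.json` permission profile, not an official GSD profile: the rule syntax follows the Claude Code permissions documentation, and the `gsd-tools.cjs` and `eslint` entries are assumptions mirroring the internal commands the README describes. Verify the exact rule format against your Claude Code version.

```json
{
  "permissions": {
    "allow": [
      "Read",
      "Bash(git status:*)",
      "Bash(git diff:*)",
      "Bash(git checkout -b:*)",
      "Bash(node gsd-tools.cjs:*)",
      "Bash(npx eslint:*)"
    ],
    "deny": [
      "Bash(rm -rf:*)",
      "Bash(git push:*)"
    ]
  }
}
```

With a profile like this, subagents can read, branch, lint, and run GSD's own tooling without prompting, while destructive or remote-mutating commands still require explicit approval.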
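The post-generation audit suggested above can start as a simple grep pass. This is a minimal sketch: the regexes, the `demo_src` directory, and the planted sample file are all illustrative, and real projects should tune the patterns to their stack or use a dedicated secret scanner:

```shell
#!/bin/sh
# Demo setup: plant one file with a hardcoded key so the scan has something to find
mkdir -p demo_src
printf 'const apiKey = "abc123XYZ";\n' > demo_src/config.js

# 1. Hardcoded credentials: secret-like names assigned a string literal
grep -rnEi "(api_?key|secret|password|token)[[:space:]]*[:=][[:space:]]*[\"'][A-Za-z0-9]" demo_src

# 2. Debug-style endpoints: route paths mentioning debug/test
grep -rnEi "(get|post|app\.use)[[:space:]]*\([[:space:]]*[\"']/?(debug|__test)" demo_src \
  || echo "no debug-style endpoints found"
```

Wiring a pass like this into CI (or a pre-commit hook) catches the most common AI-generated-code hazards before they reach production; checking for missing auth middleware generally needs a framework-aware lint rule rather than grep.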
Code Example
# Installation
npx get-shit-done-cc@latest
# Basic usage (autonomous mode - security caution)
# README recommended workflow but can be dangerous for production codebases
claude --dangerously-skip-permissions
# Tasks executed by GSD internal sub-agents (based on gsd-executor.md)
# node gsd-tools.cjs
# git checkout -b <branch>
# eslint
# Dynamically generates and runs test runners, etc.
Terminology
context rot: The phenomenon where AI output quality degrades as the context window (the maximum text the model can process at once) fills up, so earlier content gets referenced poorly. This is why AI feels dumber as conversations get longer.
meta-prompting: Instead of tasking the AI directly, designing prompts about prompts, or having the AI generate other prompts, to make it work better.
context engineering: The discipline of efficiently packing needed information into the AI's context window: managing what information goes in and out, and when.
spec-driven development: A development approach where you write specs (requirements) first, then have the AI implement against those specs rather than jumping straight into coding.
subagent orchestration: A structure where one AI agent directs multiple subagents handling different tasks in parallel, like a manager distributing work across a team.
`--dangerously-skip-permissions`: A Claude Code flag that skips all file-access and command-execution permission checks so the agent can proceed automatically. It enables autonomous execution but carries significant security risk.