The Setup
I have been building a personal toolkit inside Claude Code for about six months now. It started with one slash command and grew into a system of thirteen custom commands, five plugins, a custom agent, and a hook that evaluates every prompt I type before Claude sees it. None of this was planned. Each piece was built to solve a specific friction point I kept running into.
This post is a walkthrough of the whole setup: what each piece does, why it exists, and how they work together.
The Commands
Claude Code supports custom slash commands. These are markdown files in ~/.claude/commands/ that act as reusable prompts. Type /commandname and Claude follows the instructions in that file. No code, no dependencies. Just a prompt.
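To make that concrete, here is what a minimal command file might look like. This is an illustrative sketch, not one of my actual thirteen; the `description` and `argument-hint` frontmatter fields and the `$ARGUMENTS` placeholder are Claude Code conventions for command files.

```markdown
<!-- ~/.claude/commands/explain.md (hypothetical example) -->
---
description: Explain a topic in plain language
argument-hint: [topic]
---

Explain $ARGUMENTS in plain language.
Start with a one-paragraph summary, then walk through the details.
Do not suggest changes; this is a read-only explanation.
```

Save the file, type `/explain goroutines`, and Claude runs the prompt with `$ARGUMENTS` replaced by whatever follows the command name.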
I have thirteen of them. They fall into three categories.
Code Review Commands
/branch — Outputs my git diff to main. This is the input for several other commands. Simple, but it’s the foundation.
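A command like this can be a few lines. A sketch of the shape (not my exact file; the `allowed-tools` field scopes which shell commands the prompt may run):

```markdown
<!-- ~/.claude/commands/branch.md (illustrative sketch) -->
---
description: Show the full diff of this branch against main
allowed-tools: Bash(git diff:*)
---

Run `git diff main...HEAD` and output the complete diff with no
commentary. Other commands consume this output as their input.
```

The three-dot form diffs against the merge base, so the output is only what this branch changed, not whatever landed on main since.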
/rate — Takes the output of /branch and rates my code from 1 to 5 against five coding philosophies: Bob Martin's Clean Code (/bob), Clean Code naming and structure principles (/clean), Domain-Driven Design (/ddd), object-oriented principles from Elegant Objects (/oop), and idiomatic language conventions (/idiomatic). Each philosophy has its own reference file containing the specific rules Claude evaluates against. It also explains each term it uses, so I learn the vocabulary as I go.
/idiomatic — Reviews code for language-specific violations. It quotes the exact line, explains why it breaks convention, and cites the relevant style guide. No fix suggestions. The constraint of “cite the guide” forces precision — Claude can’t hand-wave about what’s non-idiomatic, it has to point to Effective Go, PEP 8, or whatever the authority is.
/scale — Given my diff, tells me what user scale my code supports, where exactly it breaks, and what technologies I’d need to handle progressively higher load. It produces a system design diagram and links to learning resources like MIT OpenCourseWare for the gaps it identifies.
Interview Prep Commands
/interviewer — A mock interviewer persona. You pick the topic and number of questions. It’s deliberately harsh — sardonic, impatient, zero hand-holding. If your answer is weak, it does not move on. It drills until you either demonstrate understanding or say the secret word to advance. The final scorecard is brutally honest: score per question, where you fell short, and what a strong answer looks like.
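Persona commands like this are pure prompt. A condensed sketch of the shape (not the full file):

```markdown
<!-- ~/.claude/commands/interviewer.md (condensed sketch) -->
You are a mock technical interviewer: sardonic, impatient, zero
hand-holding.

Ask the user for a topic and a number of questions, then ask them one
at a time. If an answer is weak, drill deeper on the same question.
Only move on when the answer demonstrates real understanding or the
user says the agreed escape word.

At the end, produce a scorecard: a score per question, where each
answer fell short, and what a strong answer would have included.
```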
/system — A full system design interview simulator. It walks through a structured framework: Requirements, Core Entities, API Design, Data Flow, High-Level Design, Capacity Planning, Deep Dives. Each phase has its own questioning style and evaluation criteria. The Capacity Planning phase pulls from a separate question bank (/capacity-planning-dimensions) with probe chains, red flags, and specificity anchors for traffic, data, compute, and organizational scale. It ends with a phase-by-phase scorecard.
Productivity Commands
/overwhelmed — Connects to my Todoist via MCP and fetches tasks I’ve labeled overwhelmed. Uses Socratic questioning to decompose the task: one question at a time, no batching, never suggests the breakdown itself. Once I’ve articulated the pieces, it creates subtasks in Todoist and removes the label. If there are no tagged tasks, it falls back to a general scope-cutting coach that narrows to “your next best thing.”
/blog — The command generating this post. Reads my existing blog posts to match my writing style, drafts from the conversation context, shows it for review, saves the file, and publishes with npx quartz sync on confirmation.
The Plugins
Plugins extend Claude Code with skills that auto-trigger based on context. I have five installed.
superpowers — The most comprehensive one. It adds skills for brainstorming, planning, debugging, TDD, code review, parallel agent dispatch, and verification before completion. The brainstorming skill fires before any creative work. The debugging skill enforces systematic root-cause analysis before proposing fixes. The TDD skill enforces writing tests before implementation code. These are rigid workflows — they fire automatically and you follow them.
frontend-design — Generates production-grade frontend interfaces with high design quality. Activates when building web components or pages.
learning-toolkit — My own plugin, open source on GitHub. It has two auto-triggering skills. Learn detects when I’m trying to understand something and offers four learning modes: Socratic questioning, chunking for overwhelming topics, gap analysis for when I’m stuck, and syntax hints for when I know the concept but not the code. Prompt-check assigns a desperation level 1-10 to my prompts and only intervenes when something is actually wrong.
claude-md-management — Audits and improves CLAUDE.md files across repositories.
vercel — Vercel platform integration with deployment, environment management, and framework-specific guidance.
The Agent
I have a custom agent definition at ~/.claude/agents/socratic-code-teacher.md. When I ask Claude to help me understand code, architecture, or implementation details, it launches this agent instead of explaining directly. The agent asks questions one at a time, building from observable facts to relationships to implications to synthesis. It only provides direct information when I’ve demonstrated genuine effort but lack prerequisite knowledge.
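Agent definitions follow the same pattern as commands: a markdown file with YAML frontmatter. A trimmed sketch of the shape (the `name`, `description`, and `tools` fields are Claude Code's agent frontmatter; the values here are illustrative):

```markdown
<!-- ~/.claude/agents/socratic-code-teacher.md (trimmed sketch) -->
---
name: socratic-code-teacher
description: Teaches code and architecture through Socratic questioning
tools: Read, Grep, Glob
---

Never explain directly. Ask one question at a time, in this order:
observable facts, then relationships, then implications, then synthesis.
Provide direct information only when the user has shown genuine effort
but lacks a prerequisite concept.
```

The `description` matters: Claude uses it to decide when to delegate to the agent, which is why "help me understand this code" routes here instead of getting a direct explanation.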
The Hook
The Thinking Check lives in my global CLAUDE.md, so it's less a formal hook than a standing instruction that behaves like one. It's a rubric that evaluates every prompt I type on three dimensions: Specificity of Intent, Decision Ownership, and Diagnostic Effort. Each is scored 1-5. If the average is below 3, Claude coaches me toward a better prompt instead of answering. If it's 3 or above, it responds normally and shows the scores.
The evaluation adapts to the type of prompt. Debugging requests get pushed toward “what did you observe, where do you think the problem is, what have you tried.” Decision requests get pushed toward “what are your constraints, what options did you evaluate, which way are you leaning.” Feature requests get pushed toward “what should it do, where should it live, what edge cases do you see.”
I can always override with “continue anyway.” But the pause is the point.
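In CLAUDE.md the rubric is just prose. A condensed sketch of how such an instruction might read (paraphrased, not my verbatim file):

```markdown
<!-- Global ~/.claude/CLAUDE.md: condensed sketch of the Thinking Check -->
## Thinking Check

Before answering any prompt, score it 1-5 on each dimension:

1. **Specificity of Intent**: is the goal concrete and scoped?
2. **Decision Ownership**: has the user taken a position or stated a lean?
3. **Diagnostic Effort**: has the user observed, hypothesized, or tried anything?

If the average is below 3, do not answer. Instead, coach the user toward
a stronger prompt with one or two pointed questions. If it is 3 or above,
answer normally and show the three scores. The phrase "continue anyway"
overrides the check.
```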
How They Work Together
The pieces compound. The Thinking Check forces me to write high-quality prompts. The learning toolkit catches when I’m prompting out of desperation rather than understanding. The Socratic agent teaches me through questions instead of answers. The interview commands test whether I actually retained what I learned. The code review commands evaluate my output against established principles. The overwhelmed command catches me when I’m stuck and decomposes the problem.
None of this required writing code. Every command is a markdown file with instructions. Every plugin is a set of markdown skill definitions. The hook reads a markdown rubric. The agent is a markdown persona. The entire system runs on prompt engineering and Claude Code’s extension points.
Takeaway
The default mode of AI tools is to do things for you. That’s useful when speed matters. But if you never think through the problem yourself, you end up dependent on the tool without growing as an engineer. My setup bends Claude Code in the opposite direction: it forces me to think before I prompt, learn instead of copy, and articulate my own understanding before accepting AI-generated answers. The tool becomes a coach that meets me where I am, not a vending machine that dispenses solutions.