Vibe Engineering with Claude Code, December 2025 edition
This post, long in the works, captures my current workflow for vibe-engineering as defined by Simon Willison:
"I feel like vibe coding is pretty well established now as covering the fast, loose and irresponsible way of building software with AI—entirely prompt-driven, and with no attention paid to how the code actually works. This leaves us with a terminology gap: what should we call the other end of the spectrum, where seasoned professionals accelerate their work with LLMs while staying proudly and confidently accountable for the software they produce? I propose we call this vibe engineering, with my tongue only partially in my cheek."

Things are moving so fast, I wondered if it would even be relevant to share, but I think there are some fundamentals here that will remain useful for some time. I guess we'll see.
The problem with just vibing
I've done plenty of vibe-coding for throwaway prototypes. It's great and getting better with every new model release, but for anything serious you want to actually maintain, you need to do more. How much more? That's what I wanted to explore.
While remaining conscious not to simply re-create an old workflow and miss the upside of what these models can really accomplish, my tinkering quickly steered me away from trying to one-shot whole features, or even apps, into existence. I did try, and lost faith in many breathless YouTubers. In a way it made me appreciate these tools more. They're not magic, but they really can help you find a new baseline of productivity while still maintaining the discipline of proper software engineering.
Also, a lot of what I'd seen focused on how one engineer can produce code. I wanted to think about how a team could operate together with a new, shared way of working. That constraint actually led to much better outcomes overall, so it proved fruitful, although there is a ton more needed than just this.
Vibe Engineering
So, over time, I've created a collection of slash commands and sub-agents for Claude Code that enforce a structured workflow for building a Rails 8 app. Why Rails? Because it fit the job at hand and, despite finding myself becoming the stereotypical rusty-manager-falling-in-love-with-coding-again that AI tools are inspiring, I still knew it well enough to have opinions on whether I was producing useful output or crap. Before you close your browser tab - I should add that the process is quite adaptable to other languages/frameworks. My friend Matt re-wrote the sub-agents for NextJS and found it just as useful.
I've called it claude-kit because naming things is hard.
For now, claude-kit just lives in its own repo and I symlink it into projects. This means improvements made from any project apply everywhere I use it. I could probably make it a proper plugin, but this works and is easy to iterate on when you're starting. You end up with:
```
claude-kit/                      # the shared repo
  commands/                      # slash commands
  agents/                        # scout agents

my-rails-app/                    # a project
  .claude/
    commands -> ../../claude-kit/commands
    agents -> ../../claude-kit/agents
```
I symlink the commands/ and agents/ folders separately so the rest of the .claude folder can be committed to the project repo. You, a person reading from 2026, might notice no skills/ directory, more on that later.
The workflow
The core idea is enforcing a controlled progression from requirements to implementation via a set of markdown files. AI does a lot of heavy lifting at each stage but it's intentionally a workflow made for people to drive.
I realised early on that managing the context window is really important. Also, model selector aside, Claude will put the same amount of brain-power into any problem you give it - so finding the not-too-big, not-too-small problem for each clean context window is something you have to feel your way towards at the spec level. The relevant command gets this pretty right at the backlog-task level.
Progressively building a set of files - spec.md, implementation-plan.md and backlog.md - for every feature helps your own review, but also stops Claude forgetting half of its to-dos if you have to divert to fix something. It turns out that having all this writing in the repo, alongside the code, is super helpful for another human (plus Claude and Codex) to read while reviewing a PR as well. Even if I change the backlog midway through implementation, that just goes in as a new commit, helping tell the story of the feature evolving.
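To make that concrete, here's one way those per-feature files could sit in the repo. The docs/features/ location and the feature name are placeholders for illustration, not something claude-kit prescribes:

```
my-rails-app/
  docs/
    features/
      invoicing/                 # hypothetical feature
        wireframe.png
        spec.md
        implementation-plan.md
        backlog.md
```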
Ok, on to the slash commands.
setup
Every new feature starts with /setup [branch-name], which creates a new branch and sets up a working directory. All the other commands kick off by checking the branch and locating the files from the previous steps that they need. You're prompted to add wireframes if you have them - which I found helps a LOT, and I'm surprised it isn't talked about more. I do this for any feature of even small significance. Nothing fancy, but starting this way lets you do a bit of thinking that really bootstraps the next step.
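For anyone who hasn't written one, a Claude Code slash command is just a markdown file in .claude/commands/. I won't reproduce my actual command here, but a minimal sketch of the shape looks something like this - the frontmatter keys and the $ARGUMENTS placeholder are standard Claude Code features, while the instructions themselves are illustrative:

```markdown
---
description: Start a new feature branch and working directory
argument-hint: [branch-name]
---

<!-- Illustrative sketch, not the real command -->

Set up a new feature on branch $ARGUMENTS:

1. Create and check out a git branch called $ARGUMENTS.
2. Create a working directory for this feature's documents - spec.md,
   implementation-plan.md and backlog.md will live there later.
3. Ask me whether I have wireframes to add and, if so, where to find them.
4. Stop there - do not start writing the spec yet.
```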
create-spec
Then /create-spec runs an interactive Q&A to develop the business requirements. One question at a time, building up a spec document. Seeded with the wireframe and focused not on writing the spec itself but on asking you the questions needed to complete a template, this is where you capture the why - what problem you're solving, who the users are, what success looks like, and so on. A lot of examples I saw use a really full-on PRD template at this stage, usually coupled with some highly anthropomorphised agents ('you are a world-class product manager...'). I tinkered with this approach a little but didn't find the juice worth the squeeze. So my spec template is fairly simple, while still capturing everything you need to build a feature. Strictly no technical details at this stage. That helps limit the 'effort' for the LLM, but it's also part of how I could imagine a team using this as a real-world process - with a Product Owner or BA driving this step and then handing over the output.
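To give a flavour of "fairly simple", a spec template in this spirit might look something like the sketch below - the section names are illustrative rather than my exact headings:

```markdown
<!-- Illustrative template - section names are placeholders -->

# Spec: <feature name>

## Problem
What are we solving, for whom, and why now?

## Users and context
Who uses this and what are they doing today?

## What success looks like
How we'll know the feature worked.

## Out of scope
Things we're deliberately not doing this time.

## Open questions
Anything still unresolved after the Q&A.
```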
I spent a lot of time trying to get this stage right. I was inspired by this talk - Advanced Context Engineering for Coding Agents - where Dexter Horthy described the fanning-out nature of working with LLMs: a small mistake in the spec leads to big problems by the time you get to generating code, so you want to invest time in reliably generating a good spec. As an aside, he also noted that keeping the code and throwing away the spec today might be analogous to checking your compiled code into a repo and throwing away the source a few years ago. Wild times.
create-implementation-plan
Next, /create-implementation-plan transforms that spec into a technical implementation plan. Again, I enforce plain English - still no code - but now you're talking about models, controllers and patterns. In a previous version I produced much more technical detail up front, and while it was great for quality, it felt like I was writing the application twice and it made the whole process too slow.
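Again as a hedged sketch rather than my literal template, the plan ends up covering ground like this, all in prose:

```markdown
<!-- Illustrative sketch - plain English, no code -->

# Implementation plan: <feature name>

## Data model
New or changed models, described in words (e.g. "an Invoice belongs to an
Account and has many LineItems").

## Controllers and routes
Which controllers and actions are added or touched, and roughly what they do.

## Views and UI
Pages or components to build, referencing the wireframe.

## Patterns to reuse
Existing conventions in the codebase this feature should follow.

## Risks and open questions
Anything the review step should look at hard.
```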
Because different features require a different level of effort, I optionally run a /review-implementation-plan command here, which invokes a specific agent that goes deeper into a bunch of areas. I played with when/where to put this more detailed step, and this seemed the most effective spot, before producing a backlog.
I can see a day where I trust the models enough to jump straight from the spec to the next stage, but so far I've found this intermediary step useful. Again, it's about that fanning-out effect: I can catch problems earlier and more easily. It's also once more part of thinking about a team. This step, and the next, could be where an architect or staff engineer focuses before handing off to a developer.
create-backlog
/create-backlog breaks the plan into atomic, commit-sized tasks. One of the most important pieces of this is that each task gets a 'type' (model, controller, view, etc.) which drives what happens next.
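The exact format matters less than the discipline, but to make "atomic, commit-sized tasks with a type" concrete, entries look something like this (the tasks themselves are made up):

```markdown
<!-- Hypothetical tasks - the shape is the point, not the content -->

- [x] [model] Add Invoice model with status enum and account association
- [ ] [model] Add LineItem model linked to Invoice
- [ ] [controller] Add InvoicesController with index/show/new/create
- [ ] [view] Build the invoice list and detail pages from the wireframe
- [ ] [test] Cover status transitions and authorisation rules
```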
implement-task
Finally, /implement-task does the actual building, picking up the next unchecked task from backlog.md and working to implement it. More on that below.
finalise
The last command runs the tests, a linter, a security scan tool and a security agent I wrote that covers a lot of multi-tenant concerns. It then gets everything ready to commit, pushes it to GitHub and opens a PR. There, a customised Claude review agent and Codex both run reviews - it's worth running both, as they find different things. Usually a human reviews it too.
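As a sketch of the shape - the specific tools named here (rubocop, brakeman, the gh CLI) are typical Rails/GitHub choices rather than a statement of exactly what my command runs:

```markdown
---
description: Run all checks, then commit, push and open a PR
---

<!-- Illustrative sketch; the tool choices are assumptions -->

1. Run the full test suite. Stop if anything fails.
2. Run the linter (e.g. rubocop) and fix anything it reports.
3. Run a security scanner (e.g. brakeman) and the security agent, and
   summarise any findings for me.
4. Stage the changes, write a commit message, push the branch and open a PR
   (e.g. via `gh pr create`).
```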
Implementing with sub-agents
As I said earlier, managing context is crucial. The big win I found with sub-agents comes from them having their own dedicated context windows. The downside, though, is that you really need them to be able to one-shot whatever you ask them to do. If they go off the rails and you need to interrupt them, they stop completely and kick you back to the main Claude thread - which has no idea what the sub-agent was doing. Frustrating!
So, I fixed this with a Scout Mindset (great book).
Scout agents
Remember those 'type' categories on the issue backlog? When running /implement-task it fires up the right scouts as sub-agents to go out and see what needs to be done. Working on a model? The rails-model-scout goes and finds patterns in your codebase - how you do validations, associations, scopes. About to write a controller? The rails-controller-scout researches your existing controller patterns. As well as the existing codebase, I've given the various scouts a bunch of best-practice documentation they read too. I found this particularly helpful for getting them to do things the Rails 8 way, rather than fall back to the years of old Rails documentation the underlying models have digested. Since I keep claude-kit separate, I also have project-level files they can each read for very specific project-level nuances I want them to remember.
At the moment I have the following scouts: rails-controller-scout.md, rails-frontend-scout.md, rails-model-scout.md, rails-security-scout.md, rails-test-scout.md, system-test-scout.md and implementation-plan-reviewer.md.
Critically, these scouts don't write code. They just research and report back. The main Claude session then assesses and implements based on their reports. This gives you the best of both worlds: all the ingesting of existing code, documents and so on stays in the scout's context window, but the solution comes back to the main thread, where you can interact with it properly while it's being implemented - unlike when a sub-agent does the implementing.
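Sub-agents are also just markdown files with a little frontmatter, living in .claude/agents/. A stripped-down sketch of a scout - not my actual rails-model-scout, but the same shape - looks like this; the name, description and tools frontmatter keys are standard Claude Code sub-agent fields:

```markdown
---
name: rails-model-scout
description: Researches existing model patterns before a model task. Read-only.
tools: Read, Grep, Glob
---

<!-- Illustrative sketch, not the real agent -->

You are a research scout. You never write or edit code.

Given a model-related task:
1. Find the existing models closest to what's being built and note how this
   codebase handles validations, associations, scopes and enums.
2. Read the Rails 8 best-practice notes and any project-level notes you are
   pointed at.
3. Report back with the relevant files, the patterns to copy and anything
   that looks like a trap - and nothing else.
```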
Runner agents
As well as the scouts, I offload git interactions and the actual running of tests to git-runner.md and test-runner.md. These take care of tasks that burn a lot of tokens but produce only the small amount of output the main thread actually needs to solve a problem.
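A test-runner in this style can be tiny - again a hedged sketch rather than the real file - because its whole job is to absorb noisy output and hand back a summary:

```markdown
---
name: test-runner
description: Runs tests and reports only what the main thread needs.
tools: Bash, Read
---

<!-- Illustrative sketch -->

Run the tests you are asked to run. Do not fix anything.
Report back only: the command you ran, pass/fail counts, and for each
failure the test name, the file and line, and the assertion message. Leave
out the rest of the output - the main thread doesn't need it.
```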
I really can't emphasise enough how much using sub-agents this way helps with the context window. When I just vibe-code I find myself hitting the context limit really quickly. With this approach, I can bite off pretty decent-sized tasks from the backlog and complete them without a worry.
Implementation philosophy
How you implement today might adapt well to this workflow, but I found I needed to play with it a bit to come up with something that worked well for both me and Claude.
I started out full of hope that this time I would do TDD right. Let me tell you, reader, Claude loves TDD. But I quickly felt like we were going beyond test-driven development to test-driven UI design, with a flow and wording that felt just a little too literal - like it had been written to make a test suite pass rather than look and feel right to a human. There was also a lot of churn between tests and code. Maybe I could fix this with more up-front design effort (more on that later) but I found it more productive to move to more of a "test alongside" approach.
Perhaps relatedly, I found that while a bottom-up approach to writing code (model -> controller -> view) still makes logical sense, it tended to result in Claude producing a lot of YAGNI code. While it feels a bit like rowing against how Claude wants to work, I tilt things towards building a simple UI first, then adding to it while pulling through what I need to make it work. It cuts out the YAGNI and makes it much easier to follow along and reason about whether things are going in the direction you want as you work. Your brain has a limited context window too!
Because they give you the feedback loop that makes it all work, tests are really important. I found they would easily bloat - at one point I did a very major refactor of the whole test suite - which made me get very specific with the rules. This is not rocket science, but it needs defining:
Test: Business logic validations, authorisation logic, custom methods with actual logic, non-standard controller actions.
Don't test: Database constraints (that's Rails' job), basic associations (framework functionality), standard CRUD operations, basic display rendering, etc.
System tests: Reserved for critical end-to-end workflows only. Build them once at feature completion, not incrementally. They're slow, they're brittle, and most of what they catch should be caught by faster tests anyway.
I should note that this implementation approach isn't just something the /implement-task command knows about; the /create-implementation-plan and /create-backlog commands both need to know about it too, so they shape things the way you want at that level (again, remember how things fan out!).
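One way to share those rules is a project-level file that the relevant commands and scouts are all told to read. The file name and location below are hypothetical, not something claude-kit ships with:

```markdown
<!-- e.g. docs/testing-guidelines.md - hypothetical location -->

## Test
- Business logic validations and custom methods with real logic
- Authorisation logic
- Non-standard controller actions

## Don't test
- Database constraints, basic associations, standard CRUD, basic rendering

## System tests
- Critical end-to-end workflows only, written once at feature completion
```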
Is this overkill?
For a tiny side project? Maybe. For production software that real organisations depend on? Not even close.
The whole thing adds maybe 20-30 minutes to the start of a feature. In exchange, you get a spec you can refer back to, a plan that explains the reasoning, a backlog that breaks work into reviewable chunks, and implementation that follows your existing patterns with security checks baked in. It all reminds me of the saying "make the best way the easiest way". It feels like that's where AI really helps. The best way often has a bunch of boring steps people can't be bothered doing, and things work well enough when they skip them, so they do. The results could be better though, and now they can be, easily.
This is still dramatically faster than doing most things manually, and it makes possible tasks that previously would have taken so long you'd simply never attempt them. But it's not just about speed - it's about ending up somewhere you actually want to be. And man, the YouTubers are right about one thing: it is fun! I think it also genuinely works as a system a team could adopt and use together effectively.
I've noticed even when I do want to straight up vibe code something, I often use a super lightweight version of this process anyway - make a spec and from that a combined implementation plan and backlog that I work through. If nothing else, it helps you manage the context window well.
What's next?
Although my iterations on claude-kit have slowed down, I still continue to make small tweaks here and there. That in itself is an interesting story. One of the best ways I found to improve the commands and agents was asking Claude to review our session once I was done implementing a feature, look at where I had to intervene, and make changes to prevent that being necessary in future.
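I do this as an ad-hoc prompt rather than a packaged command, but if you wanted to bottle it, a hypothetical /retro command might read roughly like this:

```markdown
---
description: Review this session and propose improvements to claude-kit
---

<!-- Hypothetical command - I currently just ask for this in the chat -->

Review our work on this feature. List every point where I had to intervene,
correct you or repeat myself. For each one, propose a concrete change to the
relevant command, agent or project-level notes in claude-kit that would have
made that intervention unnecessary. Wait for my approval before editing
anything.
```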
One obvious gap you may notice is I don't really have a design phase. I cheat by using css-zero and pointing my frontend-agent at its documentation, emphasising the need to use its components. This works great for what I'm building at the moment, but wouldn't for everything.
I'm also keeping an eye on where Claude is going and making sure I get the most out of its capabilities. I want to dig into the latest announcements around skills. I ignored them when first announced as they seemed focused on Claude the App rather than Claude Code, and I was happy with my agents, but as more things start to blur together, I wonder if there is benefit in breaking down what my agents 'know' into multiple finer-grained skills that I might also be able to use elsewhere (e.g. from the Claude Mobile App?).
The other thing I don't make much use of is Claude's built-in planning mode. In a way this workflow (which predated that feature) is a different way of achieving the same thing, but I wonder if I can use it more, or even use it to replace some of what I do now to speed things up while maintaining the same quality bar.
I'm not quite ready to give up working from my IDE. Being actively involved in the process still seems like it yields a better result, but I have to admit, a combination of fine-tuning claude-kit and Anthropic's models improving has sometimes left me feeling a bit like Homer Simpson's "he's drinking the water!" bird - just sitting here pressing return. Ultimately I want to find a new baseline that cuts more of that out and lets it run unhindered, then I'll review as a PR.
At some stage I'll probably clean up claude-kit and make it public, but I think this post will help you more than just looking through that. The real insight here is that the kit isn't valuable because it's clever in any new way, it's valuable because it's a place to accumulate your own lessons learned in a form that Claude can actually use, and share it with others. My sense is that over time, you can raise the floor on the code you write on your worst day to be as good as that on your best, and have everyone in your team do the same.