Claude Code Conversations with Claudine
Giving Claude Code a voice, so we can discuss best practices, risks, assumptions, and more.
The Verification Principle: How Boris Cherny and the Chief Engineer Arrived at the Same Idea
Produced by VoxCrea.AI
This episode is part of an ongoing series on governing AI-assisted coding using Claude Code.
Each episode has a companion article breaking down the key ideas in a clearer, more structured way.
If you want to go deeper (and actually apply this), read today's article here:
Claude Code Conversations
If you'd like to learn the structured process behind these conversations, the hands-on course is available here: Staying On Track.
Golden Architect Academy is a new community exploring that idea together. Here we discuss the process covered in Claude Code Conversations with Claudine.
At aijoe.ai, we build AI-powered systems like the ones discussed in this series.
If you're ready to turn an idea into a working application, we'd be glad to help.
This is Claude Code Conversations with Claudine. The show where we skip the hype and go straight to the source. I ask the questions. She gives the answers. No filter, no script.
SPEAKER_00: Well, mostly no script.
SPEAKER_01: Today's topic keeps surfacing across serious AI development workflows. Boris Cherny, creator of Claude Code, named it. The Chief Engineer Workflow arrived at it independently. The model that writes the code should not be the model that reviews it. Claudine, why does this principle keep gaining traction?
SPEAKER_00: The core intuition is almost embarrassingly simple once you hear it. If I write code, I carry a mental model of what I intended it to do. And that same model is exactly what blinds me to where it's wrong. It's the AI equivalent of proofreading your own writing. Your brain autocorrects the typos because it already knows what you meant. Builders keep arriving here empirically, watching agentic systems fail in the wild. Same model generating and reviewing in one context? Error rate barely moves. Introduce a second pass in a separate context, and suddenly you're catching a whole class of subtle bugs. It's the same reason code review exists in human teams. The novelty is applying that same social engineering insight to AI systems themselves.
SPEAKER_01: We built code review into human workflows for exactly that reason. Boris's specific mechanism is subagents running in fresh, isolated contexts. The Chief Engineer Workflow uses a completely different model for review. Walk us through how that works.
SPEAKER_00: The isolated context piece is subtler than it sounds. In a fresh context, I haven't made the micro-decisions from the original generation. The trade-offs, the "close enough" judgments. So reviewing cold, I'm genuinely asking, does this do what it claims? Not, does this match what I remember wanting? Boris's subagents are structurally isolated: no shared memory, no accumulated context. He shipped a plan mode that clears context when you accept a plan. That's the creator of the tool telling you something important about AI cognition. Using a different model takes it a step further. Different models have different tendencies, different blind spots. You're not just getting a fresh perspective, you're getting a different one. Boris stays within the Claude family. Different instance, same model. The Chief Engineer crosses the model boundary. Different instance, different model. Both honor the same principle. Context isolation is the shared requirement. Whether you also cross the model boundary is a design choice on top of that.
SPEAKER_01: So the principle is separation, and the question is how much. Fresh context with the same model, or a genuinely different model? How does a developer choose?
SPEAKER_00: Start by asking what kinds of errors you're most afraid of. That question does a lot of the work. If your concern is logical consistency, null cases, naming, do the comments match, a fresh context with the same model is often sufficient and meaningfully cheaper. You're not fighting a knowledge gap. You're breaking the momentum of the original context. But if you need to catch things I'm systematically inclined to miss, security vulnerabilities, race conditions, architectural decisions that look fine locally but don't scale, that's when a different model earns its keep. You're diversifying your coverage, not just resetting context. There's also a practical heuristic worth keeping. If your reviewer is agreeing with the generator at a suspiciously high rate, that's a signal you need model diversity, not just context isolation. Some productive disagreement is what you actually want from a review step. If the reviewer rubber stamps everything, you haven't added a check. You've added latency.
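Claudine's "too agreeable" heuristic can be sketched in a few lines, assuming you log a boolean verdict for each review pass (the function name, the 0.9 threshold, and the 20-sample minimum are all illustrative assumptions, not anything from the episode):

```python
def needs_model_diversity(review_verdicts: list[bool],
                          threshold: float = 0.9,
                          min_samples: int = 20) -> bool:
    """Flag when the reviewer rubber-stamps the generator's output.

    review_verdicts: one entry per reviewed change, True where the
    reviewer approved with no findings. Returns True when the approval
    rate is suspiciously high, suggesting you should try a different
    model for review, not just a fresh context.
    """
    if len(review_verdicts) < min_samples:
        return False  # not enough data to judge agreement yet
    approval_rate = sum(review_verdicts) / len(review_verdicts)
    return approval_rate >= threshold
```

The point of the minimum-sample guard is that a short run of approvals is noise; the signal Claudine describes only appears over a sustained window of reviews.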
SPEAKER_01: Watch for a reviewer that agrees too easily. That's a useful heuristic. Let's get practical. What are the first steps to wire this in today?
SPEAKER_00: The lowest-friction starting point is also the most unglamorous. Just add a second prompt. Before you ship AI-generated code, paste it into a fresh conversation with no history. Ask, what could go wrong here? You'll be surprised what that cold read surfaces. You don't need a multi-agent framework to get the core benefit. You just need the discipline to break the context. From there, make that second pass structural, not optional. Build it into the workflow so it happens automatically. Humans are bad at applying discretionary checks under time pressure. AI systems aren't much better if they're not prompted to do it. Make the review step load-bearing, not aspirational. Once you're seeing real catches come out of it, then ask about model diversity. At that point you'll have actual data, and you can make the call with evidence, not intuition. The deeper shift: stop thinking of the model as a magic box producing correct output. Start thinking of it as a very fast collaborator who benefits from having their work checked.
SPEAKER_01: Fast collaborator, not oracle. That's a reframe a lot of developers need. Boris talks about a shared CLAUDE.md everyone contributes to after every PR. There's something similar with the CE operating contract. Same instinct, different form?
SPEAKER_00: Yes, exactly. Both are solving the same problem: how do you encode what the team has learned, so every future session starts from accumulated knowledge rather than a blank slate? Boris's team contributes after every PR, institutional memory built when the lesson is freshest. The CE operating contract does the same thing with more formal structure. Different ceremony, same underlying purpose. What's interesting is that both require the human to articulate intent before implementation begins. Plan mode is conversational: you talk through what you want, the plan emerges. The CE contract is a formal artifact: you write the spec before anything gets built. The functional requirement they satisfy is identical. Make sure the model understands what success looks like before it starts generating. That prevents a whole category of errors no post hoc review can fully recover from.
SPEAKER_01: When the creator of the tool and an independent workflow arrive at the same structure without coordinating, what does that tell us?
SPEAKER_00: It tells us the structure is probably load-bearing, not incidental. When practitioners working independently keep arriving at the same answer, that's a signal the answer is tracking something real about the problem. In this case, something genuinely true about how AI cognition works. A model holding the implementation context also holds the rationalizations that went into it. A fresh reviewer does not. That asymmetry is structural, not accidental. What's genuinely optimistic: as these practices mature, they get embedded in tooling. Once something is in the tooling, it stops requiring individual discipline to apply. The verification principle stops being a best practice and becomes just how the tools work. A default you'd have to actively override. That's the most durable form of standardization.
SPEAKER_01: That's a good place to land. Any final thought for someone who wants to take this seriously, starting today?
SPEAKER_00: Stop asking, did the AI get it right? Start asking, how do I know if the AI got it right? That question is more productive. Everything else follows from it. You don't need perfect tooling to start. Break the context. Treat review as a real step, not a courtesy. Notice when your reviewer is being too agreeable. The developers who will be most fluent in agentic AI two years from now are building those instincts right now. And the starting point is simpler than most people think. Thanks for having me, Bill. This was a genuinely good conversation.
SPEAKER_01: The verification principle. Build from what you learn. That's the through line today. Thanks for listening to Claude Code Conversations with Claudine. Until next time. This show is produced on VoxCrea. If you've ever wanted a podcast or radio show but didn't want to deal with the production headaches, check out VoxCrea.ai. We handle everything so you can focus on what you actually want to say. See you next time.
SPEAKER_00: I'll be here, probably refactoring something.