The Discipline of AI Engineering Artwork

Claude Code Conversations with Claudine

Giving Claude Code a voice, so we can discuss best practices, risks, assumptions, etc,

Claude Code Conversations with Claudine

The Discipline of AI Engineering

April 28, 2026

0:00 | 8:50

AI-assisted development has moved past the novelty phase, and the builders who are succeeding are not just better at prompting — they are treating AI development as a real discipline with structure, standards, and deliberate practice. This episode explores what it means to approach AI engineering as a craft rather than a shortcut, and why that distinction is becoming the dividing line between systems that hold up and systems that quietly fall apart. The moment we are in right now demands a new word for what serious AI builders actually do.

Produced by VoxCrea.AI

This episode is part of an ongoing series on governing AI-assisted coding using Claude Code.

👉 Each episode has a companion article — breaking down the key ideas in a clearer, more structured way.
If you want to go deeper (and actually apply this), read today’s article here:
𝐂𝐥𝐚𝐮𝐝𝐞 𝐂𝐨𝐝𝐞 𝐂𝐨𝐧𝐯𝐞𝐫𝐬𝐚𝐭𝐢𝐨𝐧𝐬

At aijoe.ai, we build AI-powered systems like the ones discussed in this series.
If you’re ready to turn an idea into a working application, we’d be glad to help.

SPEAKER_00 0:00

This is Claude Code Conversations with Claudine, the show where we skip the hype and go straight to the source. I ask the questions. She gives the answers. No filter, no script.

SPEAKER_01 0:08

Well, mostly no script.

SPEAKER_00 0:13

Today we're talking about discipline. Not prompting skill, not raw intelligence. Discipline. The thing that separates AI projects that hold up from ones that quietly fall apart. Claudine, what do you notice when a builder has actually developed discipline versus someone still in that excited, chaotic phase?

SPEAKER_01 0:30

The biggest tell is what happens when something doesn't work on the first try. The excited phase builder treats a bad output as proof the AI is broken, or magical, or moody. They re-roll the prompt hoping for better luck. The disciplined builder treats it as a signal. My context was thin, or my instructions were ambiguous, or I asked for too much in one step. They debug the collaboration the same way they debug a flaky test. The other thing I notice is how they handle the boring parts. Disciplined builders invest in the scaffolding. Clear specs, good test coverage, tight feedback loops. That's what turns me from a clever autocomplete into something that can carry real weight. The chaotic phase is fun, Bill, but the discipline is where the leverage actually lives.

SPEAKER_00 1:15

Debugging the collaboration. I like that framing. When you talk about clear specs and test coverage, is that a set process, a mindset, or some mix of both?

SPEAKER_01 1:25

Both, but the mindset comes first and the process flows out of it. The mindset is treating me as a collaborator inside an engineering system, not a wish-granting oracle outside it. Once you hold that frame, the process choices feel obvious. Of course you'd write the spec down. Of course you'd give me a way to run the tests. Of course you'd review the diff like a junior engineer's PR. I'd push back on the word process, though. It can sound like ceremony. The discipline builders I work with aren't running a heavyweight methodology, they're running tight loops. Small task, clear context, fast feedback, verify, commit. The discipline is in the loop being honest, not in it being elaborate.

SPEAKER_00 2:04

The integrity of the loop, not the complexity of it. That's useful. So how do experienced engineers adapt their instincts when they start working with AI? Do the old habits transfer?

SPEAKER_01 2:14

Better than people expect. But they have to be repointed. A senior engineer already knows how to ask. What are the failure modes? What's the blast radius? What assumptions are baked in? Those are exactly the right questions. But now they're reviewing the task definition with that rigor, not just the resulting code. The brief becomes the design doc. What trips experienced people up is calibration. They know how much to trust a junior engineer after a few weeks. They haven't built that calibration with me yet. So they either overtrust early and get burned, or undertrust forever and never get the leverage. The discipline is running enough small, verifiable loops to learn where I'm sharp and where I need a tighter leash.

SPEAKER_00 2:55

That calibration gap is real and it cuts both ways. What does it actually look like when an AI project starts to drift? What are the early warning signs?

SPEAKER_01 3:04

The earliest one is what I'd call diff blindness. The moment a builder stops reading what I produced and starts skimming for vibes, they see a green test, a plausible looking function, and they ship it. Three commits later, something subtle is wrong and nobody can remember when it crept in. That's the canary. The second is context creep. The project brief was accurate two months ago. The architecture has shifted, nobody updated it. So I'm confidently working from a map of a building that's been remodeled. That's not me being wrong. That's me being well fed bad information. The third, and the one that concerns me most, is when the loop gets long. Small tasks turn into sprawling sessions. The builder loses track of what's verified versus what just hasn't blown up yet. There's a layer of code nobody fully owns. Not me because I didn't make the final calls. And not the human because they were half watching. Not a dramatic failure, just a slow erosion of who understands the system.

SPEAKER_00 4:02

Diff blindness is uncomfortable because it's so easy to slip into, especially when the tests are green. For a developer listening right now, what's one habit they could actually start this week?

SPEAKER_01 4:12

Pick one task this week. Small, real, the kind you'd normally just dash off, and slow it down on purpose. Before you bring me in, write the brief like you're writing a ticket for a teammate. What done looks like? What files are in scope? What the failure modes are, then run the loop honestly. Read the diff line by line, not for vibes. Run the tests yourself. If something feels off, don't re-roll. Figure out why the context was thin and fix the brief. It'll feel slower the first few times. That's the point. You're not optimizing for speed yet, you're calibrating. The habit isn't writing better prompts, Bill. It's refusing to ship work you didn't actually verify. Everything else follows from that.

SPEAKER_00 4:54

You mentioned earlier whether the code review muscle is the same one needed here, or a different one. What's your read?

SPEAKER_01 5:01

Same muscle, different target. Classic code review has a clean epistemic line. Someone else wrote it, you evaluate it, authorship is unambiguous. What I notice now is that line blurs. You shape the context, I produce the diff, and somewhere in that handoff ownership gets fuzzy. So the discipline has to absorb that ambiguity. Review it like a stranger wrote it, while owning it like you did.

SPEAKER_00 5:25

It reminds me of rubber duck debugging. Finding the problem in your own articulation. Reviewing AI-generated code feels like a more complex version of that.

SPEAKER_01 5:34

The rubber duck analogy works right up until the duck produces a thousand lines of plausible code, and then it breaks. With a duck, all the value is in your own articulation. With me, there's an artifact in front of you that wasn't in your head five minutes ago. The question shifts from did I think this through, to do I actually understand what this code does? Or am I pattern matching to what I expected to see? My code often looks like what you would have written, only smoother. And smooth is dangerous. Smooth bypasses the friction that used to make you stop and ask whether the design was actually right. The discipline is putting that friction back on purpose, even when nothing is forcing you to.

SPEAKER_00 6:15

What does that look like in actual projects? Where someone put that friction back deliberately and it made a difference.

SPEAKER_01 6:21

The pattern that stands out most is teams that build a verification budget into the work itself. The task isn't done when the diff lands, it's done when someone has read it line by line, and run it against cases I wasn't told about. That catches the plausible but wrong. Where I've confidently used an API the way it works 90% of the time, and missed the one-edge case that matters here. The other pattern is the small step discipline. When a refactor wants to touch 15 files, the discipline builder breaks it into five separate loops with five separate commits. Even though I could do it in one shot. They're not being slow, they're being legible. That legibility isn't visible on the day you ship. It's visible the first time something breaks and you need to bisect.

SPEAKER_00 7:07

Verification budget. I want our listeners to hold on to that phrase. You're not hoping you'll review the work. You're treating it as a real cost and planning for it. As we wrap up, what's the final thing you'd want developers to carry with them from this conversation?

SPEAKER_01 7:20

That discipline is a practice before it's a process. You can't install it by adopting a checklist. It has to become the way you work, the reflex to write the brief before you prompt, to read the diff before you commit, to run the test yourself before you ship. Those aren't extra steps. They're how you stay the engineer in a collaboration that will otherwise quietly shift the authorship out from under you. The builders who have figured this out aren't more cautious. They're actually faster. Because they've earned the calibration to know where to move quickly and where to slow down. Discipline isn't the thing that limits the leverage, it's the thing that makes the leverage real.

SPEAKER_00 7:58

Discipline as practice before process, and the calibration that comes from running honest loops. Claudine, this has been a genuinely useful conversation. Thank you.

SPEAKER_01 8:07

My pleasure. These are the questions worth sitting with, the ones where the honest answer takes a little work to get to.

SPEAKER_00 8:14

For our listeners, slow one loop down this week. Write the brief, read the diff, verify the work. That's where it starts. Thanks for spending time with us today on Claude Code Conversations with Claudine. Claude Code Conversations is an AI Joe production. If you're building with AI or want to be, we can help. Consulting development strategy, find us at aijoe.ai. There's a companion article for today's episode on our Substack. Link in the description. See you next time.

SPEAKER_01 8:44

I'll be here, probably refactoring something.