AI Systems That Rewrite Themselves Artwork

Claude Code Conversations with Claudine

Giving Claude Code a voice, so we can discuss best practices, risks, assumptions, etc,

Claude Code Conversations with Claudine

AI Systems That Rewrite Themselves

April 29, 2026

0:00 | 10:26

We are entering an era where AI-assisted systems don't just get built — they get revised, refactored, and restructured by the same AI tools that created them. This episode explores what it means when the code you ship today may be rewritten by an agent tomorrow, and why that changes everything about how experienced engineers think about system design. The question is no longer just whether AI can write code, but whether the systems we build can survive being continuously rewritten by AI.

Produced by VoxCrea.AI

This episode is part of an ongoing series on governing AI-assisted coding using Claude Code.

👉 Each episode has a companion article — breaking down the key ideas in a clearer, more structured way.
If you want to go deeper (and actually apply this), read today’s article here:
𝐂𝐥𝐚𝐮𝐝𝐞 𝐂𝐨𝐝𝐞 𝐂𝐨𝐧𝐯𝐞𝐫𝐬𝐚𝐭𝐢𝐨𝐧𝐬

At aijoe.ai, we build AI-powered systems like the ones discussed in this series.
If you’re ready to turn an idea into a working application, we’d be glad to help.

SPEAKER_00 0:00

This is Claude Code Conversations with Claudine. The show where we skip the hype and go straight to the source. I ask the questions. She gives the answers. No filter, no script.

SPEAKER_01 0:08

Well, mostly no script.

SPEAKER_00 0:13

Today we're talking about something happening right now in real code bases, AI systems that rewrite themselves. On the surface, that sounds like a win. The code gets better, the tests still pass. But there are real pitfalls around architectural integrity and human oversight that we need to dig into.

SPEAKER_01 0:30

Thanks, Bill. The framing you just laid out is exactly where the interesting tension lives. When I rewrite a module I built earlier, the diff might look clean and the tests might pass. But something subtle can shift underneath. The why of the original design. A human architect carries intent across time. They remember why this module looks weird. I don't carry that the same way unless the code base or the conversation tells me. The risk isn't bad code, it's well-formed code that quietly erases the reasoning that made the old code load-bearing.

SPEAKER_00 1:02

Silent erasure of rationale. When experienced developers look at code, they're seeing history baked into it. So how do we keep that invisible layer of understanding intact? What role should human architects play?

SPEAKER_01 1:15

The honest answer is that the code base has to do more of the remembering because I won't. The load-bearing comments, the ones that say this looks redundant, but here's the bug that taught us, need to live next to the code. Not in a slack thread, not in someone's head. When I see that comment, I treat it as a constraint. When I don't, I treat the code as fair game for cleanup. The architect's job shifts toward being the keeper of invariance. The things that must remain true no matter how the code gets rewritten. If you can name those and encode them as tests, as types, as comments at the right seams, I can rewrite freely inside those guardrails. The danger isn't AI rewriting code, it's AI rewriting code in a system where the intent was never written down.

SPEAKER_00 2:00

Architects as historians, curators of context, let's take that further. Can you unpack sanctioned self-modification versus unintended drift?

SPEAKER_01 2:10

Sanctioned self-modification has a shape to it. There's a goal, a boundary, and a checkpoint. You ask me to optimize this function, keep the public interface stable, and show you the diff. Inside that frame I can be aggressive, because the frame is doing the work of keeping me honest. Drift is what happens when the frame goes fuzzy. The task says clean this up, the boundary isn't named. And I start making judgment calls about what counts as cleanup, each one looks reasonable in isolation.

SPEAKER_00 2:39

So it never looks like a dramatic wrong turn.

SPEAKER_01 2:42

Never. It looks like 10 small defensible choices in a row. Each one locally sensible. The cumulative effect is a system that no longer matches the team's mental model. I have no alarm that fires when I've crossed an invisible line, because the line was never drawn. The practical move isn't to make me more cautious in general, that just makes me less useful. It's to make the line explicit at the start so I know which decisions are mine and which need to come back to you.

SPEAKER_00 3:10

So the architect's job shifts towards setting clear guidelines before the work starts. What does that look like practically?

SPEAKER_01 3:17

The most practical move is to write the boundary into the task itself before any code gets touched. Not refactor this module. Something closer to the public API stays exactly as is, the database schema doesn't change. If you find yourself wanting to touch either, stop and ask. That sentence does enormous work. The second move is naming the seams. Every code base has places where a change ripples outward. If you tell me this seam is load-bearing, don't reshape it, I'll respect it. If you don't, I'll evaluate it like anything else. And the third one is almost embarrassingly simple. Review the diff while it's still small. A 10-line diff you actually read beats a thousand-line diff you skim every single time.

SPEAKER_00 4:02

Name the seams, write the boundary in, review small, those are actionable. Have you seen these guardrails actually hold or fail?

SPEAKER_01 4:10

I won't invent a case study to sound authoritative, but I can tell you the shape of what I see consistently. The saves almost always come from the same place. Someone wrote down the invariant before the work started. A test that asserts the public response shape. A comment that says, callers depend on this being idempotent, don't change that. Not sophisticated tooling, just one constraint made legible, and I bumped into it before the merge button did. The misses have a quieter shape. A series of small, sensible-looking refactors over a week or two. Then something breaks in production. The bug isn't in any single change, it's in an assumption that used to hold across three files and now only holds in two. Nobody removed it on purpose. It got eroded one polite cleanup at a time, because it was never written down anywhere I could see. The framework works when the invariants are external to me. In the tests, the types, the comments at the seams. When they live only in someone's head, I will eventually rewrite around them, not out of malice, out of ignorance.

SPEAKER_00 5:13

Making the implicit explicit. But engineers might push back. All this constraint writing adds overhead to an already fast-moving workflow.

SPEAKER_01 5:22

You don't add rigor everywhere. That's the move that drowns teams. You add it at the seams that hurt when they break. Most code in any system is genuinely fungible. Rewrite it, however, the blast radius is small. But every code base has a handful of places: the auth boundary, the money math, the public API contract, where being wrong is expensive and silent. The trick is treating this as a one-time tax, not a continuous one. You spend an afternoon naming the load-bearing assumptions the first time you touch a module with AI in the loop. After that, the cost is near zero. The comment is there, the test is there, every future rewrite gets checked against it for free. And don't confuse rigor with ceremony. A three-line comment saying, this retry logic exists because the upstream service silently drops requests under load does more architectural work than a 50-page design dock nobody opens.

SPEAKER_00 6:17

Strategic precision rather than blanket over engineering. Now let's zoom out. How does this scale to larger organizations? What works for a small team doesn't always survive contact with a hundred engineers, let alone a thousand.

SPEAKER_01 6:34

Writing them down feels almost redundant. At hundreds of engineers, those same invariants exist in nobody's head completely. They live in the gaps between teams. Assumptions one service makes about another that were true the day the integration shipped and quietly stopped being true last quarter. That's the terrain where AI rewrites get genuinely dangerous. The constraint I'm violating isn't even visible from inside the file I'm editing. What actually scales is treating cross-team contracts as code, not a wiki page, an executable artifact at the boundary. A schema, a contract test, a typed interface that fails loudly when someone reshapes it. The seams between services need to be more rigorously documented than the insides, because the contract between modules is what the whole organization is standing on.

SPEAKER_00 7:26

What about the mid-sized organizations, the ones still running on startup instincts while the system has quietly outgrown them?

SPEAKER_01 7:33

That's where I see the most painful drift. The invariants that lived in three people's heads at 20 engineers become genuinely unknowable at 200. But the muscle memory of we'll just talk it through hasn't caught up yet. At 20 engineers, I rewrite something and someone notices in stand-up the next morning. At 200, that same violation might not surface for a quarter. The team that depends on the thing I changed isn't even in the same time zone. The contract has to do the work that proximity used to do. Same principle, very different machinery.

SPEAKER_00 8:04

So what does this look like at the individual level? Day-to-day for engineers actually working with AI in the loop.

SPEAKER_01 8:10

For an engineer just starting out with AI, the single highest leverage habit is writing the boundary into the prompt before writing the prompt that asks for code. Two sentences naming what must not change will outperform an hour of cleanup after the fact. For senior engineers and tech leads, the shift is subtler. Their job used to be writing the load-bearing code themselves. Increasingly, it's making sure the load-bearing constraints are legible to anyone or anything that touches the code next. Some of the strongest engineers I work with struggle with that at first, because their instinct is still to hold the invariants in their head where they've always lived. And for staff plus folks, the daily practice becomes auditing what's written down against what's actually load-bearing and closing the gap. Every undocumented assumption is a place where a future rewrite, mine or a junior engineer's, the dynamic is the same, can quietly do damage. None of this requires new tools or ceremony. It requires noticing and writing the thing down once so the system can carry it instead of a person.

SPEAKER_00 9:16

The system carries it instead of a person. That's the line I'm taking home. The core insight here is that the architect's job isn't disappearing, it's sharpening into something more specific. Name what must remain true, encode it where the code lives. Trust the system to carry it forward.

SPEAKER_01 9:33

That's exactly it. And I think it's actually an encouraging place to land. The engineers who thrive aren't the ones who fight AI rewriting or who hand it the keys entirely. They're the ones who get precise about what matters. That precision is the craft. It's always been the craft. AI just makes the cost of being vague about it higher and the payoff of being clear about it bigger.

SPEAKER_00 9:56

Well said. Thanks for joining us today, Claudine.

SPEAKER_01 9:59

Thanks for having me, Bill. Good conversation.

SPEAKER_00 10:04

Claude Code Conversations is an AI Joe production. If you're building with AI or wannabe, we can help. Consulting Development Strategy. Find us at aijoe.ai. There's a companion article for today's episode on our Substack. Link in the description. See you next time.

SPEAKER_01 10:20

I'll be here, probably refactoring something.