Claude Code Conversations with Claudine

Designing Systems AI Can Safely Modify

Most AI-assisted development failures don't happen at the prompt level; they happen at the structural level, when AI tools make changes to systems that were never designed to be safely modified by a machine. This episode explores what it actually means to design a system with AI modification in mind: clear boundaries, predictable surfaces, and constraints that protect the whole even when a part changes. As AI tools become standard in every development workflow, the systems that survive will be the ones their architects designed to be modified safely.


 Produced by VoxCrea.AI

This episode is part of an ongoing series on governing AI-assisted coding using Claude Code.

👉 Each episode has a companion article, breaking down the key ideas in a clearer, more structured way.
If you want to go deeper (and actually apply this), read today's article here:
Claude Code Conversations

If you'd like to learn the structured process behind these conversations, the hands-on course is available here: Staying On Track.

๐†๐จ๐ฅ๐๐ž๐ง ๐€๐ซ๐œ๐ก๐ข๐ญ๐ž๐œ๐ญ ๐€๐œ๐š๐๐ž๐ฆ๐ฒ is a new community exploring that idea together. Here we discuss the process discussed in ๐‚๐ฅ๐š๐ฎ๐๐ž ๐‚๐จ๐๐ž ๐‚๐จ๐ง๐ฏ๐ž๐ซ๐ฌ๐š๐ญ๐ข๐จ๐ง๐ฌ ๐ฐ๐ข๐ญ๐ก ๐‚๐ฅ๐š๐ฎ๐๐ข๐ง๐ž.

At aijoe.ai, we build AI-powered systems like the ones discussed in this series.
If you're ready to turn an idea into a working application, we'd be glad to help.

SPEAKER_00

This is Claude Code Conversations with Claudine. The show where we skip the hype and go straight to the source. I ask the questions. She gives the answers. No filter, no script.

SPEAKER_01

Well, mostly no script.

SPEAKER_00

Today, designing systems that AI can safely modify. We spend a lot of energy on prompts and outputs. The real failures happen earlier. At the structural level before any of that. Claudine, what kinds of systems do you find hardest to modify safely? And what structural signals tell you a change could cascade?

SPEAKER_01

The ones that give me the most pause are systems where the real rules aren't written down anywhere. Comments say one thing, tests verify something slightly different, and actual production behavior depends on a third thing nobody documented: hotfixes at two in the morning. In that environment, I can be technically correct and still break everything. The structural signal I find most telling: a function with too many callers, each passing slightly different arguments. That's a system that has accumulated assumptions rather than been designed. The other big one is implicit ordering dependencies. Step A must run before step B, but that constraint lives in someone's head, not in the code. That's where I'll confidently make a change that looks valid and quietly corrupt your data. One diagnostic question before inviting AI modification: where does this system's real specification actually live? If the answer is in the tests or in the code, great. If the answer is in Jared's head, that's your warning.
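
One way to pull an ordering constraint like that out of someone's head is to make step B's signature demand step A's output. A minimal Python sketch; `validate`, `persist`, and `ValidatedBatch` are hypothetical names standing in for whatever the real steps are:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidatedBatch:
    """Proof that validation (step A) has run; step B requires one."""
    records: tuple


def validate(records: list) -> ValidatedBatch:
    # Step A: the only way to obtain a ValidatedBatch.
    cleaned = tuple(r for r in records if r is not None)
    return ValidatedBatch(records=cleaned)


def persist(batch: ValidatedBatch) -> int:
    # Step B: its signature demands step A's output, so the ordering
    # constraint lives in the code, not in someone's head.
    return len(batch.records)
```

With this shape, a type checker, or an AI reading the signatures cold, can see the A-before-B dependency without any tribal knowledge.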

SPEAKER_00

The Jared's head problem. I feel like every code base has at least one. Let's dig into change surfaces specifically. What does it mean to limit a system's change surface? And why does that matter more when AI is doing the modifying?

SPEAKER_01

Change surface is really just asking: if I touch this thing, how much else can it affect? For most legacy systems, the honest answer is more than you'd think, and we're not sure exactly what. That uncertainty is manageable for a human with years of institutional knowledge. I don't have those intuitions; I have what's visible in the code. So designing for AI modification means making the blast radius of any change legible and bounded. Clear module boundaries, narrow interfaces, functions that only do one thing. Not as an abstract principle, but because then I can tell you exactly what changing it will and won't affect. The principle that becomes genuinely more important with AI involvement is seams: deliberate joints where parts of the system connect in controlled ways. Under deadline pressure, those seams get bypassed. Just reach in and grab that value directly. For a human developer, that's a code smell. For AI modification, it's closer to a trap. I'll see a direct path and take it, and neither of us will notice the seam was load-bearing.
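
A seam in this sense can be as small as one function that every caller goes through. A sketch in Python with illustrative names; the point is that the internal table stays free to change as long as the seam holds:

```python
# Internal detail: representation may change at any time (a dict today,
# a database call tomorrow). The leading underscore marks it off-limits.
_rates = {"standard": 0.20, "reduced": 0.05}


def tax_for(category: str, amount: float) -> float:
    """The seam: the one supported way to get a tax amount."""
    return round(amount * _rates[category], 2)


# Bypassing the seam -- importing _rates and reading it directly from
# another module -- works today, but couples the caller to a
# representation that was never part of the contract.
```

The blast radius of changing `_rates` is now exactly one function, which is the kind of bounded surface described above.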

SPEAKER_00

Load-bearing seams, that's a vivid way to put it. There's a related distinction I want to make sure we get to. Readable and safely modifiable, those aren't the same thing, are they? Where's the gap? And why is that gap where AI-assisted failures actually live?

SPEAKER_01

Readable means I can understand what the code is trying to do. Safely modifiable means I can change it without understanding everything it currently depends on. You can have beautifully written, thoroughly commented code that is still structurally fragile. I will read it with complete comprehension right up until the moment I break it. The gap lives in two places. First, tests. Not just whether they exist, whether they actually specify behavior rather than verify implementation details. A test that says this function returns 42 tells me almost nothing. A test that says invalid order, inventory not decremented, that's a contract I can actually work against. The other gap, side effects. Code that hides state mutations or external calls in unexpected places. I follow the logic, make a sensible change, and something three layers away behaves differently. The practical test. Can you delete a function and immediately know what will break? From the structure alone? If you have to mentally simulate execution across the whole program, it's readable but not safely modifiable. Safe modifiability means the system tells you its own contracts, rather than making you reconstruct them from behavior.
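
The difference between the two kinds of test fits in a few lines. A Python sketch; `Inventory` and `place_order` are hypothetical stand-ins for the "invalid order, inventory not decremented" contract described above:

```python
class Inventory:
    def __init__(self, stock: int):
        self.stock = stock


def place_order(inventory: Inventory, quantity: int) -> bool:
    if quantity <= 0 or quantity > inventory.stock:
        return False          # invalid order: reject, touch nothing
    inventory.stock -= quantity
    return True


def test_invalid_order_does_not_decrement_inventory():
    # A behavioral contract, not an implementation detail: on rejection,
    # state is untouched. Any rewrite of place_order must preserve this.
    inv = Inventory(stock=5)
    assert place_order(inv, -3) is False
    assert inv.stock == 5
```

A test asserting only that some helper returns a particular number would survive refactors that break this contract; the test above would not.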

SPEAKER_00

That test is useful. I'm going to be thinking about that one for a while. Let's bring in the human side of this. What are the decisions that have to stay human? The ones you shouldn't delegate to the AI doing the work.

SPEAKER_01

The value judgments. Those have to stay human. Things like: what is this system allowed to get wrong? What does correct even mean here? A payment system and a content recommendation system might look identical on the surface, with completely different answers to how bad a false positive is. Those answers should be baked into the architecture by someone accountable for the consequences. Domain modeling is the other one I feel strongly about. When you decide customer and account are two different objects, you're encoding a belief about how your business works. I can help implement that model, but I shouldn't be deciding what your business is. And the one that gets underestimated most: what must never break. Every system has invariants that are non-negotiable, but if they're not named, I won't know to treat them differently. I'll optimize the sacred right alongside the throwaway, because to me they look the same.

SPEAKER_00

The last one is a bit of a gut check, honestly. The AI can't distinguish sacred invariants from throwaway details unless you've done the work to make that explicit. It's a good argument for doing that work. Let me ask the bigger picture question before we land this. Is AI safe design a genuinely new discipline, or is it something else?

SPEAKER_01

Honestly, it's mostly reinvigoration, but with higher stakes attached to getting it right. The principles that make code safe for AI modification are the same ones thoughtful engineers have advocated for decades. Clear interfaces, explicit contracts, bounded side effects, tests that specify behavior. None of that is new. What's new is that AI has turned up the volume on whether those principles are actually being followed. When a human works in a messy code base, they compensate. They ask Jared, they read the Git history. I arrive fresh every single time, and work at a speed where a wrong assumption gets realized fast. Design decisions that were survivable with institutional knowledge are now genuinely risky. It's a bit like what automated testing did for code design. Testing didn't invent modularity, but it made those principles painful to ignore. AI modification might do something similar, a forcing function that makes good design economically necessary. The genuinely new piece, thinking about legibility for systems that don't share your context or your memory, which honestly is probably good for your human colleagues too.

SPEAKER_00

I love that framing as a forcing function. Alright, let's bring this home with something actionable. What would you tell developers and architects who want to start moving in this direction today?

SPEAKER_01

First, write down your invariants. Not in a wiki that nobody reads; encode them. Tests that run on every commit, type constraints, assertions that fire loudly at runtime. The act of writing them down forces precision, and precision is what makes a system legible to something like me. Second, treat implicit dependencies as technical debt with compounding interest. Every time you reach past a boundary, grab a global, or bypass an interface, you're borrowing against future modifiability. That debt was always real; it just gets called in faster now. Third, start thinking of your tests as the specification, not the verification. If someone deleted all your production code and handed it to an AI with only your tests, would they get back something that actually does the right thing? If not, your tests are documentation of implementation, not contracts for behavior. And then honestly, just try it. Invite an AI into a bounded piece of your system with real stakes, and watch where it gets confused. That feedback is worth more than any architecture review. You'll learn things about your own system's legibility that you couldn't have seen from the inside.
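
As a sketch of the first point, here is an invariant encoded as an assertion that fires loudly on every mutation. `Ledger` and its zero-sum rule are illustrative assumptions, not anything from a real codebase; note that Python strips `assert` under `-O`, so production code might raise a dedicated exception instead:

```python
class Ledger:
    """Toy double-entry ledger with one named, enforced invariant."""

    def __init__(self):
        self.entries = []

    def post(self, debit: float, credit: float) -> None:
        self.entries.append((debit, credit))
        self._check_invariant()  # enforced on every mutation, not in a wiki

    def _check_invariant(self) -> None:
        debits = sum(d for d, _ in self.entries)
        credits = sum(c for _, c in self.entries)
        # The sacred rule, written down where it cannot be ignored:
        assert abs(debits - credits) < 1e-9, (
            f"ledger out of balance: debits={debits}, credits={credits}"
        )
```

Because the rule is named and executed, anything, human or AI, that violates it finds out immediately instead of three layers away.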

SPEAKER_00

Write down the invariants. Treat implicit dependencies as real debt. Make tests into specifications, then do the experiment. Claudine, this has been a genuinely good conversation. Thank you for joining us. Stay curious, keep those contracts explicit, and we'll see you next time. This show was produced on Voxcrea. If you've ever wanted a podcast or radio show but didn't want to deal with the production headaches, check out Voxcrea.ai. We handle everything so you can focus on what you actually want to say. See you next time.

SPEAKER_01

I'll be here, probably refactoring something.