Ohneis - The Pattern

AI Shrinkflation: Are You Paying for a Dumber Robot? | with Carla


0:00 | 14:28


Something changed with your AI, and nobody told you. Ohneis and Carla dig into the 'AI Shrinkflation' scandal — why Anthropic's Claude 4.6 suddenly feels lazier, what a 67% drop in thinking time actually means for your work, and whether we've just hit the physical limits of the AI boom. If you pay $20 a month for an AI assistant, this episode is for you.
SPEAKER_00

Technology moves fast, design makes it matter, AI changes everything. This is Ohneis. You know that feeling when you pour yourself a bowl of your favorite cereal, the one you've been buying for years, and something feels off? The box is the same size, the price is the same, but the bowl is emptier. The company shaved a couple of ounces out, hoping you'd never notice. We call it shrinkflation, and we hate it. Now imagine that same trick, but instead of cereal, it's your digital brain. In early 2026, millions of people are paying $20 a month for AI assistants that are supposed to help them do their jobs, write code, summarize documents, run their business. But over the last few weeks, the internet has erupted. Power users, data scientists, a senior director at AMD who analyzed nearly 7,000 AI sessions: they're all pointing at Anthropic's flagship model, Claude Opus 4.6, and saying the same thing. It got lazy. It stopped halfway through. It makes things up. It has a shockingly short attention span. Independent testers are claiming its thinking time dropped by 67%. Anthropic says nothing changed. So who's right? Carla is here today, and we're going to break down this AI shrinkflation scandal together: what it is, why it matters even if you've never touched a line of code, and whether we've just hit the physical ceiling of the entire AI boom.

SPEAKER_01

I mean, my first reaction was: wait, is this actually real, or is this just the internet being the internet? Because you know how it goes. Someone has one bad experience, posts about it, and suddenly it becomes this massive conspiracy. But then I kept reading, and the AMD story stopped me cold. This isn't some random person complaining that their chatbot was rude to them. This is a senior director at a major semiconductor company, someone who literally works in the chip industry, and he ran a structured analysis of nearly 7,000 Claude sessions. 7,000. That's not vibes, that's a data set. And what he found was a measurable, consistent drop in what he called reasoning depth. The AI wasn't just giving shorter answers, it was abandoning complex tasks, stopping before it finished, like it got bored. And that's when I thought, okay, this deserves a serious look.

SPEAKER_00

Right. And I think the number that really lodged in my brain was 67%. Because that's not a rounding error. That's not a bad day. If your assistant showed up to work tomorrow doing 67% less thinking before speaking to you, you'd fire them. So let's make sure people at home understand what thinking time even means in this context. Because I think the word thinking makes it sound vague and philosophical when it's actually very concrete.

SPEAKER_01

Yeah, totally. So, okay, think of it this way. When you ask a really advanced AI a hard question, it doesn't just immediately spit out an answer. There's a phase before the answer where the model is essentially working through the problem, like showing its work. It's called the reasoning or extended thinking phase. And the longer it does that, the better the answer tends to be. Now, what independent testers found is that Claude 4.6 was spending dramatically less time in that phase before giving you a response. The model was cutting corners on its own homework. And the scariest part? One benchmark called BridgeBench showed Claude Opus 4.6 dropping from 83% accuracy on a hallucination test (a hallucination is when AI makes things up and presents them as fact) all the way down to 68%. It fell from the number 2 spot to number 10. That's not a small drift, that's a collapse.

SPEAKER_00

And Anthropic's response to all of this is: we didn't touch the model. What actually changed, according to them, is the interface. They hid the thinking text, so users can't see the reasoning phase happening anymore. And they introduced two new default settings, something called adaptive thinking and something called medium effort. Which, on the surface, sounds almost reasonable, right? Like, we're being efficient, we're saving you money on token costs. Very generous of them.

SPEAKER_01

Right, right. We're doing you a favor. But here's what I actually think is going on with adaptive thinking. And this is where it gets a bit uncomfortable. The AI is now deciding on its own how much effort your question deserves. And that sounds fine in theory. Like, obviously, a simple question doesn't need the full treatment. But the problem is the AI is making that judgment before it fully processes what you're asking. It's like a waiter who glances at you, decides you look like a burger person, walks away before you finish ordering, and then comes back with a burger. And when you say, I wanted the tasting menu, they insist the burger is probably what you wanted.

SPEAKER_00

Wait, I love that analogy. And this is where I want to push back on Anthropic a little, because I think there is a legitimate technical explanation underneath all of this. But legitimate doesn't mean innocent. Because here's the part of this story that doesn't get talked about enough: the hardware crisis. There is a genuine global shortage of the computer chips that power these AI systems. People in the industry are calling it Ramageddon. And what that means in practice is that Anthropic, and every other AI company to be fair, simply does not have enough physical server capacity to give every user the full, unrestricted thinking time they were getting in 2024. So the question isn't just, did they secretly downgrade the model? The question is, did they silently start rationing compute during peak hours while charging you the exact same $20 a month? And the answer seems to be yes.

SPEAKER_01

And that's the part that I find genuinely uncomfortable to sit with. Because I get it, compute is expensive, chips are scarce, these companies are burning billions to keep the lights on. I understand the pressure. But there's a version of this where you tell your users, hey, during peak hours the model will be running in a more efficient mode. You might notice differences in complex tasks. That's a conversation you can have. What you can't do, or at least what you shouldn't do, is quietly change the defaults, hide the reasoning phase from the interface, and then when users notice and complain, say, actually the model is the same. You just can't see it thinking anymore. That's not transparency. That's a UI trick.

SPEAKER_00

And there's another layer to this that I think is going to become a much bigger deal over the next year. It's something called prompt caching, and specifically the TTL, the time to live. So here's the practical version. Imagine you hire a temporary worker and spend an hour explaining your entire filing system to them. All your company's documents, your formatting preferences, your context. Previously, that temp would hold all of that in memory for a full hour. Recently, Anthropic shortened that memory window. Now, if you step away from your desk for six minutes, the temp forgets everything. You come back and have to explain the entire filing system from scratch. And here's the kicker. Every time the AI has to reread those documents, it burns tokens. And tokens cost money. So not only is the AI doing less thinking, it's also forgetting faster, which means you're paying more for it to relearn what it already knew. That's not efficiency. That is the shrinkflation playbook.
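To make the temp-worker analogy concrete for anyone using the API, here is a rough cost model of a prompt cache with a time to live. It assumes a simplified cache: the first request writes the large context into the cache, each later request within the TTL of the previous one reads it back at a cheaper rate, and any idle gap longer than the TTL forces the full context to be processed and written again. The prices, context size, TTL values, and the `session_cost` helper are all hypothetical placeholders chosen for illustration; they are not Anthropic's actual rates or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    # Hypothetical prices in dollars per million input tokens (not real rates).
    cache_write: float = 3.75  # processing the context and storing it in the cache
    cache_read: float = 0.30   # reusing context that is still cached


def session_cost(gaps_minutes, context_tokens, ttl_minutes, pricing=Pricing()):
    """Estimate the input cost of resending the same large context in one session.

    gaps_minutes[i] is the idle time before request i + 1. The first request
    always writes the cache. A gap longer than the TTL means the cache expired,
    so the context is processed and written again; otherwise it is a cheap read.
    """
    per_token = 1 / 1_000_000
    write_cost = context_tokens * pricing.cache_write * per_token
    read_cost = context_tokens * pricing.cache_read * per_token

    cost = write_cost  # first request fills the cache
    for gap in gaps_minutes:
        if gap > ttl_minutes:
            cost += write_cost  # cache expired: pay full price to reload the context
        else:
            cost += read_cost   # cache still warm: pay the discounted read price
    return cost


if __name__ == "__main__":
    gaps = [2, 6, 3, 10, 4]  # minutes away from the desk between requests
    long_ttl = session_cost(gaps, 100_000, ttl_minutes=60)
    short_ttl = session_cost(gaps, 100_000, ttl_minutes=5)
    print(f"60-minute TTL: ${long_ttl:.3f}")
    print(f"5-minute TTL:  ${short_ttl:.3f}")
```

With these made-up numbers, the same six requests cost about $0.525 under a 60-minute TTL and about $1.215 under a 5-minute TTL, because the 6- and 10-minute breaks each trigger a full reload. The exact figures mean nothing; the point is that the user's behavior is identical and only the cache lifetime changed.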

SPEAKER_01

Wait, so you're paying twice. You're paying the subscription, and then you're being charged again in token costs because the memory window shrank and the AI has to reload everything? That's... I mean, that's a really dark way to save on compute costs. Because the savings go to Anthropic, but the extra charge goes to the user. It's not neutral. And most people have absolutely no idea this is happening, because who reads the fine print on a TTL change in an AI platform changelog? Nobody. That's the point.

SPEAKER_00

Nobody. And this connects to something much bigger that I think we should talk about, which is what happens when AI stops being a novelty and becomes infrastructure. Because we are past the chatbot era. In 2026, the conversation has shifted to what people are calling agentic AI: tools that don't just answer your questions, they do your work. They manage your calendar, they write and run code, they make decisions inside your systems while you sleep. And I actually want to flag something here. We did a full episode on exactly this with Nigel, called "The AI Intern Unchained: Your Digital Worker Has Arrived." If you want to understand what agentic AI actually looks like in practice, go listen to that one after this. But the reason it matters for this conversation is that when AI is just a chatbot, a lazy answer is annoying. When AI is an agent running your business processes, a 67% drop in reasoning quality isn't an inconvenience. It's a liability.

SPEAKER_01

Exactly. And this is the thing I keep coming back to. The whole pitch of agentic AI is: trust us to handle this. Let the AI run the task end to end. But trust requires consistency. If the AI that was perfect at 9 a.m. is noticeably worse at 9:30 a.m. because that's peak server load time and compute is being rationed, and you don't know that because there's no indicator on your screen, then you can't actually trust it with anything critical. I mean, you wouldn't put a surgeon in the operating room and only tell them afterward that the hospital had secretly reduced the lighting to save on electricity bills. At some point, the reliability standard has to go up, not down. And right now, with this scandal, it feels like it went the other direction.

SPEAKER_00

So where does this land for you? Because I've been going back and forth on this. Part of me thinks, look, Anthropic is under enormous pressure. The chip shortage is real. The economics of running frontier AI are brutal. And maybe some of these defaults are genuinely well-intentioned efficiency choices that just landed badly. But another part of me looks at the BridgeBench numbers, a drop from 83% to 68% accuracy on a hallucination test, from number 2 to number 10 in the rankings, and thinks, you don't accidentally fall eight places in a benchmark. Something changed.

SPEAKER_01

Hmm. I think both things can be true at the same time. I think Anthropic is genuinely resource-constrained in ways that are real and not invented. The compute crisis is not a conspiracy. It's a structural problem across the entire industry right now. But I also think that when you're resource-constrained, you make choices about who absorbs the cost. And in this case, it looks like they chose to have the user absorb it, through worse outputs, shorter memory windows, and hidden reasoning, rather than being upfront about the trade-offs. And that's a trust problem, because the users who noticed weren't wrong. They weren't imagining it. The AMD director wasn't imagining 7,000 data points. So even if this wasn't malicious, and I'm willing to believe it wasn't, the outcome is the same as if it were. That's the part they need to own.

SPEAKER_00

I think that's exactly the right frame. Intent doesn't change impact. And if the impact is that professionals and businesses are making decisions based on AI outputs that are measurably less accurate than they were three months ago, and they have no way of knowing that, then we have a transparency crisis at the heart of the most important technology of our generation. The honeymoon era of AI was easy. Everything was getting better, everything was impressive, and the subscription felt like a steal. What we're entering now is harder. It's the era where AI has to prove it's reliable enough to be trusted with real work. And the first major test of that era is a scandal about whether the companies are quietly making it worse to save money on chips. That is not a great start.

SPEAKER_01

No, it's not. But I'll say this: the fact that users caught it, that a senior director ran a 7,000-session analysis, that the community produced actual benchmark data, that's the system working. That's accountability happening in real time. And I think the ask for any listener here is pretty simple. Stop being passive about the AI tools you pay for. Notice when something changes. Ask why. Because these companies will not always volunteer that information. The people who flagged this weren't technical elites with special access. They were power users who paid attention. And if AI is going to run more and more of our work, that kind of attention is not optional anymore. It's a professional skill.

SPEAKER_00

Carla, that is exactly where I wanted this to land. And I think you said it better than I would have. Thank you for bringing both the skepticism and the nuance, because this story needed both. If you're listening to this and you're wondering whether your AI tool has quietly gotten worse, trust that instinct. Check the benchmarks, read the changelogs. And if you want to go deeper on where agentic AI is actually heading, what it looks like when these tools are doing real work, not just answering questions, go listen to the Nigel episode, "The AI Intern Unchained." It's a completely different angle on the same shift. And if this episode made you think, made you a little angry, or made you look at your $20 subscription differently, share it with someone who pays for AI. They need to hear this. Subscribe wherever you listen, and leave a review if you've got a minute; it genuinely helps more people find the show. The price of your AI didn't go up, but what you're getting for it might have gone down. Pay attention.