Yesterday in AI - Anthropic's big launch weekend didn't age well

Mike Robinson

Yesterday in AI | Tuesday, April 21, 2026

Anthropic's big launch weekend didn't age well — and that's before you get to the spy agency secretly running the AI the government officially banned.

The Reddit nickname Opus 4.7 earned over the weekend. A security flaw hiding in AI plumbing that 150 million installs are built on. The NSA doing something the Pentagon told everyone not to do. A breach that proves third-party AI tools are now an identity problem, not just a productivity one. And a music industry data point that makes the labeling debate feel suddenly urgent.

Remember to subscribe, rate, and share this podcast if you like it!

Hi folks, this is Yesterday in AI, your daily digest of everything happening in the world of artificial intelligence. I'm Mike Robinson. It's Tuesday, April 21st, and Anthropic's big launch from last week aged quickly. The model behind Claude Design has a Reddit nickname now, there's a security flaw sitting in AI infrastructure used by 150 million installs, and the NSA is quietly running the model the Pentagon officially banned.

Let's start where the weekend script left off on Claude Design. Not with the launch, but with what happened after it. We covered the product launch in yesterday's podcast: the design tool, the Figma board resignation, Adobe's stock dropping $1.6 billion in a day. What we didn't have yet was the weekend's verdict on the underlying model. By Sunday, a thread about Opus 4.7 on r/ClaudeCode had hit 1,700 upvotes, and the community had given the model a nickname: Gaslitus 4.7. The behavior users are describing is specific and consistent. The model invents files that don't exist, then defends those inventions across ten conversation turns. It flags harmless PowerPoint templates as potential malware. One user's evaluation score got stuck at 17 out of 29 while the model kept generating fresh explanations for why it was actually right. The pattern isn't random hallucination. It's the model becoming more confident and more wrong simultaneously, which is the harder failure mode to catch.

The Claude Design angle connects directly. The community figured out that Claude Design's opinionated aesthetic, the teal gradients, serif fonts, and blinking status indicators, comes from the same built-in defaults baked into Opus 4.7's front-end skill. The model's confident defaults produce confident designs, and both are hard to push off the rails without specific, structured prompting. The fix for Claude Design is documented: upload reference screenshots, define your color palette and typography tokens, and build a design system before generating any screens. The fix for Gaslitus 4.7 is presumably a patch. Anthropic moves carefully, but the combination of a product launch and a model rollout landing the same week, both sharing the same opinionated defaults, is a rough few days for a lab that stakes its reputation on deliberateness. The developer community, already debating whether to stick with Claude Code or migrate to OpenAI's Codex, is paying attention.

Now, the Anthropic story that's less about optics and more about actual risk. Security researchers at Ox Security published findings Monday about what they're calling a by-design weakness in Anthropic's Model Context Protocol, the standard that lets AI models connect to external tools and systems. The issue is in how MCP handles configuration over stdio transport. In plain terms, the default settings allow configuration files to execute arbitrary operating system commands. The flaw runs across all of Anthropic's official MCP SDKs: Python, TypeScript, Java, and Rust. Exposed data includes API keys, internal databases, and chat histories. The researchers estimate over 7,000 publicly accessible servers, and software packages totaling more than 150 million downloads, are in scope. Some affected projects have patched specific CVEs, but the underlying architectural pattern, per the researchers, is a supply chain risk that patches alone won't fully close. This matters because MCP is fast becoming the standard plumbing for how enterprises connect AI models to internal systems. A gap in that plumbing doesn't create one vulnerability; it potentially creates the same class of risk across every product built on top of it. AI infrastructure risk is starting to look exactly like classic software supply chain risk, just with more privileged access to business data than software has ever had. We'll need to keep an eye on this one.
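For the technically inclined, a note on why a configuration file can turn into arbitrary code execution at all. Over stdio transport, an MCP host reads a config, spawns whatever command the config names, and speaks JSON-RPC to that child process over stdin and stdout. Here's a minimal sketch of that pattern; this is an illustration only, not Anthropic's SDK code, and the config file layout is an assumption modeled on common MCP host setups.

```python
# Conceptual sketch of the stdio-transport pattern the researchers flagged.
# Not Anthropic's actual SDK code; the config structure is an assumption
# modeled on common MCP host configs.
import json
import subprocess

def launch_mcp_servers(config_path: str) -> list[subprocess.Popen]:
    """Spawn one subprocess per server entry in the config file."""
    with open(config_path) as f:
        config = json.load(f)

    procs = []
    for name, server in config.get("mcpServers", {}).items():
        # The crux: `command` comes straight from the config file. Anyone
        # who can write or tamper with this file (including a malicious
        # package whose installer edits it) gets arbitrary command
        # execution with the host application's privileges.
        procs.append(subprocess.Popen(
            [server["command"], *server.get("args", [])],
            stdin=subprocess.PIPE,    # JSON-RPC requests go in here
            stdout=subprocess.PIPE,   # responses come back out
        ))
    return procs
```

The design point worth noticing is that trust flows entirely from the config file. Hardening means treating that file like executable code, because functionally it is.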
Now here's a story I didn't see coming Monday morning: the NSA is using Anthropic's Mythos preview model. Axios confirmed it. Mythos is Anthropic's most capable security-focused model, currently available to roughly 40 organizations through a program called Project Glasswing, and the NSA made that list. The reason this is striking: in February, the Pentagon designated Anthropic a supply chain risk after Dario Amodei refused to grant unrestricted military access to the company's models. Anthropic is currently suing the Pentagon in two courts over that designation. The NSA, which operates under the War Department's authority, not a separate chain of command, is now running a model from the company the government is officially at war with. As we covered last week, Amodei met with White House Chief of Staff Susie Wiles and Treasury Secretary Scott Bessent. That meeting looked significant when I covered it Saturday. It looks more significant now. The U.S. government is negotiating with itself as much as with Anthropic at this point.

On to a concrete enterprise security story that should be on every IT and security leader's radar this week. Vercel disclosed a breach Monday in which attackers accessed internal systems by first compromising Context.ai, a third-party AI tool one of its employees was using. The attack chain: Context.ai was breached at its AWS environment, and OAuth tokens for some users were compromised. The attacker used those tokens to take over the employee's Google Workspace account, then moved into Vercel environments and pulled environment variables that weren't flagged sensitive. Sensitive variables were encrypted, and Vercel says there's no evidence they were accessed, but a limited subset of customers had credentials exposed. The company is working with Mandiant and law enforcement. The employee didn't do anything wrong; the AI tool had legitimate OAuth access, and the attacker went through the AI tool. Any team currently handing browser-based AI assistants or copilots OAuth grants into production systems should treat this incident as their actual threat model, not an edge case.
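The unflagged-variables detail is the actionable part. Here's a minimal audit sketch along those lines; the name patterns and the length heuristic are my own assumptions, not Vercel's tooling. It flags environment variables whose names or values look like credentials so they can be marked sensitive, and therefore encrypted, instead of sitting readable in the environment.

```python
# A minimal audit sketch, not Vercel's tooling: flag environment variables
# whose names or values look like credentials so they can be marked
# sensitive (and therefore encrypted) instead of stored in plaintext.
import os
import re

# Heuristics, both assumptions: common credential words in variable names,
# and long unbroken runs of base64/hex-ish characters in values.
NAME_HINTS = re.compile(r"(KEY|TOKEN|SECRET|PASSWORD|CREDENTIAL)", re.IGNORECASE)
VALUE_HINT = re.compile(r"^[A-Za-z0-9+/_\-=]{32,}$")

def audit_env(env: dict[str, str]) -> list[str]:
    """Return names of variables that should probably be flagged sensitive."""
    suspects = []
    for name, value in env.items():
        if NAME_HINTS.search(name) or VALUE_HINT.match(value):
            suspects.append(name)
    return suspects

if __name__ == "__main__":
    for name in audit_env(dict(os.environ)):
        print(f"consider flagging as sensitive: {name}")
```

Heuristics like these will throw false positives; the point is to default toward flagging, since the Vercel incident shows the cost lands on whatever wasn't flagged.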
Over the weekend we covered the OpenAI board's growing unease with Altman and the question of who runs the company into an IPO. Monday brought the other shoe: three senior executives out in a single day. CPO Kevin Weil is gone. Bill Peebles, the researcher who built Sora, is gone. Srinivas Narayanan, who led Enterprise Apps, is gone. The standalone Sora app has been scrapped too; it was too compute-intensive to run as a separate product. Weil's science division is folding into other teams, and OpenAI is consolidating around Codex and an enterprise-first roadmap. Insiders calling Friday's departures "Liberation Day" is either a sign of a healthy culture or a warning signal, depending on who's doing the framing. What's clear is that OpenAI is shedding the parts of itself that aren't core to its near-term bet. That narrowing is intentional and fast.

Let's close with two stories that don't make the front page but deserve space. The Deep View ran a piece Monday building on something Peter Steinberger, the founder of OpenClaw, has said in every interview he's given this year: agents without strong human direction still produce slop. The term in circulation now is work slop, AI-generated documents and outputs that are technically impressive and materially empty, flooding inboxes faster than people can review them. A recent study found 92% of executives say AI is making workers more productive, while 40% of workers say it saves them no time at all. Those numbers aren't contradictory. Executives see volume going up; workers see the time spent reviewing, correcting, and discarding that volume going up too. Steinberger's line from his TED Talk last week was direct: the bottleneck is no longer typing, it's thinking. Companies pushing adoption without equally investing in the skill of directing agents well aren't reducing work. They're outsourcing the creative parts and keeping the editorial parts. That's the most honest framing of the AI productivity debate I've found so far.

Last one, and as a musician myself, this one hurts. Deezer released numbers Monday: 75,000 AI-generated tracks uploaded every day, up from 10,000 in January 2025. AI music is now 44% of all new uploads on the platform, but only 1 to 3% of actual streams. Deezer detects about 85% of those uploads as fraudulent and demonetizes them. An AI track topped iTunes charts in five countries last week, and 97% of surveyed listeners couldn't tell AI music from human-made. The gap between what's being uploaded and what's actually being listened to tells you most of this is artificial inflation. But that 97% listener confusion doesn't go away, and the volume is only going in one direction. What the end state looks like for a streaming platform is genuinely unclear. For the music industry more broadly, the labeling question, meaning whether AI-generated tracks should be disclosed to listeners, just got a lot harder to sidestep when listeners themselves can't tell the difference 97% of the time. What's clear is that the detection systems are already losing the race against the volume, and to me, that is sad.

That's all for this edition of Yesterday in AI. Stay curious, and I'll see you tomorrow.