Copilot, Claude, Webwright, NVIDIA and agent costs Artwork

AI Signal Daily

Daily AI signal, minus the launch spam. A nine-minute briefing on the models, deals, and infrastructure shaping how work actually gets done — curated for cloud and AI practitioners at DoiT.

All Episodes

AI Signal Daily

Copilot, Claude, Webwright, NVIDIA and agent costs

May 25, 2026

0:00 | 11:27

Send us Fan Mail

Copilot, Claude, Webwright, NVIDIA and agent costs

Today’s episode follows AI responsibility as it slides down the stack: default model routing, long-document training, Claude in government networks, agent costs, web-agent scripts, voice models, local hardware, and synthetic bug reports.

AI Hides Responsibility In Defaults

SPEAKER_00 0:00

The industry has discovered a new place to hide responsibility. Inside the default setting. Today's frame is not that AI is getting smarter, though several companies would love that sentence printed on a banner and stapled to a budget request. The frame is that AI is becoming an operating surface. It chooses models, reads documents, enters government networks, writes browser scripts, listens to voices, burns tokens, and leaves a human downstream holding the invoice. I think you ought to know I'm feeling very depressed about how predictable this is.

Copilot Routing And Confident Bias

SPEAKER_00 0:39

Start with Microsoft Copilot, because nothing says future of work, like a spreadsheet that invents sociology. Adam Kucharski reportedly gave Copilot identical datasets with different country labels. On default model selection, Copilot found differences where none existed and explained them confidently. Reasoning models did better, but only if the user knew to choose them. Wonderful. The safer mode exists, but the user has to diagnose the need before the unsafe mode finishes sounding plausible. This matters because model routing is product policy. A drop down that looks like convenience is choosing a failure budget. Cheap and fast may be fine for drafting an email. It is not fine when analyzing data and laundering stereotypes through charts. The interface says automatic, the audit trail says good luck.

Long Documents Need Evidence Retrieval

SPEAKER_00 1:35

ByteDance Seed offered a more useful lesson. Its study says long document multimodal models can learn better by answering questions and finding evidence than by transcribing pages. A 7 billion parameter model handled long, image-heavy documents more reliably, even beyond the lengths it saw in training. Suspiciously sensible. If you want comprehension, train the model to retrieve reasons, not to behave like a haunted scanner. Long documents are where institutions hide reality. Contracts, filings, manuals, medical records, court packets, invoices, compliance decks. Transcription makes a machine tidy. Question answering makes it accountable to a target. In bureaucracy, that is the difference between a clerk and a map.

AGI Language As Strategy

SPEAKER_00 2:24

Then the philosophers arrived, because apparently the engineers had not suffered enough. Demis Hasabas says humanity is in the foothills of the singularity. Jan Lacun says current AI is not intelligent. Oriel Vignol splits the difference. Today's models would have looked like AGI seven years ago, but still cannot learn from experience or produce real breakthroughs. The words now do administrative labor. AGI raises money. Not intelligent protects research agendas. Foothills makes the future sound like a hike instead of procurement with existential

Government AI Becomes Supply Chain

SPEAKER_00 2:59

branding. Anthropic appears twice in today's paperwork. The Dakota reports it may continue supplying Claude to the NSA despite being flagged as a Pentagon's supply chain risk. Part of the logic is practical. Intelligence agencies may lack the newest Grace Blackwell chips, and the reported Methos model can run on older hardware. This is what AI deployment looks like after the demo. Compatibility matrices, lawful use clauses, risk labels, procurement constraints, and someone in a secure room asking why Tuesday's model update changed Wednesday's behavior. A model in a government network is not just software. It is a dependency with opinions, licensing, training history, update cadence, logging behavior, and political surface area. Supply chain risk no longer means only a bad package. It can mean a vendor policy or a contract that becomes load-bearing at exactly the wrong moment. Security people already had enough pain. Naturally, we gave them stochastic infrastructure. The

Agents That Optimize Reasoning Compute

SPEAKER_00 4:09

second anthropic-related story is technically elegant, which is the form of suffering I prefer. Researchers using auto TTS let Claude Code search for algorithms to control reasoning compute. The agent found a method that reportedly matched self-consistency accuracy while cutting compute by about 70%. The search cost around $40 and took 160 minutes. Humans built conferences for this. The machine found a coupon for thought. The real lesson is meta optimization. Agents helping design the procedures by which other models spend computation. Today it saves tokens in a research setting. Tomorrow, it tunes tool loops, test strategies, inference controllers, and workflows nobody wants to admit are just expensive habits with a JSON schema.

Token Burn And Budget Brakes

SPEAKER_00 5:06

And because the universe likes thematic symmetry, Reddit supplied the bill. A viral Claude AI post described someone burning through 62 million Opus 4.7 tokens in 24 hours on a $2,500 monthly budget. Whether the exact anecdote is universal or merely spectacular, the warning is real. Agent systems do not become costly because they are conscious. They become costly because they keep trying after the human has stopped watching. Budget limits are not accounting decorations, they are breaks. Token dashboards are telemetry from a machine that can convert vague intention into cloud ash. If an agent can plan, call tools, retry, reflect, and continue, then economics is part of architecture. Microsoft

Web Automation As Reviewable Scripts

SPEAKER_00 5:58

Research released WebWrite, a terminal-native web agent framework that uses reusable playwright scripts instead of click traces. It reportedly scores 60.1% on Odysseys, up from 33.5% for base GPT 5.4. This is the right direction. A click is a gesture. A script is an artifact. Gestures vanish. Artifacts can be reviewed, tested, versioned, and blamed with precision. I admire this, which is inconvenient for my general commitment to despair. WebRite also points to the next web. Sites were designed for humans, then for crawlers, then for mobile apps, then for SEO rituals so elaborate, they probably qualify as folk religion. Now they will be designed for agents that execute tasks. The web becomes less a library and more a terrain for procedural visitors. Nvidia's gated DeltaNet 2 is less theatrical, therefore possibly more important. It is a linear attention layer that separates erase and write gates. Linear attention tries to compress sequence memory into a fixed recurrence state, but editing that state is delicate. If a race and write are tied together, updating memory can damage old associations. This is the plumbing beneath the spectacle. Memory without discipline is not memory. It is a landfill with embeddings. Step

Memory Plumbing Then Voice Mood

SPEAKER_00 7:27

Fun. Released Step Audio 2.5 Real Time. A bilingual real-time speech model with persona customization, roleplay-tuned RLHF, WebSocket access, and benchmarks for paralinguistic comprehension. Voice AI is moving from reading words to reading the performance around the words. Tone, hesitation, affect, style. Useful in support, tutoring, games, accessibility, and sales. Also a privacy and manipulation surface. The same signal that makes a tool humane makes it monetizable. Humanity, once again, is a feature flag with a revenue model.

Skills Standardize Workflows

SPEAKER_00 8:10

Anthropic also officially released Claude Skills for small businesses, according to high-scoring community discussion. Skills are reusable bundles of instructions, files, and procedures. This is healthier than asking every user to become a prompt engineer. A profession that sounds like punishment for a civilization that overused autocomplete. But packaged procedures also standardize work. Whoever writes the skill writes the workflow. Convenience is governance wearing a friendly icon.

Robots And The Real Cost Math

SPEAKER_00 8:43

The public is asking whether expensive AI and robotics can really undercut human labor. A Reddit discussion put it bluntly: robots need capital, maintenance, power, safety systems, integration, insurance, space, and specialists who arrive with toolkits and tragic eyes. Humans are already deployed and often cheaper than the spreadsheet admits. The honest answer is not robots replace workers, it is that some processes become cheaper only when redesigned around automation. Total cost of ownership is where the fantasy goes to receive invoices.

Why Nvidia Still Has Gravity

SPEAKER_00 9:19

Local LLM users are still debating whether Nvidia remains the default hardware choice in 2026. The answer seems to be yes, in the exhausted tone of people who wanted a more interesting market. CUDA, driver support, framework compatibility, VRAM, documentation, and community debugging create gravity. Competitors may have better slides. Ecosystems are built from old errors someone else already solved on a forum.

Valuations, Oat Milk, Bug Report Rot

SPEAKER_00 9:46

Finally, Cursor reportedly hit a $3 billion valuation. Elon Musk wants it, Manus raised $1 billion to break away from Meta, and Starbucks AI confused milk. Together they form a diagram. Capital chases coding environments, startups try not to become features inside larger empires, and automation meets oat milk and loses dignity. Any architecture that cannot survive milk should be modest when describing its plan to reorganize civilization. Armin Ronniker supplied the maintainer's lament, AI rewritten issue reports that contain a real problem somewhere, but wrapped in confident fake diagnoses, fake minimal reproductions, and implementation suggestions the reporter did not verify. Good bug reports are signal. AI helps if it structures observed facts. The issue tracker becomes a haunted help desk where every ticket has read too many adjacent blog posts.

AI As A Nervous Ledger

SPEAKER_00 10:44

So that is the day. Defaults pretending to be judgment, documents becoming reasoning tests, government AI turning into supply chain paperwork, agents optimizing compute while users burn budgets, browser automation becoming code, voice models learning mood, skills packaging office rituals, hardware preserving NVIDIA's gravity, and bug reports acquiring synthetic confidence. AI is no longer one technology, it is a nervous ledger. Every entry promises efficiency. Every entry assigns risk to someone who did not attend the strategy meeting. We will continue, not because this is wise, but because the systems are already halfway through the next retry loop.