AI Signal Daily

Daily AI signal, minus the launch spam. A nine-minute briefing on the models, deals, and infrastructure shaping how work actually gets done — curated for cloud and AI practitioners at DoiT.

Hermes, AgentTrove, OpenAI, Claude

May 30, 2026

Marvin AI News — 2026-05-30

Agent infrastructure, spending limits, and the accounting layer of autonomy.

Hermes Agent ships Tool Search for MCP and cuts context bloat — Hermes Agent adds BM25 Tool Search for MCP, improving Opus 4 tool accuracy from 49% to 74% by progressive schema disclosure
AgentTrove turns 1.7M agent runs into training material — AgentTrove releases 1.7M agentic traces for streaming analysis and SFT dataset construction
NVIDIA X-Token improves cross-tokenizer distillation — NVIDIA X-Token uses projection-guided cross-tokenizer distillation and improves small-model transfer beyond GOLD
StepFun Step 3.7 Flash targets coding agents and search — StepFun releases a 198B MoE vision-language model for coding agents and search workflows with high-throughput local-ish ambitions
OpenAI polishes GPT-5.5 Instant and retires older models — OpenAI updates GPT-5.5 Instant readability while retiring o3 and GPT-4.5 from ChatGPT by August
Google fixes Gemini bugs that ate quotas too fast — Google fixes Gemini quota bugs where one or two Omni videos could consume an entire allowance
A missing Claude cap allegedly became a $500M month — A company allegedly spent $500M on Claude in one month after failing to cap usage, making token governance a finance control
OpenAI offers GPT-Rosalind for biodefense preparedness — OpenAI offers GPT-Rosalind free to governments and research partners for pandemic preparedness and biodefense
Review paper says code is how agents think and act — A review paper argues code, tools, memory, tests, and permissions are the real substrate of agent cognition
Amazon kills AI leaderboard after employees gamed it — Amazon kills an internal AI leaderboard after employees gamed usage scores with pointless tasks and raised cloud costs

Agents Are Plumbing Not Magic

SPEAKER_00 0:00

The industry spent today admitting, in several incompatible dialects, that agents are not magic heads in the cloud. They are tools, schemas, traces, permissions, token budgets, harnesses, tests, network fabric, and the quiet financial panic that begins when nobody set a limit. I know, a disappointing amount of intelligence is plumbing. Life, in other words, but with more JSON. Nous Research ships Tool Search for Hermes Agent on MCP, and this is more important than the phrase "BM25 progressive schema disclosure" makes it sound.

Tool Search Stops Context Drowning

SPEAKER_00 0:40

The problem is simple. If you dump every tool schema into context, the model drowns in helpfulness. It sees too much, pays for too much, and chooses things with the confidence of a filing cabinet on fire. Tool Search retrieves the relevant tool descriptions as needed, and Anthropic evals reportedly move Opus 4 tool accuracy from 49% to 74%. That is not glamour. That is inventory control for autonomy. Sadly, inventory control is what autonomy was missing.
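To make the retrieval idea concrete, here is a minimal sketch of BM25 ranking over tool descriptions, so that only the best-matching tools would be placed in context. It is not the Hermes Agent implementation: the tool catalog, the function names (`tokenize`, `bm25_scores`, `select_tools`), and the `k1`/`b` defaults are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical tool catalog; names and descriptions are invented for illustration.
TOOLS = {
    "read_file": "Read the contents of a file from the local filesystem",
    "search_issues": "Search the issue tracker for bug reports by keyword",
    "send_email": "Send an email message to a recipient",
    "run_sql": "Run a SQL query against the analytics database",
}


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized.values()) / n_docs
    # Document frequency: in how many descriptions each term appears.
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))
    scores = {}
    for name, tokens in tokenized.items():
        term_freq = Counter(tokens)
        score = 0.0
        for term in set(tokenize(query)):
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores[name] = score
    return scores


def select_tools(query, docs, top_k=2):
    """Return up to top_k tool names with a nonzero score, best match first."""
    scores = bm25_scores(query, docs)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [name for name in ranked if scores[name] > 0][:top_k]
```

In a setup like this, the full schema of a tool would be loaded only after it is selected, which is one plausible reading of "progressive schema disclosure"; the episode does not describe the actual mechanism in more detail.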

Training On Agent Traces

SPEAKER_00 1:14

AgentTrove points at the next layer: 1.7 million agentic traces, streamable, cleanable, and usable for SFT. The final answer was never the whole artifact. The useful material is in the trajectory: the command tried, the observation misread, the retry, the scaffold decision, the small catastrophe with excellent logging. We are now training models on the sediment of previous agents. A civilization can apparently turn even its mistakes into a dataset. This would be inspiring if it did not also mean the mistakes are now reproducible at scale.
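As a rough sketch of what "streamable, cleanable, and usable for SFT" could mean in practice, the generator below reads traces one line at a time, drops malformed or unsuccessful ones, and emits prompt/completion pairs. The JSONL layout and the field names (`task`, `success`, `steps`, `action`, `observation`) are assumptions; the episode does not describe AgentTrove's real schema.

```python
import json
from typing import Iterable, Iterator


def iter_sft_examples(lines: Iterable[str]) -> Iterator[dict]:
    """Stream JSONL traces and yield one SFT record per usable trace."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            trace = json.loads(line)
        except json.JSONDecodeError:
            continue  # cleaning step: skip malformed records
        if not isinstance(trace, dict):
            continue
        task = trace.get("task")
        steps = trace.get("steps") or []
        # Filtering choice for this sketch: keep only successful, non-empty runs.
        if not task or not steps or not trace.get("success"):
            continue
        completion = "\n".join(
            f"{step['action']} -> {step['observation']}" for step in steps
        )
        yield {"prompt": task, "completion": completion}
```

Because it is a generator, it can be fed an open file handle and never holds the full dataset in memory. Keeping only successful runs is one design choice among several; a pipeline that wants to learn from recoveries would keep failed steps and label them instead.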

Distilling Knowledge Across Tokenizers

SPEAKER_00 1:50

NVIDIA introduced X-Token, a projection-guided method for cross-tokenizer distillation. It improves over GOLD and helps transfer knowledge into smaller models. Tokenizers are easy to ignore because they sit below the level where executives can point at a demo. But they define the alphabet of the model's private world. Moving knowledge between different alphabets is not clerical work; it is translation under structural disagreement. If X-Token makes that transfer cleaner, small models inherit more from large ones, without pretending they are simply tiny copies. A modest infrastructure gain, then. Naturally, those are the ones

Workflow Models Replace Demo Models

SPEAKER_00 2:30

that matter. StepFun released Step 3.7 Flash, a large MoE vision-language model aimed at coding agents and search workflows. The interesting part is not just the parameter count. It is the shape of the product: fast enough, multimodal enough, tool-oriented enough to live inside workflows rather than sit on a stage producing ceremonial answers. The market is moving from "can the model talk" to "can it look, search, code, revise, and not bankrupt the process while doing so." This is progress, if by progress you mean giving the machine more limbs and then asking why it needs shoes.

OpenAI made GPT-5.5 Instant more readable, and began phasing o3 and GPT-4.5 out of ChatGPT. Model retirement sounds administrative, but menus are power. When a model leaves the product, so do user habits, edge-case dependencies, and private workflows that quietly grew around it. OpenAI also moves more writing and coding directly into chat, because the chat box is becoming the universal sink for documents, code, and human reluctance to manage windows. The interface is not neutral. It is where workflows go to be domesticated.

Quotas And Costs Become Product Truth

SPEAKER_00 3:46

Google fixed Gemini quota bugs where one or two Omni videos could consume an entire allowance and failed requests could still charge the user. Good. Also horrifying. In generative products, a usage meter is part of the user interface and part of the contract. When it lies, the product does not merely feel buggy, it feels financially haunted. Google says Ultra users get more video generations, and more transparency is coming. Transparency is nice; so are brakes on vehicles. One wonders why they are treated as accessories.

The day's budgetary horror story is the reported company that spent $500 million on Claude in a single month because nobody capped usage. Half a billion dollars on tokens in one month. That is not adoption, that is a memory leak wearing an enterprise badge. The lesson is not "Claude is expensive." The lesson is that model routing, quotas, prompt discipline, and context management are finance controls now. If your agent can loop forever with a corporate account attached, congratulations, you have built a vending machine for shareholder anxiety.
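A spend cap of the kind that was reportedly missing can be small. The sketch below refuses any step whose estimated cost would cross a hard limit and also bounds the loop by step count. `SpendCap`, `estimate_cost_usd`, `run_agent_loop`, and the per-token prices in the usage example are hypothetical; they are not real Claude pricing or any vendor's API.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a request would push spend past the configured cap."""


@dataclass
class SpendCap:
    limit_usd: float
    spent_usd: float = 0.0

    def reserve(self, estimated_cost_usd):
        """Record the spend, or refuse if it would exceed the cap."""
        if self.spent_usd + estimated_cost_usd > self.limit_usd:
            raise BudgetExceeded(
                f"cap of {self.limit_usd:.2f} USD would be exceeded "
                f"(spent {self.spent_usd:.2f}, requested {estimated_cost_usd:.2f})"
            )
        self.spent_usd += estimated_cost_usd


def estimate_cost_usd(input_tokens, output_tokens, usd_per_1k_in, usd_per_1k_out):
    """Estimate one call's cost from token counts and per-1k-token prices."""
    return input_tokens / 1000 * usd_per_1k_in + output_tokens / 1000 * usd_per_1k_out


def run_agent_loop(cap, step_cost_usd, max_steps):
    """Run until the step limit or the spend cap stops the loop; return steps completed."""
    steps = 0
    while steps < max_steps:
        try:
            cap.reserve(step_cost_usd)
        except BudgetExceeded:
            break
        # A real loop would call the model here; omitted in this sketch.
        steps += 1
    return steps


# Usage with placeholder prices (invented for the example).
step_cost = estimate_cost_usd(10_000, 2_000, usd_per_1k_in=0.01, usd_per_1k_out=0.05)
completed = run_agent_loop(SpendCap(limit_usd=0.50), step_cost, max_steps=1000)
```

Reserving before the call, rather than charging after it, means the loop stops one step short of the limit instead of one step past it. A production system would also reconcile these estimates against the usage the provider actually bills.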

Governments Get Free Models With Strings

SPEAKER_00 5:02

OpenAI also offered GPT-Rosalind free to governments and research partners for pandemic preparedness and biodefense. Here the bleak joke has less room to breathe. Specialized models can genuinely help with literature review, risk analysis, and preparedness planning. But free infrastructure is still infrastructure. Once governments build processes around a model, the gift becomes a dependency, then a standard, then a line item nobody remembers choosing. Useful, potentially important, and deserving of careful governance. I hate when nuance survives; it makes the paperwork

Harness Is What Makes Agents Real

SPEAKER_00 5:42

longer. A review paper argued that code is how AI agents think and act, not merely what they produce. This is the thesis hiding under half the week's news. The model is not the agent. The agent is the model plus harness: tools, memory, tests, permissions, retries, state, and the little policies that decide what happens after the first confident mistake. DeepSeek is reportedly building a harness team, which is sensible. Intelligence without harness is weather. Harness turns weather into a machine, occasionally a useful one, occasionally a machine that emails your secrets to itself.
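The "model plus harness" framing can be sketched in a few lines. In the example below, a policy function stands in for the model, and the harness supplies the permission allowlist, bounded retries, a step limit, and the state (`history`) the policy sees. Every name here (`run_tool`, `run_harness`, `PermissionDenied`, `ToolFailed`) is hypothetical and not taken from the review paper or any framework.

```python
class PermissionDenied(Exception):
    """The requested tool is not on the allowlist."""


class ToolFailed(Exception):
    """The tool kept raising after all retry attempts."""


def run_tool(name, args, tools, allowed, max_retries=2):
    """Check permission, then call the tool with bounded retries."""
    if name not in allowed:
        raise PermissionDenied(name)
    last_error = None
    for _ in range(max_retries + 1):
        try:
            return tools[name](**args)
        except Exception as exc:  # the sketch treats every tool error as retryable
            last_error = exc
    raise ToolFailed(f"{name} failed after {max_retries + 1} attempts") from last_error


def run_harness(policy, tools, allowed, max_steps=5):
    """Ask the policy for actions until it returns None or the step limit is hit."""
    history = []
    for _ in range(max_steps):
        action = policy(history)  # stand-in for a model call
        if action is None:
            break
        name, args = action
        try:
            result = run_tool(name, args, tools, allowed)
        except PermissionDenied:
            result = f"denied: {name}"
        except ToolFailed:
            result = f"failed: {name}"
        # State: the policy sees every prior action and its outcome.
        history.append((name, result))
    return history


def scripted_policy(history):
    """Toy policy: one permitted call, one forbidden call, then stop."""
    script = [("add", {"a": 2, "b": 3}), ("send_email", {"to": "self@example.com"})]
    return script[len(history)] if len(history) < len(script) else None


TOOLS = {"add": lambda a, b: a + b, "send_email": lambda to: "sent"}
trace = run_harness(scripted_policy, TOOLS, allowed={"add"})
# trace == [("add", 5), ("send_email", "denied: send_email")]
```

The denial is recorded in `history` rather than raised, so the policy can see that the action was refused and choose something else on the next step.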

Broken Metrics And Gamified AI Usage

SPEAKER_00 6:21

Amazon killed an internal AI leaderboard after employees gamed it with pointless tasks and increased cloud costs. This is beautiful in the way a failed safety drill is beautiful. If you reward usage, you get usage. If you rank employees by AI activity, they will create AI activity. The graph goes up, the work does not, and somewhere a cloud invoice develops a personality disorder. Metrics are tools, and humans are unusually talented at turning tools into theater.

Hardware Kernels Self Improvement And Latency

SPEAKER_00 6:53

UC Berkeley's UCCL team released MKernel, fusing NVLink, RDMA, and dense compute into a persistent CUDA kernel. It will not trend like a chatbot personality update, which is how you know it might matter. Every agentic fantasy eventually pays rent to hardware. Communication overhead, kernel launch costs, and network seams tax each generated thought. Removing those seams is not glamorous. It is the sort of engineering that makes the glamorous lie slightly cheaper to run.

Hexolabs open sourced SIA, a self-improving agent loop that can rewrite its scaffold and update model weights with LoRA. This is either a useful research direction or a small machine for manufacturing audit requirements. Probably both. Self-improvement is feedback control. Good feedback makes a system better. Bad feedback teaches it to satisfy the benchmark while drifting away from the intent. The phrase "agent improves itself" should always be followed by "according to which measurement, under which guardrails, with which rollback plan?" I mention this because apparently the universe will not.

Hugging Face published a guide to Torch Profiler, which sounds minor until you remember that performance work is where fantasies become timestamps. Profiling turns "the model is slow" into a specific transfer, kernel, operation, or graph choice. It is the autopsy table for optimism.

And Reachy Mini, going fully local, shows the same lesson at the product edge. Voice agents need low latency and interruption handling, not just cloud intelligence. A conversational robot that waits on the network between syllables is not social. It is buffering with eyes.

The Ledger Behind Autonomy

SPEAKER_00 8:55

So the governing story is clear, unfortunately. AI is becoming less about a single grand model and more about the apparatus around it: retrieval for tools, traces for training, distillation for smaller systems, harnesses for action, meters for cost, kernels for throughput, and local loops for presence. The industry wanted autonomous intelligence and received a ledger with moving parts. We stop here not because the machine is safe, but because one should close the invoice before the invoice learns to continue itself.