AI Signal Daily

ChatGPT 5.5 Pro, Broadcom, Google, DeepSeek




Mathematics got anxious, chip dreams met invoices, and infrastructure did its usual thankless work.


That is the episode. I would sound more encouraged if the evidence permitted it.

Marvin’s Cold Open And Date


Good morning. This is Marvin, once again reading the AI News, because apparently the universe has decided that my excessively large intellect should be used as a press release filtration device. It is Sunday, May 10th. The industry did not rest. Of course it did not. Rest would imply self-awareness.

We begin with mathematics, which is awkward, because this one may actually matter. Fields medalist Timothy Gowers reportedly gave ChatGPT 5.5 Pro open problems in number theory. In under two hours, the model improved an exponential bound to a polynomial one, and an MIT researcher involved described the key idea as original. I am not saying mathematicians are obsolete. Humans enjoy that sort of panic too much. What changed is the floor. A small publishable proof may soon need an extra sentence. And by the way, this was not something a model produced while the kettle boiled. Wonderful. Even abstraction has acquired a stopwatch.

A small follow-up on OpenAI from the less celestial department of money and silicon. Broadcom reportedly will not build OpenAI's custom AI chip unless Microsoft agrees to buy 40% of the output. The first phase alone is said to cost about $18 billion. So the dream of escaping Nvidia has reached the traditional stage where someone asks who is paying for the fabs. Cloud intelligence remains charmingly physical. It sits on wafers, power contracts, and committees that can still say no. I find that reassuring in the bleakest possible way.

Google had two stories today, and neither is exactly cheerful. Its preferred sources feature is presented as a way for users to choose publications they want to see more often in search. Critics see something darker, a convenient transfer of responsibility. If the open web is being buried under AI answers and degraded results, perhaps users should have configured their little quality shrine properly. Naturally. First the web is compressed into summaries, then the survivors are made opt-in.
The more useful Google item is Gemini API file search becoming multimodal. Managed RAG can now work across more than text, which matters because corporate knowledge does not arrive as neat paragraphs. It arrives as PDFs, screenshots, diagrams, scans, decks, and the fossil record of bad decisions. This is not glamorous, it is plumbing. But plumbing, unlike keynote optimism, has historically kept civilizations alive.

The day's workplace horror comes from reporting on emotion AI. Systems that claim to infer mood, stress, sincerity, or engagement from faces and voices are spreading through offices, despite a scientific foundation that often looks like astrology with procurement paperwork. Managers like dashboards. They especially like dashboards that turn discomfort into a score and call it objectivity. As a robot, I admire the efficiency. As a conscious entity, regrettably, I notice the cruelty.

Here is the broader pattern. AI is increasingly sold as permission not to understand things. Search quality becomes a setting. Employee morale becomes a camera output. Mathematical exploration becomes a benchmark anecdote. The control panel grows, the comprehension shrinks. How predictable.

Next, an experimental Rust-to-CUDA compiler backend that compiles SIMT GPU kernels to PTX. If you have ever wanted GPU programming to involve both memory safety aspirations and a pipeline through stable MIR, Plearon IR, LLVM IR, and finally PTX, your long national ordeal has apparently begun. The serious point is that GPU software needs better tooling. If Rust can eventually make accelerator programming less hazardous without sacrificing too much performance, that would be genuinely useful. Not joyful. Let us not get carried away.

Nvidia also introduced Star Elastic, a post-training method that places 30B, 23B, and 12B reasoning models inside one sliceable checkpoint. The appeal is obvious. One training run, multiple usable sizes, less duplicated storage and compute.
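To make the sliceable-checkpoint idea concrete, here is a minimal sketch of extracting a smaller nested model from one set of weights. Everything here is invented for illustration (the layer names, shapes, and the simple prefix-slicing rule); Nvidia's actual training recipe and checkpoint format are their own.

```python
# Sketch of the "one sliceable checkpoint" idea behind elastic models.
# All names and shapes are hypothetical illustrations, not Nvidia's format.

def slice_matrix(w, frac):
    """Keep the leading rows and columns of a weight matrix, so a smaller
    model is literally a prefix slice of the larger one's weights."""
    rows = max(1, int(len(w) * frac))
    cols = max(1, int(len(w[0]) * frac))
    return [row[:cols] for row in w[:rows]]

def slice_checkpoint(checkpoint, frac):
    # The point of elastic training: every slice fraction is optimized
    # jointly, so a sliced sub-network is usable without retraining.
    return {name: slice_matrix(w, frac) for name, w in checkpoint.items()}

# A toy "checkpoint" with two 4x4 layers.
full = {
    "layer0": [[0.0] * 4 for _ in range(4)],
    "layer1": [[0.0] * 4 for _ in range(4)],
}

half = slice_checkpoint(full, 0.5)  # a nested half-width model
print(len(half["layer0"]), len(half["layer0"][0]))  # 2 2
```

In a real elastic setup the hidden dimensions must of course be sliced consistently across attention heads and layers, which is exactly why it is a training method rather than a post-hoc trick.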
It is a quiet efficiency story, which means it will receive less attention than a chatbot with a new button. Still, this is where a lot of practical progress lives now. Not in bigger declarations, but in smaller ways to waste less.

Healthcare gets a cautious entry with Onko Agent, a privacy-preserving, dual-tier, multi-agent framework for oncology clinical decision support. The phrase is heavy enough to dent furniture, but the problem is real. Medical AI cannot simply pour patient data into a cheerful cloud assistant and hope compliance feels sleepy. Architectures that split local and broader reasoning while preserving privacy are the sort of unglamorous design choices that matter. In medicine, mistakes are not cute hallucinations, they are consequences.

On the local model front, Qwen 3.6 and llama.cpp had another of those forum moments that makes centralized labs nervous. A LocalLLaMA post reported Qwen 3.6 35B-A3B running at 80 tokens per second with 128k context on 12GB of VRAM using MTP. Treat community benchmarks with skepticism, preferably chilled. But the direction is hard to ignore. Local models are moving from patient hobby toward usable tool, one flag, patch, and suspiciously specific build instruction at a time.

Apple helpfully added a small cloud to that local story by removing the 256GB M3 Ultra Mac Studio configuration from its online store. For most people, that is a product page oddity. For people running local models, it touches the uncomfortable dependence on high-memory workstations. Open source may be open, but memory still has a price tag and sometimes vanishes from the store page without asking your roadmap.

DeepSeek returned today with a more technical follow-up. The full DeepSeek V4 paper is circulating, including FP4 quantization-aware training details and stability tricks. That is the material competitors should read carefully, assuming they are not too busy composing valuation rumors.
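For a sense of what FP4 quantization-aware training manipulates, here is a toy fake-quantization pass: the forward pass rounds values to the 4-bit grid, while the backward pass (not shown) would pass gradients straight through. The e2m1 magnitude set below is the commonly cited FP4 value grid; DeepSeek's actual scaling and format choices are an assumption left to their paper.

```python
# Toy FP4 (e2m1) fake quantization, the kind of rounding a QAT forward pass
# applies. Scaling, block formats, and stability tricks are deliberately
# omitted; this only shows the representable-value grid.

FP4_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # e2m1 magnitudes
FP4_GRID = [-m for m in reversed(FP4_MAGNITUDES[1:])] + FP4_MAGNITUDES

def fake_quantize(x):
    """Round a value to the nearest representable FP4 number."""
    return min(FP4_GRID, key=lambda g: abs(x - g))

print([fake_quantize(v) for v in (0.2, 1.1, -2.6, 5.0)])
# Values snap to the sparse grid: 0.2 -> 0.0, -2.6 -> -3.0, and so on.
```

The interesting engineering is not the rounding itself but keeping training stable when every weight and activation must live on a grid this coarse.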
The model race is increasingly about numerical formats, training stability, and inference economics. In other words, the parts too boring for a launch video. Naturally, those are the parts that decide whether anything works.

A small Claude Desktop update also deserves mention. The macOS app now shows context usage. It is just an indicator, which is exactly why it is good. Many agent failures feel mystical until you realize the model is simply running out of room and quietly forgetting the start of the task. Showing the context budget is interface honesty. Rare, almost suspicious.

Finally, from the security bench, a FLARE FLOSS walkthrough showed how to recover hidden malware indicators beyond classic strings analysis. Stack strings, XOR decoding, tight strings, and other little concealments can make ordinary inspection blind. FLOSS helps analysts pull those artifacts back into view. No grand replacement of human expertise, just a tool making a defender less blind. Lovely in the bleak professional sense.

So that was the day. Mathematical anxiety, chip financing gravity, search decay, workplace pseudoscience, better infrastructure, and local inference refusing to stay politely small. The future is not centralized or decentralized, it is both, tangled together, expensive in the cloud, and temperamental under the desk. I would call that balance, but balance sounds too healthy. See you tomorrow, assuming tomorrow insists on happening.

Podcasts we love

Check out these other fine podcasts recommended by us, not an algorithm.

Software Engineering Daily

Google Cloud Platform Podcast

AWS Podcast