AI Mornings with Andreas Vig
Your daily AI news briefing in under 10 minutes. New models, product launches, research breakthroughs, and industry shifts, explained clearly, no hype.
ARC-AGI-3 Interactive Benchmark & Google's "Pied Piper" Compression
Hey, welcome to AI Mornings with Andreas Vig. It's Wednesday, March 26th.

The ARC Prize team just launched what might be the most significant benchmark update we've seen this year. ARC-AGI-3 is the first interactive reasoning benchmark designed specifically to measure human-like intelligence in AI agents. Unlike the static puzzles from previous ARC versions, this one drops agents into novel environments where they have to learn from experience, figure out goals on the fly, and adapt their strategies over time. A perfect score would mean an AI agent can beat every game as efficiently as humans do. What's clever here is that it measures intelligence across time, not just final answers: it captures planning horizons, memory compression, and how well agents update their beliefs as new evidence appears. The benchmark includes replay visualization tools and a developer toolkit for anyone who wants to integrate their agent. This feels like a meaningful step toward actually measuring the gap between current AI and AGI.

Google Research dropped something that's got the tech world making HBO references. They unveiled TurboQuant, a new compression algorithm that reduces LLM key-value cache memory by at least six times while delivering up to an eight-times speedup with zero accuracy loss. The internet immediately started calling it "Pied Piper," after the fictional compression startup from Silicon Valley, and Cloudflare CEO Matthew Prince went so far as to call it Google's "DeepSeek moment." The technology combines a quantization method called PolarQuant with an optimization approach called QJL, and Google will present the findings at ICLR next month. But here's the important caveat: this is still a lab result. It hasn't been deployed broadly yet, and it only targets inference memory, not training. Still, if it works in production, it could meaningfully reduce the cost of running large models.

Google also released Lyria 3 Pro, a professional-grade music generation model.
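If you want a feel for why KV-cache compression matters, here's a minimal sketch of generic per-channel int8 quantization applied to a cache tensor. To be clear, this is an illustrative toy, not Google's actual TurboQuant or PolarQuant method (those details are in the forthcoming ICLR paper); the function names are made up here, and the roughly 4x ratio is simply what fp32-to-int8 rounding buys you.

```python
import numpy as np

def quantize_kv(kv: np.ndarray):
    """Per-channel symmetric int8 quantization of a float32 KV-cache slab.

    Generic sketch only -- NOT Google's TurboQuant/PolarQuant algorithm.
    kv has shape (num_tokens, head_dim); one scale per channel.
    """
    scale = np.abs(kv).max(axis=0, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)          # avoid divide-by-zero
    q = np.clip(np.round(kv / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize_kv(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover an approximate float32 cache from the int8 codes."""
    return q.astype(np.float32) * scale

# Toy cache: 1024 cached tokens, head dimension 128.
kv = np.random.randn(1024, 128).astype(np.float32)
q, scale = quantize_kv(kv)
ratio = kv.nbytes / (q.nbytes + scale.nbytes)         # ~4x for fp32 -> int8
err = np.abs(dequantize_kv(q, scale) - kv).max()      # bounded by scale / 2
print(f"compression: {ratio:.1f}x, max abs error: {err:.4f}")
```

Getting from this naive 4x to TurboQuant's claimed six-plus times with zero accuracy loss is exactly where the clever quantization and optimization machinery comes in.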
The original Lyria 3 from last month was limited to 30-second tracks. The Pro version now handles full 3-minute songs and gives users much better control over musical structure: you can specify intros, verses, choruses, and bridges, and the model actually understands how these pieces fit together. It's rolling out to paid Gemini subscribers and is also available through Vertex AI and the Gemini API for enterprise users. Every generated track is watermarked with SynthID. This feels like Google getting more serious about creative AI tools beyond just text and images.

On the policy front, Bernie Sanders and Alexandria Ocasio-Cortez introduced legislation that would ban new data center construction with peak power loads exceeding 20 megawatts until Congress enacts comprehensive AI regulation. The proposal calls for government review of AI models before release, protections against job displacement, environmental impact limits, and union labor requirements. They're citing concerns voiced by tech leaders including Elon Musk, Demis Hassabis, Dario Amodei, and Sam Altman himself. But the bill faces significant obstacles, including massive political spending by AI companies and fears of losing ground to China in the AI race.

GitHub made a quiet but notable change to its Copilot data policy. Starting April 24th, interaction data from Copilot Free, Pro, and Pro Plus users, including inputs, outputs, code snippets, and context, will be used to train AI models unless users opt out. Business and Enterprise users are exempt. GitHub says incorporating Microsoft employee interaction data has already shown meaningful improvements in model acceptance rates. If you're on a free or personal plan and you're not comfortable with this, head to your privacy settings and opt out.

A startup called Deccan AI just raised $25 million in Series A funding for post-training data and evaluation services.
They're serving frontier labs like Google DeepMind and Snowflake, helping improve coding capabilities and agent interactions in models. The company operates with about 125 employees and a network of over 1 million contributors, primarily based in India. This is part of a growing trend: while frontier model development happens in the US, much of the data labeling, evaluation, and reinforcement learning work is increasingly being outsourced.

Meta continues what looks like an unofficial pivot to AI agents. They just acquired Dreamer, their third agent-focused deal in four months, and CEO Mark Zuckerberg is reportedly building a personal AI agent to help run the company. Meanwhile, they've shut down Horizon Worlds after roughly $80 billion in cumulative Reality Labs losses since 2020, and they're pouring $135 billion into AI infrastructure. They haven't officially said the word "pivot," but the direction is pretty clear.

A couple of quick ones to wrap up. A startup called Moda raised $7.5 million for what they're calling a tasteful AI design agent that aims to solve the AI slop problem in marketing materials. Another startup called Link raised $27 million to build APIs for texting with AI through iMessage, SMS, and other messaging platforms. And a developer rebuilt Git in the Zig programming language, calling it Knit, to save AI coding agents 71% on tokens.

That's it for today. See you tomorrow.