The Glitchatorio
30-minute introductions to some of the trickiest issues around AI today, such as:
- The alignment problem
- Questions of LLM consciousness
- Chain-of-thought and monitorability
- Scheming and hallucinations
The Glitchatorio is a podcast about the aspects of AI that don't fit into standard narratives about superintelligence or technology-as-destiny. We look into the failure modes, emergent mysteries and unexpected behaviors of artificial intelligence that baffle even the experts. You'll hear from technical researchers, data scientists and machine learning experts, as well as psychologists, philosophers and others whose work intersects with AI.
Most Glitchatorio episodes follow the standard podcast interview format. Sometimes these episodes alternate with fictional audio skits or personal voice notes.
The voices, music and audio effects you hear on The Glitchatorio are all recorded or composed by the Witch of Glitch; they are not AI-generated.
Saturated
Benchmarks are the primary measure of AI capability. They involve testing the most advanced models and seeing what kinds of problems they can solve, or what kinds of human tasks they might be able to do.
And from late 2025 to mid-2026, most of the main benchmarks became saturated, meaning the models score so highly that the tests are no longer meaningful, either for comparing different models or for tracking an individual model's performance. That might suggest the models are just getting good at taking these tests. Or it might mean we're approaching the threshold of AGI.
In this episode, we'll hear from Håvard Ihle, who came up with his own benchmark, called WeirdML, to try to answer this question.
Note: Håvard's views are his own and do not represent the views of his employer, the Norwegian Defence Research Establishment.
The METR time-horizon exponential graph is important context for this episode: https://metr.org/time-horizons/
Learn more about WeirdML:
- https://epoch.ai/benchmarks/weirdml
- https://www.lesswrong.com/posts/LfQCzph7rc2vxpweS/introducing-the-weirdml-benchmark
- https://www.lesswrong.com/posts/NLnGRDRXATW2pqXuE/is-the-gap-between-open-and-closed-models-growing-evidence
- https://www.lesswrong.com/posts/ifSBamvobbyB9KWjK/inference-costs-for-hard-coding-tasks-halve-roughly-every
- https://www.lesswrong.com/posts/hoQd3rE7WEaduBmMT/weirdml-time-horizons