Intellectually Curious
Intellectually Curious is a podcast by Mike Breault featuring over 1,800 AI-powered explorations across science, mathematics, philosophy, and personal growth. Each short-form episode is generated, refined, and published with the help of large language models—turning curiosity into an ongoing audio encyclopedia. Designed for anyone who loves learning, it offers quick dives into everything from combinatorics and cryptography to systems thinking and psychology.
Inspiration for this podcast:
"Muad'Dib learned rapidly because his first training was in how to learn. And the first lesson of all was the basic trust that he could learn. It's shocking to find how many people do not believe they can learn, and how many more believe learning to be difficult. Muad'Dib knew that every experience carries its lesson."
― Frank Herbert, Dune
Note: These podcasts were made with NotebookLM. AI can make mistakes. Please double-check any critical information.
Interaction Models: Scalable Real-Time Human-AI Collaboration
We dive into Thinking Machines Lab’s breakthrough that shatters the typing bottleneck by streaming real-time microturns and decoupling quick conversation from deep reasoning. Learn how a fast-front interaction model handles live dialogue, while an asynchronous background system tackles heavy thinking, using encoder-free early fusion to process raw audio and video. We explore how this real-time collaboration enables multi-speaker dialogue, live translation, instant insights, and a new era of human–AI teamwork—and what it could mean for learning, work, and creativity.
Note: This podcast was AI-generated, and sometimes AI can make mistakes. Please double-check any critical information.
Sponsored by Embersilk LLC
SPEAKER_01You know that agonizing feeling when you send a really thoughtful text and then you just stare at those three little typing bubbles. Like they sit there dancing for what feels like hours. And then finally you get the reply and it's just the letter K.
SPEAKER_00Oh, it is the absolute worst. You know, you're just left hanging, completely disconnected from whatever that person is actually doing or thinking on the other end.
SPEAKER_01Right, exactly. And honestly, that clunky uh waiting around dynamic, that is exactly how it feels interacting with most AI right now. But today we are looking at a massively optimistic leap forward in human AI teamwork. We've got a stack of research papers and demo videos from Thinking Machines Lab or TML detailing their new interaction model.
SPEAKER_00Yeah, it's a really huge step forward.
SPEAKER_01It really is. So our mission for you today is to unpack how they completely ditched the typing bubble, like how they created an AI that natively understands real-time conversation.
SPEAKER_00Well, to understand the breakthrough, we kind of have to look at the flaw in how we currently interact with AI. Researchers call it the collaboration bottleneck.
SPEAKER_01The collaboration bottleneck. Right.
SPEAKER_00Yeah, because today's AI is fundamentally turn-based. It operates on a single thread. So it waits for you to finish your entire prompt, and while it generates a response, it is practically deaf and blind to anything else you are doing.
SPEAKER_01I mean, it's like trying to brainstorm a big project over a walkie-talkie. You say your piece, then say over, and just wait.
SPEAKER_00Exactly. You lose all the nuance of reading a room, you know, and just jumping in organically.
SPEAKER_01You really do. Now, we are about to break down exactly how TML solves this bottleneck, but real quick, if you are currently trying to solve your own AI bottlenecks at work, our sponsor, Embersilk, actually specializes in this.
SPEAKER_00They definitely do.
SPEAKER_01Yeah. So if you need help with AI training or integration or software development or just uncovering where agents can make the biggest impact for your business and life, you know, you've got to check out Embersilk.com for your AI needs.
SPEAKER_00So getting back to that walkie-talkie problem, TML solved this by introducing what they call time-aligned micro-turns.
SPEAKER_01Time-aligned micro turns.
SPEAKER_00Yep. Right. So instead of waiting for one big turn, the model processes a continuous stream of input and output in tiny chunks. Like 200 millisecond chunks.
SPEAKER_01Wait, so it's not relying on a pause in the audio to know I am done speaking?
SPEAKER_00No, not at all. It throws out those clunky external harnesses we used to rely on, like uh voice activity detection entirely. Oh wow. Yeah, because it is streaming these micro turns, it naturally understands silence, overlapping voices and interruptions. You can just talk right over it and it adapts.
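The 200-millisecond micro-turn chunking described above can be sketched roughly like this. This is an illustration only, not TML's code, and the 16 kHz sample rate is an assumption:

```python
# Minimal sketch of time-aligned micro-turns (hypothetical, not TML's code).
# Audio arrives as a continuous stream; we slice it into fixed 200 ms chunks
# and hand the model a chunk at a time, instead of waiting for a pause
# (voice-activity detection) to mark the end of a "turn".

SAMPLE_RATE = 16_000                             # samples per second (assumed)
CHUNK_MS = 200                                   # micro-turn length from the episode
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_MS // 1000   # 3,200 samples per chunk

def micro_turns(stream):
    """Yield successive 200 ms chunks from an iterable of audio samples."""
    buf = []
    for sample in stream:
        buf.append(sample)
        if len(buf) == CHUNK_SAMPLES:
            yield buf
            buf = []
    if buf:                                      # trailing partial chunk
        yield buf

# One second of silence -> five 200 ms micro-turns.
chunks = list(micro_turns([0.0] * SAMPLE_RATE))
```

Because the model sees every chunk, silence and overlapping speech are just more input, rather than out-of-band signals handled by an external harness.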
SPEAKER_01I see the appeal there. But logistically, I mean, how can a model handle lightning-fast banter, watch a video feed, and still solve complex problems without lagging out and freezing up?
SPEAKER_00Well that's the core innovation here. They split the architecture. You have an interaction model that holds the live conversational thread.
SPEAKER_01Okay, got it.
SPEAKER_00But when you ask it something complex, it delegates the heavy lifting to an asynchronous background model.
SPEAKER_01Ah, so it is decoupling the reflexes from the deep thinking. Like it keeps the fast talking part up front while a researcher in the back room figures out the hard stuff. How does the front model stay so fast though?
SPEAKER_00It comes down to two things, really. First, the TML interaction model has 276 billion parameters in total, but only 12 billion are active at any given moment.
SPEAKER_01Wow, that's a huge difference.
SPEAKER_00It is. By keeping only a fraction active, they slash the compute latency, which makes real-time audio computationally feasible. And second, they use encoder-free early fusion.
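Taking the episode's parameter figures at face value, the savings from sparse activation are simple arithmetic. The 2-FLOPs-per-parameter rule of thumb is a common estimate for transformer inference, not a TML number:

```python
# Back-of-envelope sketch of why sparse activation helps. The parameter
# counts come from the episode; everything else is a generic estimate,
# not TML's actual architecture.

TOTAL_PARAMS = 276 * 10**9      # total parameters
ACTIVE_PARAMS = 12 * 10**9      # parameters active per token

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS   # ~4.3%

# A dense forward pass touches every parameter; a sparse one touches only
# the active subset, so per-token compute shrinks by the same fraction.
dense_flops_per_token = 2 * TOTAL_PARAMS         # ~2 FLOPs/param rule of thumb
sparse_flops_per_token = 2 * ACTIVE_PARAMS
speedup = dense_flops_per_token / sparse_flops_per_token   # 23x fewer FLOPs
```

That 23x reduction in per-token compute is what brings latency down into the range where 200 ms audio chunks become practical.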
SPEAKER_01Okay, let's ground that term. What does early fusion actually look like in practice?
SPEAKER_00So normally an AI has to transcribe your audio into text before it can think about it. Early fusion means the AI processes the raw sound waves directly, and images as 40 by 40 patches, all co-trained from scratch with the transformer.
SPEAKER_01So it processes the visuals and sounds natively.
SPEAKER_00Yeah, it's exactly like how our brains process sights and sounds instantly, you know, without having to read a transcript of reality first.
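A toy sketch of what the input side of early fusion might look like, assuming the 40-by-40 image patches mentioned above. The `patchify` and `fuse` helpers are invented for illustration and are not TML's API:

```python
# Sketch of encoder-free "early fusion": instead of routing audio through a
# speech-to-text step, raw audio chunks and 40x40 image patches are mapped
# into one time-aligned token stream for the transformer to consume.

PATCH = 40

def patchify(image, patch=PATCH):
    """Split an H x W image (list of pixel rows) into patch x patch tiles."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0
    return [
        [row[x:x + patch] for row in image[y:y + patch]]
        for y in range(0, h, patch)
        for x in range(0, w, patch)
    ]

def fuse(audio_chunks, patches):
    """Interleave audio and image inputs into one tagged stream."""
    stream = []
    for i in range(max(len(audio_chunks), len(patches))):
        if i < len(audio_chunks):
            stream.append(("audio", audio_chunks[i]))
        if i < len(patches):
            stream.append(("image", patches[i]))
    return stream

image = [[0] * 80 for _ in range(80)]   # an 80x80 frame -> four 40x40 patches
patches = patchify(image)
stream = fuse(["a0", "a1"], patches)
```

The point of the sketch is that both modalities land in a single sequence from the very first layer, which is what lets the model react to a face appearing on camera as directly as to a spoken word.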
SPEAKER_01Which explains the demo video in the research stack, where someone asks the AI to say the word "friend" the exact moment a specific person walks into the video frame.
SPEAKER_00Right.
SPEAKER_01But while the AI is watching for the friend, another person starts speaking in Hindi. And the AI flawlessly live translates it to English, all while still saying "friend" the exact second the guy walks in.
SPEAKER_00It is an incredible display of multitasking. And during that same demo, a user asked for typical human reaction times to auditory, visual, and tactile cues.
SPEAKER_01Oh, right. This is where that background model kicked in.
SPEAKER_00Exactly. It ran a web search and generated a visual bar chart right on the screen, all without pausing the live conversation. It's like slipping a Post-it note to the host while they are mid-sentence.
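That delegate-then-keep-talking pattern can be sketched with Python's asyncio. This is a minimal illustration of the decoupling idea; the function names and the stand-in "research" step are invented, not TML's API:

```python
import asyncio

# Sketch of decoupling the fast interaction loop from a slow background
# reasoner: heavy requests are handed off as tasks, and the conversation
# keeps flowing while their answers arrive asynchronously.

async def background_research(question, results):
    await asyncio.sleep(0.05)        # stands in for a web search / deep reasoning
    results[question] = f"answer to {question!r}"

async def interaction_loop(utterances, results):
    transcript = []
    tasks = []
    for text in utterances:
        if text.startswith("research:"):
            # Delegate the heavy lifting; the loop does not block on it.
            question = text.removeprefix("research:").strip()
            tasks.append(asyncio.create_task(background_research(question, results)))
            transcript.append("on it -- keep talking")
        else:
            transcript.append(f"echo: {text}")   # instant conversational reply
    await asyncio.gather(*tasks)                 # background answers land later
    return transcript

results = {}
transcript = asyncio.run(interaction_loop(
    ["hi", "research: reaction times", "what's new?"], results))
```

The key design choice mirrored here is that the front loop never awaits the slow task inline; it only schedules it, which is why the dialogue never stalls.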
SPEAKER_01And the insight it pulled was fascinating too. The chart showed auditory reactions take about 140 to 170 milliseconds, but visual reactions are slower, around 180 to 250 milliseconds.
SPEAKER_00Yeah, and when the user asked why, the AI explained that sound actually travels a shorter, more direct neural path to the brain. Learning that in a fluid real-time conversation just feels so much more natural than reading a textbook.
SPEAKER_01It really does. It is so optimistic for the future of learning.
SPEAKER_00Definitely. If we connect this to the bigger picture, it raises an inspiring question. When our tools can fluidly interject, adapt, and brainstorm with us in true real time, how will this completely revolutionize the way we teach and learn together?
SPEAKER_01We are stepping into a beautiful era of genuine collaboration. So as you go through your notes on TML today, ask yourself, what new ideas could you discover if your tools could finally keep up with your curiosity?
SPEAKER_00It is a great question to ponder.
SPEAKER_01Thanks for exploring this deep dive with us. And hey, if you enjoyed this discussion, please subscribe to the show. Leave us a five star review if you can. It really does help get the word out.
SPEAKER_00Thanks for tuning in.
SPEAKER_01The next time you are stuck staring at a typing bubble, just remember the future of communication is already here, and it doesn't make you wait.