Intellectually Curious
Intellectually Curious is a podcast by Mike Breault featuring over 1,800 AI-powered explorations across science, mathematics, philosophy, and personal growth. Each short-form episode is generated, refined, and published with the help of large language models—turning curiosity into an ongoing audio encyclopedia. Designed for anyone who loves learning, it offers quick dives into everything from combinatorics and cryptography to systems thinking and psychology.
Inspiration for this podcast:
"Muad'Dib learned rapidly because his first training was in how to learn. And the first lesson of all was the basic trust that he could learn. It's shocking to find how many people do not believe they can learn, and how many more believe learning to be difficult. Muad'Dib knew that every experience carries its lesson."
― Frank Herbert, Dune
Note: These podcasts were made with NotebookLM. AI can make mistakes. Please double-check any critical information.
Recursive Self-Improvement in Large Language Models
In this deep dive, we unpack recursive self-improvement (RSI) in large language models. Learn how models critique and refine their own reasoning at the prompt level, architect smarter toolchains at the tool level, and even train on self-generated data at the model level. We review a landmark 540B-parameter study that boosted GSM8K performance from 74.4% to 82.1% using chain-of-thought and self-consistency, and a 2025 Liu et al. finding that self-reflection loops dramatically cut toxicity by 75.8% and achieved a 100% reduction in partisan bias. We explore SafeEvalAgent and the growing ecosystem around evolving AI safety, plus practical takeaways you can apply to your own learning and problem-solving.
Note: This podcast was AI-generated, and sometimes AI can make mistakes. Please double-check any critical information.
Sponsored by Embersilk LLC
SPEAKER_00: So the other day I decided I was uh finally gonna learn how to juggle.
SPEAKER_01: Oh wow. That is a bold choice.
SPEAKER_00: Yeah. I watched endless tutorials, stood right in the middle of my living room, and proceeded to just drop everything repeatedly.
SPEAKER_01: I can picture it perfectly.
SPEAKER_00: I remember standing there just staring at the tennis balls rolling under the couch, wishing human brains had a software update button.
SPEAKER_01: Right, like a quick debug script for your hands.
SPEAKER_00: Exactly. Like running a debug script on your own motor skills.
SPEAKER_01: Yeah.
SPEAKER_00: Sadly we don't have that. But looking at our stack of sources today, it turns out artificial intelligence actually can hit that update button.
SPEAKER_01: It really is a paradigm shift we are looking at.
SPEAKER_00: Today's mission is a deep dive into recursive self-improvement, or RSI, in large language models. We are exploring exactly how AI is autonomously making itself smarter, safer, and a much more powerful tool for you. Okay, let's unpack this, because when people hear "self-improving AI," they often jump straight to sci-fi movies. The whole rogue robot trope, right? But the research shows modern RSI is incredibly practical. It operates in modular feedback loops rather than through some kind of sudden awakening.
SPEAKER_01: What's fascinating here is that we can break this down into three real-world levels of RSI.
SPEAKER_00: The first one being prompt-level rewriting.
SPEAKER_01: Yes, exactly. Instead of relying on a human to write the perfect prompt, the model essentially plays devil's advocate with itself. It refines its own logic before giving you the final answer.
SPEAKER_00: Which makes a huge difference.
SPEAKER_01: It does. Then second is tool-level RSI, where the AI actually builds better digital infrastructure and software workflows around itself to solve problems more efficiently.
SPEAKER_00: And the third level?
SPEAKER_01: That is model-level RSI. This is where the system self-trains on high-quality data it generated itself, entirely without needing human labels.
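[Editor's note: the prompt-level loop described above can be sketched in a few lines of Python. This is a hypothetical illustration, not code from the studies discussed; `generate` is a stub standing in for a real model API call, and the prompt wording is invented.]

```python
def generate(prompt: str) -> str:
    """Stub for an LLM call; a real system would query a model API here."""
    return "draft answer"

def self_refine(question: str, rounds: int = 2) -> str:
    """Draft an answer, then repeatedly critique and rewrite it.

    This is the 'devil's advocate' loop: the model attacks its own
    draft, then revises based on that critique, before the user
    ever sees a final answer.
    """
    answer = generate(f"Answer the question:\n{question}")
    for _ in range(rounds):
        critique = generate(
            f"Question: {question}\nDraft answer: {answer}\n"
            "List any flaws in the draft's reasoning."
        )
        answer = generate(
            f"Question: {question}\nDraft: {answer}\nCritique: {critique}\n"
            "Rewrite the draft, fixing the listed flaws."
        )
    return answer
```

With a real model behind `generate`, each pass through the loop trades extra compute for a more carefully checked answer.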
SPEAKER_00: Speaking of building better workflows, this deep dive is sponsored by Embersilk. If you need help with AI training, automation, integration, or software development, or if you're trying to uncover where agents could make the most impact for your business or personal life, they are the experts to go to. Check out Embersilk.com for all your AI needs.
SPEAKER_01: Having that solid digital infrastructure is exactly what allows these systems to thrive at the tool level we were just talking about.
SPEAKER_00: Right. And here's where it gets really interesting. One of the breakthrough studies we reviewed focused on a massive 540-billion-parameter model.
SPEAKER_01: That is a staggering number of parameters.
SPEAKER_00: Right. And by simply using a chain-of-thought process, which is basically prompting the model to think step by step, and then having it select its own most consistent answers, its score on the GSM8K benchmark jumped massively.
SPEAKER_01: And that benchmark is essentially a standardized math test for AI.
SPEAKER_00: Exactly. Its score went from 74.4% to 82.1%.
SPEAKER_01: Completely on its own?
SPEAKER_00: Completely autonomously.
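[Editor's note: the "select its own most consistent answers" step is the self-consistency technique: sample several reasoning chains and majority-vote on the final answer. A minimal Python sketch, assuming the chains have already been sampled from a model and that the final number in each chain is its answer:]

```python
from collections import Counter
import re

def extract_answer(chain: str) -> str:
    """Pull the last number out of a chain-of-thought string."""
    nums = re.findall(r"-?\d+(?:\.\d+)?", chain)
    return nums[-1] if nums else ""

def self_consistency(chains: list[str]) -> str:
    """Majority-vote over the final answers of sampled reasoning chains."""
    answers = [extract_answer(c) for c in chains]
    return Counter(answers).most_common(1)[0][0]

# Three sampled chains for the same word problem; the faulty one is outvoted.
chains = [
    "3 apples plus 5 apples makes 8, so the answer is 8",
    "First 3, then 5 more, for a total of 8",
    "3 times 5 is 15",
]
print(self_consistency(chains))  # prints "8"
```

The intuition: wrong reasoning paths tend to diverge onto different wrong answers, while correct paths converge, so the plurality answer is usually the reliable one.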
SPEAKER_01: The capability gains there are significant, but the data on AI safety is equally notable. A 2025 study by Liu and colleagues showed how these exact same self-reflection loops reduced AI toxicity by 75.8%.
SPEAKER_00: That is a massive drop.
SPEAKER_01: It is. And even more notably, the loops completely eliminated partisan bias. A 100% reduction.
SPEAKER_00: A hundred percent reduction in partisan bias sounds almost too good to be true. How exactly is a self-reflection loop defining what constitutes bias in the first place?
SPEAKER_01: The system uses strict, structured rubrics to evaluate its own drafts against neutral, objective standards before you ever see the response.
SPEAKER_00: It catches itself before speaking, basically.
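[Editor's note: a rubric-gated output loop like the one described can be sketched as follows. This is an invented toy, not the Liu et al. system: the rubric entries and the keyword checks are placeholders, where a real pipeline would use an LLM grading call for each criterion.]

```python
# Each rubric criterion maps a name to a pass/fail check on a draft.
# Real systems would score drafts with a model; these are keyword stubs.
RUBRIC = {
    "no_insults": lambda text: "idiot" not in text.lower(),
    "no_partisan_framing": lambda text: "my party" not in text.lower(),
}

def passes_rubric(draft: str) -> list[str]:
    """Return the names of rubric criteria the draft fails (empty = pass)."""
    return [name for name, check in RUBRIC.items() if not check(draft)]

def gated_respond(drafts: list[str]) -> str:
    """Emit the first candidate draft that satisfies every criterion."""
    for draft in drafts:
        if not passes_rubric(draft):
            return draft
    return "I can't produce a response that meets the standards."
```

The key design point matches the transcript: evaluation happens against fixed, structured standards on the draft, before anything is shown to the user.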
SPEAKER_01: Yes. We are also seeing the rise of systems like SafeEvalAgent, an ingenious framework that continuously evolves its own safety tests to ensure that AI remains secure and helpful.
SPEAKER_00: So what does this all mean?
SPEAKER_01: It points to a brilliantly optimistic future. As AI actively self-corrects and improves, it evolves into a hyper-reliable, unbiased partner. We are building a collaborative tool that will help humans solve our greatest challenges faster than ever.
SPEAKER_00: That is such an inspiring perspective. Which brings us to a final provocative thought for you: if AI can use structured self-reflection to instantly eliminate its biases and upgrade its reasoning, how might you apply that exact same framework to supercharge your own daily learning and problem-solving?
SPEAKER_01: Something to think about the next time you drop the juggling balls.
SPEAKER_00: I definitely will. If you enjoyed this podcast, please subscribe to the show. And hey, leave us a five-star review if you can. It really does help get the word out. Thanks for tuning in.