AI Research Today
AI Research Today unpacks the latest advancements in artificial intelligence, one paper at a time. We go beyond abstracts and headlines, walking through architectures, experiments, training details, ablations, failure modes, and the implications for future work. Each episode covers one to three new, impactful research papers in depth, discussed at the level of an industry practitioner or AI researcher. If you want to understand the newest topics in AI research but don't have the time to dig through the papers yourself, this show is for you.
Nested Learning: The Illusion of Deep Learning Architectures
In this episode, we dive into Nested Learning (NL) — a new framework that rethinks how neural networks learn, store information, and even modify themselves. While modern language models have made remarkable progress, fundamental questions remain: How do they truly memorize? How do they improve over time? And why does in-context learning emerge at scale?
Nested Learning proposes a bold answer. Instead of viewing a model as a single optimization problem, NL treats it as a hierarchy of nested, multi-level learning processes, each with its own evolving context flow. This perspective sheds new light on how deep models compress information, how in-context learning arises naturally, and how we might build systems with richer, higher-order reasoning abilities.
We explore the paper’s three major contributions:
• Deep Optimizers: A reinterpretation of classic optimizers such as Adam and SGD-Momentum as associative memory systems that compress gradients. The authors then build deeper, more expressive optimizers directly from NL principles (a toy sketch of this view follows the list).
• Self-Modifying Titans: A new class of sequence models that learns not only from data but also learns its own update rule, enabling the model to modify itself during training.
• Continuum Memory System: A unified framework that replaces the short-term versus long-term memory dichotomy with a continuum of memories updating at different frequencies. Combined with the self-modifying models, it yields HOPE, a learning module with strong results in language modeling, continual learning, and long-context reasoning (also sketched after the list).
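To make the first contribution concrete, here is a minimal Python sketch of the optimizer-as-memory view. It is our illustration, not the paper's formulation: `momentum_update` shows the classic momentum buffer as a one-vector memory of the gradient history, and the hypothetical `DeepGradientMemory` class replaces that buffer with a small MLP trained online to compress incoming gradients.

```python
import numpy as np

# Classic SGD-Momentum through the NL lens: the buffer m is a tiny
# associative memory that compresses the whole gradient history into
# a single vector.
def momentum_update(m, grad, beta=0.9):
    return beta * m + grad  # write the new gradient into the memory

# Hypothetical "deeper" optimizer in the NL spirit (names and details
# are illustrative only): replace the linear buffer with a small MLP
# memory, trained online with one inner gradient step per outer step
# to reconstruct (i.e., compress) incoming gradients. Its output
# serves as the update direction.
class DeepGradientMemory:
    def __init__(self, dim, hidden=16, inner_lr=1e-2, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.1, size=(hidden, dim))
        self.W2 = rng.normal(scale=0.1, size=(dim, hidden))
        self.inner_lr = inner_lr

    def read(self, g):
        # The memory's compressed view of the current gradient.
        return self.W2 @ np.tanh(self.W1 @ g)

    def write(self, g):
        # One gradient-descent step on ||read(g) - g||^2, i.e. the
        # memory's own (inner) learning problem.
        h = np.tanh(self.W1 @ g)
        err = self.W2 @ h - g
        dW1 = np.outer((self.W2.T @ err) * (1 - h**2), g)
        dW2 = np.outer(err, h)
        self.W1 -= self.inner_lr * dW1
        self.W2 -= self.inner_lr * dW2

# Usage per training step:
#   params -= lr * memory.read(grad)
#   memory.write(grad)
```

The point of the sketch is the nesting: the optimizer is itself a learner with its own objective and its own gradient steps, one level below the model it trains.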
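And a similarly hedged sketch of the continuum-memory idea: the hypothetical `ContinuumMemory` class below keeps a chain of memory levels, each consolidated at its own period, so short-term and long-term memory become two ends of one frequency spectrum. The paper's actual Continuum Memory System uses learned memory modules; the running averages here are purely illustrative.

```python
import numpy as np

# Hypothetical continuum of memories (illustrative, not the paper's CMS):
# level i is consolidated every periods[i] steps, so low levels react
# quickly (short-term) while high levels change slowly (long-term).
class ContinuumMemory:
    def __init__(self, dim, periods=(1, 8, 64)):
        self.periods = periods
        self.levels = [np.zeros(dim) for _ in periods]
        self.step = 0

    def write(self, x):
        self.step += 1
        carry = x
        for i, period in enumerate(self.periods):
            if self.step % period == 0:
                # Consolidate the incoming signal into this level; slower
                # levels only ever see already-consolidated state.
                self.levels[i] = 0.9 * self.levels[i] + 0.1 * carry
            carry = self.levels[i]

    def read(self):
        # A simple readout: sum across the frequency spectrum.
        return sum(self.levels)
```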
This episode breaks down what NL means for the future of AI, why it’s mathematically transparent and neuroscientifically inspired, and how it might open a new dimension in deep learning research.