Heliox: Where Evidence Meets Empathy 🇨🇦

🧠 The Gentle Art of Taming Chaos: What Neural Networks Teach Us About Living With Turbulence

• by SC Zoomers • Season 6 • Episode 20


We've been thinking about chaos all wrong.

For years, the prevailing wisdom in both neuroscience and life has been essentially the same: eliminate the noise, suppress the turbulence, force order onto disorder. Whether we're talking about neural networks learning to coordinate movement or humans trying to navigate an increasingly complex world, the assumption has been that chaos is the enemy—something to be conquered, controlled, stamped out.

But what if chaos isn't the problem? What if it's actually the raw material of intelligence itself?

This isn't just philosophical speculation. New research in computational neuroscience is revealing something profound about how biological systems learn, and the implications extend far beyond the laboratory. The framework is called "predictive alignment," and it suggests that the brain doesn't brutally force its internal turbulence into submission. Instead, it does something far more elegant: it learns to guide that chaos gently toward coherent, purposeful behaviour.

The difference matters more than you might think.

Taming the chaos gently: a predictive alignment learning rule in recurrent neural networks


This is Heliox: Where Evidence Meets Empathy

Independent, moderated, timely, deep, gentle, clinical, global, and community conversations about things that matter. Breathe Easy: we go deep and lightly surface the big ideas.

Thanks for listening today!

Four recurring narratives underlie every episode: boundary dissolution, adaptive complexity, embodied knowledge, and quantum-like uncertainty. These aren’t just philosophical musings but frameworks for understanding our modern world. 

We hope you continue exploring our other podcasts, responding to the content, and checking out our related articles on the Heliox Podcast on Substack.

Support the show

About SCZoomers:

https://www.facebook.com/groups/1632045180447285
https://x.com/SCZoomers
https://mstdn.ca/@SCZoomers
https://bsky.app/profile/safety.bsky.app


Spoken word, short and sweet, with rhythm and a catchy beat.
http://tinyurl.com/stonefolksongs

Curated, independent, moderated, timely, deep, gentle, evidence-based, clinical and community information regarding COVID-19. Running since 2017, and focused on COVID since February 2020 with multiple stories per day, it has built a large searchable base of stories: more than 4,000 on COVID-19 alone, and hundreds on climate change.

Zoomers of the Sunshine Coast is a news organization with the advantages of deeply rooted connections within our local community, combined with a provincial, national and global following and exposure. In written form, audio, and video, we provide evidence-based and referenced stories interspersed with curated commentary, satire and humour. We reference where our stories come from and who wrote, published, and even inspired them. Using a social media platform gives us a much higher degree of interaction with our readers than conventional media, along with a significant, positive amplification effect. We expect the same courtesy of other media referencing our stories.


Welcome back to the Deep Dive. Our mission here is pretty simple. Yeah. We take complex, cutting-edge research, we strip away all the academic jargon, and we really just hand you the shortcuts you need to understand the absolute frontiers of science. And today, we are going deep, deep into the wiring of the brain. We are. Specifically, how we learn these incredibly complex, time-based sequences. And in doing that, we're wrestling with a pretty big concept: internal chaos. That's right. I mean, that's really the core of it. Yeah. Just think for a second about the incredible complexity of our own cognition. You're not just remembering static facts like a phone number. You're constantly acquiring and recalling these intricate sequences. It could be the order of words that makes up a sentence you're about to say. Or the muscle movements for playing an instrument, or even just catching a ball. Exactly. A highly skilled motor behaviour. And that entire ability, it's all built on continuous, really coordinated neural activity inside the brain.

And when we try to model these things, you know, computationally, with tools like recurrent neural networks, or RNNs, we hit a wall. A pretty fundamental problem. You do. RNNs are powerful because they have these strong internal feedback loops. Output from one neuron can feed back and influence itself later on. But those loops, if you just leave them unchecked, lead to something researchers call chaotic spontaneous activity. Right. And chaos sounds bad. It sounds like a bug, not a feature. It sounds like pure static, just white noise. But computationally, it's, well, kind of a necessary evil. Or maybe not even evil. It's potential. It's like an engine that's running a little too hot, but that heat gives you power. The chaos provides what they call a rich variety of basis functions. What does that mean in simple terms? If you imagine the network's internal state as this huge multidimensional room, chaos ensures that the network is constantly exploring every single corner of that room. It provides a huge palette of different patterns to draw from. So the complexity is actually good for representation. It gives you a lot to work with. A lot to work with. But that's not enough, right? The real challenge is taking that turbulent, high-dimensional, seemingly random chaos and turning it into something coherent, like the smooth motion of your arm reaching out to grab a cup of coffee. That leap from internal turbulence to external precision. That is the big mystery we're trying to solve.
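That chaotic spontaneous activity is easy to see for yourself: simulate a plain random RNN with strong feedback and no input. A toy NumPy sketch (the gain g is our own illustrative knob; values above 1 are what push a dense random network into the chaotic regime):

```python
import numpy as np

rng = np.random.default_rng(1)
N, g = 500, 1.5              # units and gain; g > 1 gives chaos, g < 1 gives decay
dt, tau = 1.0, 10.0          # Euler step and neuronal time constant, in ms
J = rng.normal(0.0, g / np.sqrt(N), (N, N))  # strong random feedback loops

x = rng.normal(0.0, 0.5, N)  # membrane state
trace = []
for _ in range(2000):        # no input, no learning: just the loops running
    x = x + (dt / tau) * (-x + J @ np.tanh(x))
    trace.append(np.tanh(x[0]))
# Plot `trace`: one neuron's rate wanders irregularly forever, the rich,
# never-repeating activity being described here.
```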
And, you know, historically, we have had some powerful tools. The celebrated FORCE learning rule, for instance, was a landmark achievement. It was incredibly successful at training these chaotic RNNs to do amazing things. But, and here's the big "but" where our deep dive really begins, FORCE has a critical, fundamental flaw when you look at it through the lens of biology, when you ask the question: how does the actual brain do it? That's the million-dollar question. And the main issue is just biological implausibility. FORCE learning requires a couple of major things that, well, synapses in the brain just don't seem to do. Okay, what's the first one? First, it requires non-local plasticity rules. That means for one synapse to change, it needs information about the network's final output error, which, well, is just not available right there at that specific synapse. It needs to know what's happening miles away, in computational terms. Precisely. And the second thing, which is maybe even more critical, is that it requires extremely quick synaptic changes. Faster than the neurons themselves are firing. Faster than the network's own dynamics. It's a brilliant computational trick, but it just doesn't map onto how real, kind of sluggish, local brain synapses actually adapt and change over time. It's too fast. It needs too much information from everywhere at once. It's just too demanding.

So if FORCE isn't the way the brain does it, what's the first step to finding a replacement? Do you change the network itself, or do you change what you're trying to achieve, the goal? You have to change the goal, but you use a structure that kind of leans into how real neurons are wired. And that's why our mission for this deep dive is to unpack this totally novel framework. It's a revolutionary, biologically plausible alternative called predictive alignment. Predictive alignment. Developed by Toshitake Asabuki and Claudia Clopath and published just recently. It's designed specifically to overcome those limitations. It promises a local, online learning rule that can, as you put it, tame chaos gently. Okay. Tame chaos gently. I like that.

Okay, let's unpack this approach, starting with the basic structure of the network. We're looking at an RNN, and the internal wiring, this thing called the recurrent connectivity matrix, is explicitly split into two different parts. Yes. And this separation is absolutely foundational to understanding how predictive alignment works. So the total recurrent connectivity is actually the sum of two different matrices, call them J and W. Think of them as two separate sets of wires in the brain. Exactly, two layers of wire harnesses. Let's talk about J first. Okay, what does J represent? The fixed, or static, connections. These are strong, they're sparse, and, this is key, they don't change during learning. Their whole job is to generate that initial rich, chaotic activity. J is the wild, untamed architecture of the system. So if you're building a house, J is the fundamental structure, the beams, the pipes, the things that create this big, open, maybe even turbulent space. That's a perfect analogy. And if J is the fixed architecture, then W is the flexible part, right? The plastic component. Exactly. W is initially weak, it's fully connected, and it's where this new learning rule is actually applied. So if J is the architecture, W is like the interior design team that comes in later and installs these gentle, adaptive partitions to guide the flow of traffic. And the whole goal of training W is just to suppress, or tame, the chaos that J is creating. Yeah, to carve out functional paths through that wild landscape. Precisely.

And this gets us right to the big philosophical shift, the core innovation of predictive alignment. Standard learning, like the delta rule, is all about minimizing the error at the very end, at the output stage. It's like a teacher saying, "Your answer is wrong by five points. Fix everything now." Right. And that's actually how the readout weights, a separate output matrix W_out, are trained in this model. It's the classic approach, just minimizing the difference between the target and the output, solely focused on that external error. But predictive alignment says, hold on, we are not applying that same logic to the internal recurrent connections, to W.
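Here is a minimal NumPy sketch of that two-matrix architecture, continuing the toy setting from above. The specific numbers (network size, gain, sparsity, the scale of Q) are illustrative guesses, not the paper's exact parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

N = 800          # recurrent units, the scale used in the paper's demos
n_out = 3        # readout units (three would suit the Lorenz task later)
g = 1.6          # gain putting the fixed weights just above the critical point
p_sparse = 0.1   # connection probability of the fixed, sparse matrix

# J: fixed, strong, sparse. It generates the rich chaotic activity
# and is never modified by learning.
mask = rng.random((N, N)) < p_sparse
J = mask * rng.normal(0.0, g / np.sqrt(p_sparse * N), (N, N))

# W: plastic, initially weak (here, zero), fully connected. This is the
# only recurrent component the learning rule touches.
W = np.zeros((N, N))

# W_out: readout weights, trained with a standard delta rule on output error.
W_out = np.zeros((n_out, N))

# Q: fixed random feedback matrix. Q @ z appears only inside the learning
# rule; it never drives the network's real-time dynamics.
Q = rng.normal(0.0, 1.0 / np.sqrt(n_out), (N, n_out))
```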
Instead of chasing the output error, the learning for W focuses on two completely different internal goals: prediction and alignment. And this is such a subtle but absolutely monumental difference. So let's contrast it again with FORCE, just to make it clear. You said FORCE clamps the output. It does. FORCE learning effectively forces the network activity to stay right on the desired path during learning, almost instantaneously. An external, forceful correction. Which is why it needs that impossible speed and non-local information. Exactly. Predictive alignment just avoids that clamping completely. The network is actually allowed to be wildly wrong at the beginning. The paper explicitly notes that early in learning, the output is a total mismatch from the target. It's not being forced into place. No. It's a much more gentle internal negotiation with the chaos, not an external command.

Okay, so if the network's internal wiring, W, isn't getting that direct moment-to-moment error signal, how on earth does it figure out what to do? It sounds like it's trying to teach itself without the answer key. It is, in a way. It's teaching itself by developing an internal model of its own output. It focuses the learning in W on predicting an internal feedback signal, and aligning its own predictive dynamics with the chaos that's already there.

Okay, we've set up the philosophy. Gentle guidance, not forceful clamping. But how do you actually translate that philosophy into math? Let's dissect the engine, starting with the two rules that are running at the same time. Right. So first you have the readout rule. That's for the output weights, W_out. It's a standard delta rule, minimizing the final error. We don't need to get bogged down there. It's pretty conventional. That's the external part. The real magic is in the recurrent rule, the one for W. It takes the output, z, and just feeds it back, but only for the purposes of learning. So the plastic weights are trying to change themselves to predict what the network's own output would look like after being filtered through this random matrix Q. Random? Why not just feed back the actual target signal? Because making Q random is what makes it biologically plausible and local. It means the system doesn't need to do complex, non-local calculations like backpropagation. Instead, each neuron just gets this local signal, Qz, and tries to predict it. It's a clever shortcut. And this is a huge point: that feedback signal, Qz, only appears in the learning rule. It doesn't actually change what the neurons are doing in real time. Exactly. The network isn't being corrected or clamped as it runs. It's sort of building an internal predictive model on the side, offline. So W is like a student who's not looking at the teacher's answer key. They're generating their own answers and then comparing them to a scrambled internal version of the guide. That's a perfect way to put it. It uses its current best guess of the output to guide its internal wiring, without needing the teacher, the external target, looking over its shoulder every single second.
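In code, one simulation-plus-learning step might look roughly like this, reusing the matrices from the sketch above. The recurrent update also includes the alignment term that comes up next, weighted by alpha. Both the exact form of the updates and the learning rates here are our paraphrase of the description in this conversation, not the authors' published equations:

```python
dt, tau = 1.0, 10.0            # Euler step and neuronal time constant, in ms
eta_rec, eta_out = 5e-4, 5e-4  # illustrative learning rates
alpha = 1.0                    # weight on the alignment term

def train_step(x, W, W_out, target):
    """One Euler step of the network plus both weight updates.
    `target` is the desired output at this time step, shape (n_out,)."""
    r = np.tanh(x)
    z = W_out @ r                            # network output
    # Real-time dynamics: chaotic drive from fixed J plus learned guidance
    # from plastic W. Note that Q @ z does NOT appear here: no clamping.
    x = x + (dt / tau) * (-x + (J + W) @ r)

    # Readout rule: a standard delta rule on the external output error.
    W_out = W_out + eta_out * np.outer(target - z, r)

    # Recurrent rule (paraphrased): nudge the plastic prediction W @ r
    # (1) toward the random feedback Q @ z (prediction), and
    # (2) toward correlation with the chaotic drive J @ r (alignment).
    pred_err = Q @ z - W @ r
    W = W + eta_rec * np.outer(pred_err + alpha * (J @ r), r)
    return x, W, W_out, z
```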
Okay, that covers prediction. What about the second part of the cost function, the one that gives the framework its name? The alignment term. Right, the alignment, or regularization, term. This part is governed by a parameter, alpha, which is usually set to one, showing how important it is. Its job is to promote correlation between the plastic prediction, W r, and the existing chaotic dynamics, J r. Wait, so W is trying to predict the output feedback, and it's also trying to make sure that its own activity is aligned with the underlying chaos coming from J? Why would you want to correlate with the chaos? Wouldn't you want to get rid of it? You want to harness it, not destroy it. The analysis in the paper showed this alignment is the actual mechanism for taming the chaos gently. If you turn that term off, if you set alpha to zero, the network really struggles to learn. The correlation doesn't grow. But when you turn it on, when alpha is one, that correlation grows. And what does that do to the system's dynamics? That growth is what produces the taming effect. By aligning the learned dynamics of W with the chaotic dynamics of J, the system effectively pushes the network's main Lyapunov exponent further and further into the negative range. And the Lyapunov exponent, that's the classic measure of chaos, right? Positive means it's chaotic. Tiny errors blow up. Precisely. So by making the exponent more negative, the predictive alignment rule is verifiably making the system more stable. It's suppressing the chaos. You're not killing the energy of the system. You're just channeling it, making sure it goes into a coherent forward motion instead of exploding everywhere.

This connects directly to a really famous idea in this field, that networks perform best when they're sitting right on the edge of chaos. And this research confirmed it beautifully. The best performance, which they measured as both the lowest output error and the smallest possible weight strength for W, happened when the initial network, driven only by J, was tuned to be just above the critical point. Right on that edge. And keeping the weights small is huge for stability, right? Big weights can make a system really fragile. So performance and robustness both peak at this boundary. Why? What's so special about the edge of chaos? It's the perfect balance between two things the network needs: representational diversity and effective dimensionality. Okay, let's unpack those. Diversity was measured by something called the entropy of the eigenvalue spectrum. You can just think of this as the sheer number of different thoughts or patterns the network is capable of having. High diversity is good. And the second one? Effective dimensionality, measured by the participation ratio. This is about how many of those different thoughts can be actively coordinated at the same time to produce a useful, structured output. It's about maintaining coherence. So you need both. You need a lot of rich ideas, but you also need to be able to organize them. Exactly. If the network is too stable, too far from chaos, its dimensionality is low. It can't learn anything complex. It's too rigid. But if it's too chaotic, then the dimensionality is high, but the information gets scrambled. It loses all meaningful structure. It's just noise. The edge of chaos is that perfect Goldilocks zone where you maximize both the richness of your representations and your ability to structure them. That's where learning is most efficient. And predictive alignment is a rule that's specifically designed to exploit that sweet spot. That's the idea.
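Claims like "the exponent is pushed negative" are checkable on a simulation with the standard two-trajectory (Benettin-style) method: perturb a copy of the network by a tiny amount and track whether the gap grows or shrinks. A rough sketch, reusing J, N, rng, dt, and tau from the snippets above:

```python
def leading_lyapunov(W, steps=5000, d0=1e-7, t_burn=500):
    """Crude two-trajectory estimate of the leading Lyapunov exponent of
    the autonomous dynamics x -> x + (dt/tau) * (-x + (J + W) @ tanh(x)).
    Positive: chaotic, tiny errors blow up. Negative: stable, they shrink."""
    def f(x):
        return x + (dt / tau) * (-x + (J + W) @ np.tanh(x))

    x_a = rng.normal(0.0, 0.5, N)
    for _ in range(t_burn):                      # let transients die out
        x_a = f(x_a)
    x_b = x_a + d0 * rng.normal(size=N) / np.sqrt(N)
    gap0 = np.linalg.norm(x_b - x_a)
    log_sum = 0.0
    for _ in range(steps):
        x_a, x_b = f(x_a), f(x_b)
        gap = np.linalg.norm(x_b - x_a)
        log_sum += np.log(gap / gap0)
        x_b = x_a + (x_b - x_a) * (gap0 / gap)   # renormalize the separation
    return log_sum / (steps * dt)                # exponent per ms
```

Comparing `leading_lyapunov(np.zeros((N, N)))` before training with `leading_lyapunov(W)` after training is the kind of before-and-after measurement the taming claim rests on.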
We've covered the internal mechanics. Now let's see this thing in action. The researchers put it through its paces, right? Starting with some simple toy examples. They did, and it showed immediate success across the board. It could learn simple things like sine waves, of course. But what was impressive was that it could learn them across huge time scales, from very short periods to very, very long ones. And it also handled patterns that are usually tricky for these kinds of continuous systems. Yes, things with sharp breaks, like discontinuous step functions or non-smooth sawtooth waves. For a system based on smooth flows, making a sudden step requires a very rapid, very controlled shift in the network state, which it managed reliably. And it could do more than one thing at a time. Absolutely. They showed it could learn five different tasks at once. They had one network with 800 neurons drive five separate readouts, and it generated five completely distinct patterns simultaneously, all sharing the same underlying engine.

Okay, that covers the basics. The real test for these networks is always how they handle true complexity. So let's move from simple waves to something much harder: low-dimensional chaos. Specifically, the Lorenz attractor. Right, the Lorenz attractor. This is the classic textbook example of complex, non-periodic, chaotic dynamics. It's that three-dimensional, butterfly-shaped trajectory that never, ever repeats itself exactly. And the network learned it. It did. Using only three readout units, it successfully learned to generate the target. Late in learning, the output dynamics were a near-perfect match. It was drawing the butterfly. But here's the really critical experiment. They trained it, and then they turned the plasticity off and just let it run on its own. What happened then? Well, as you'd expect for any chaotic system, the network's trajectory eventually diverged from the exact target path. Any tiny little error from noise or whatever grows exponentially over time and pushes it off course. But, and this is the big takeaway, right? Even though it diverged, the output didn't just become random noise. It kept producing complex, non-periodic oscillations that were still strikingly similar to the Lorenz attractor. Yes, this is the key. It means the model didn't just memorize the specific path. It learned the underlying manifold. It learned the rules of the system, the grammar of the dynamics that define the attractor itself. It learned the shape of the butterfly, not just one specific flight path. Why is that important for us? Because so many of our own high-level motor skills are like this. Juggling, improvising on a piano, they are fundamentally chaotic processes. They have rules and structure, but the exact movement never repeats. This suggests predictive alignment is learning the grammar of movement, which allows for structured improvisation, not just rote memory.
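Anyone wanting to reproduce this kind of test can generate the target with a few lines of Euler integration of the classic Lorenz equations (sigma = 10, rho = 28, beta = 8/3); the rescaling at the end is just a convenience so a tanh network's readout can track it:

```python
def lorenz_target(T=10000, dt_l=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Generate the Lorenz 'butterfly' as a (T x 3) target trajectory."""
    state = np.array([1.0, 1.0, 1.0])
    out = np.empty((T, 3))
    for t in range(T):
        x, y, z = state
        state = state + dt_l * np.array([
            sigma * (y - x),      # dx/dt
            x * (rho - z) - y,    # dy/dt
            x * y - beta * z,     # dz/dt
        ])
        out[t] = state
    return out / np.abs(out).max(axis=0)  # rescale into roughly [-1, 1]
```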
That's a perfect lead-in to the next test, which is a classic for biological memory: the Ready, Set, Go task. The RSG task is really demanding. It's all about active memory. The network gets two quick pulses, ready and set, separated by a certain delay interval. And the network's job is to measure that delay. Measure it, hold it in memory for a bit, and then, after a final go signal, reproduce that exact same delay in its output. And the delays they used, up to 160 milliseconds, were way longer than the neuron's own time constant of 10 milliseconds. That's a crucial detail. It is. A 10 millisecond time constant means any signal should fade out really fast. For the network to hold on to that time measurement, it has to be using its internal recurrent connections to actively sustain that information. It can't just be passive decay.

So it trained on a few examples. What was the key finding about how it generalized to new delays? The generalization was really interesting. It was robust, but it was bounded. It could perfectly reproduce the delays it was trained on. And crucially, it could interpolate really well to delays that fell between the trained examples. So if you train it on 100 ms and 140 ms and then test it on 120 ms, it works. It works perfectly. But, and this is just as informative, it failed completely at extrapolation. If you asked for 200 ms, outside the training range, it fell apart. Completely. And that specific pattern, good interpolation, bad extrapolation, is powerful evidence that the network learned an underlying linear manifold structure. It basically built a spatial map of time inside its own state space. How did they confirm that? How do you see a geometric structure inside a neural network? With principal component analysis, PCA. You use PCA to find the most important dimensions of the network's activity. And when they did that, the projections clearly showed that as the delay time increased, the network's internal state shifted linearly along a specific line, a manifold, in that state space. So the memory of a time duration was literally transformed into a physical location in the network's internal space. To remember the time, it just had to remember a location. Exactly. The recurrent dynamics embedded the temporal task into a low-dimensional geometry. And this gives us a really compelling, testable hypothesis for how something like the motor cortex might encode timing signals.
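That kind of geometry is easy to look for once you've recorded the network's states: run PCA on a (timesteps × neurons) matrix and plot the top projections. The same decomposition also yields the participation ratio, the effective-dimensionality measure mentioned earlier. A minimal sketch:

```python
def pca_and_dimensionality(states, n_pcs=3):
    """PCA on a (timesteps x neurons) array of recorded network states.
    Returns projections onto the top n_pcs principal components and the
    participation ratio: (sum of variances)^2 / (sum of squared variances)."""
    X = states - states.mean(axis=0)
    # SVD-based PCA: rows of Vt are principal axes; s**2 gives variances.
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    var = s**2 / (len(X) - 1)
    projections = X @ Vt[:n_pcs].T
    participation_ratio = var.sum()**2 / (var**2).sum()
    return projections, participation_ratio
```

Collecting states for several trained delays and plotting the top three projections side by side is how one would look for the linear "map of time" described above.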
Okay, finally, let's talk about the most dramatic demo of all, the one that tackles the problem of scale: learning and replaying a high-dimensional video sequence. The movie replay task. This one is just spectacular because of the sheer dimensionality mismatch. The target signal was a little video clip of a kitten. Each frame had over 22,000 individual pixels. That's 22,000 readout units that all needed to be controlled at the same time. 22,000 outputs. And the recurrent network itself, the engine driving it all, consisted of only 800 units. So you're running a 22,000-pixel movie with a processor that only has 800 components. I mean, that's like trying to run a 4K movie on a pocket calculator. The mismatch is just staggering. It is. And yet, despite that massive gap, the training errors went down and down and converged near zero. The network learned to accurately encode and then replay the entire video autonomously. What does that tell us? It demonstrates that predictive alignment isn't just for tasks with a few outputs, like the Lorenz attractor. It's fully capable of handling these high-dimensional spatiotemporal patterns. It proves the network can use its rich internal chaos from J and its aligned plasticity from W to generate complexity that far exceeds its own number of neurons. That really changes how I think about motor control. If 800 neurons can control 22,000 variables, it means the brain must be leveraging this kind of structure so efficiently that the number of neurons you need for a task is way, way lower than the complexity of the task itself. That's the key implication right there. The rule is efficiently extracting the low-dimensional structure, the core action of the video, from all that high-dimensional data, and letting this small recurrent engine drive a vast output space.

So let's bring it all together. Let's synthesize this. We have this new framework, predictive alignment. It handles chaos. It generalizes structure. It manages short-term memory. It replays high-dimensional video. And it does it all without the biologically implausible baggage of older methods like FORCE. What does this mean for how we understand the brain? The really profound implication is that we now have a local, online, supervised learning rule that works. It suggests that biologically realistic circuits don't need these highly precise global error signals to learn. The brain doesn't have to compute the exact global mistake to get better. And this brings us right back to biology. Where in the brain might a mechanism like this actually exist? The motor cortex seems like a prime suspect. It really does. The framework suggests a way that an internal signal can act as its own teacher. So instead of your motor cortex getting some clean error signal saying your hand missed by three centimeters, the feedback from the network's own output, that Qz signal, gets sent back internally to each neuron. So the neuron is training itself to predict a filtered version of the entire system's own success. Exactly. It's a local, self-supervised process.

And the researchers even speculated on the physical location where this could happen, right down to the level of a single neuron's dendrites. They did. They tied it to known differences in how different parts of a neuron's input tree can change. Yes, this was one of the coolest parts. Remember the two connection types, the plastic one, W, and the static one, J. They propose that the plastic connections, W, could project onto the proximal basal dendrites, the parts of the neuron close to the cell body, which are known to be highly plastic. Right. And the static, chaotic connections, J, could project further out to the distal basal dendrites, which are known to have lower or more varied plasticity. So the neuron itself becomes the computing unit that is physically aligning the inputs from these two separate locations, realizing that dual cost function at a local level. It's an incredibly elegant hypothesis. And what's great is that it makes a testable prediction. If this is true, neurobiologists should be able to measure specific correlations inside individual neurons while an animal is learning. That's a clear, quantifiable thing to look for in future experiments.

So the big conclusion here feels profound. The brain might not need to brutally stamp out its own internal noise, its own chaos, to get us to do coherent things like walk and talk. Instead, it seems to use this process of gentle predictive alignment. It steers those turbulent, rich internal dynamics toward structured, learned behaviours. It really raises a question for you, the listener, to think about. I mean, given how common noisy, chaotic processes are in all of biology, how many other fundamental tasks, from a split-second decision to how you slowly adapt your tennis swing, are secretly being guided by this same process, this idea of internal prediction aligning with inherent complexity? If chaos is the raw material for rich thought, then maybe the secret to intelligence isn't about eliminating the turbulence at all. It's just about learning how to harness it.
That's something to mull over the next time you do something complex without even thinking about it. Indeed. We look forward to diving into the next stack of sources with you soon. See you on the next Deep Dive.

Podcasts we love

Check out these other fine podcasts recommended by us, not an algorithm.