Heliox: Where Evidence Meets Empathy 🇨🇦

🪰 Why Houseflies See Faster Than AI

• by SC Zoomers • Season 7 • Episode 4

📖 Read: https://helioxpodcast.substack.com/archive?sort=new

Through self-motion, flies efficiently translate image motion into temporally-precise, predictive high-speed vision.

The humble, infuriating, uncatchable housefly is the most honest teacher we currently have.

There is a moment, probably familiar to all of us, when the hand comes down, and the fly is already gone. Your palm stings against the kitchen table. The fly hovers, unbothered, on the opposite wall. You feel foolish. You should not. You have just lost a contest with one of the most sophisticated sensory systems on the planet, a system that evolution spent four hundred million years refining and that we, in our brief decades of artificial intelligence research, are only beginning to comprehend.

Synaptic high-frequency jumping synchronizes vision to high-speed behaviour
and 18 other references


This is Heliox: Where Evidence Meets Empathy

Independent, moderated, timely, deep, gentle, clinical, global, and community conversations about things that matter.  Breathe Easy, we go deep and lightly surface the big ideas.

Disclosure: This podcast uses AI-generated synthetic voices for a material portion of the audio content, in line with Apple Podcasts guidelines. 

We make rigorous science accessible, accurate, and unforgettable.

Produced by Michelle Bruecker and Scott Bleackley, it features reviews of emerging research and ideas from leading thinkers, curated under our creative direction with AI assistance for voice, imagery, and composition. Systemic voices and illustrative images of people are representative tools, not depictions of specific individuals.

We dive deep into peer-reviewed research, pre-prints, and major scientific works, then bring them to life through the stories of the researchers themselves. Complex ideas become clear. Obscure discoveries become conversation starters. And you walk away understanding not just what scientists discovered, but why it matters and how they got there.

Spoken word, short and sweet, with rhythm and a catchy beat.
http://tinyurl.com/stonefolksongs



Have you ever tried to swat a housefly and just completely failed? Oh, I mean constantly. Yeah. It is a remarkably humbling experience, getting outmaneuvered by an organism the size of a lentil. Right. You know the exact scenario I'm talking about. You're sitting there at the kitchen table, minding your own business, and this tiny buzzing speck lands near your coffee cup. Yeah. You slowly raise your hand. You calculate the trajectory. Wait for the absolute perfect moment. And then you strike with what feels like lightning speed. And you hit nothing but table. Nothing at all. Your palm is stinging. And the fly is, you know, already hovering on the other side of the room completely unharmed. Just mocking you. Exactly. Well, welcome to the Deep Dive. This is where we take your stack of sources, research papers, and articles and extract the hidden blueprints of how the world actually works. I'm your host, and today our mission is to figure out why you always miss that fly. Exactly. It's a great mystery. Because here's the thing that makes that failed swat so mind-bending to me. According to the classical rules of science and physics, you absolutely should have crushed that fly. You really should have. Based on everything textbooks have taught us about how eyes function, a fly moving at those erratic, lightning-fast speeds should be experiencing a massive, blinding motion blur. It should effectively be flying blind. Yeah, the math of classical optics paints a very clear picture of that swat. I mean, if their vision operates like a camera, the biological shutter speed simply cannot be fast enough to capture the image of your moving hand without smearing it into a useless gray gradient. Right. The sensory input should be a total mess. But clearly, they see your hand coming, they map its trajectory, calculate an escape route, and execute a physical evasive maneuver in a fraction of a second. And today, we are going to solve this paradox.
We are diving deep into a groundbreaking new discovery published in a Nature Communications paper. Oh, this paper is fantastic. It's titled Synaptic High-Frequency Jumping Synchronizes Vision to High-Speed Behaviour, authored by a team of researchers led by Mansour, Takalo, and Juusola. And this paper doesn't just, like, tweak our understanding of biology. No, not at all. It completely overturns decades of assumptions about how vision actually works at the cellular level. It really does. And we are pulling from a massive, fascinating stack of sources to contextualize this today. Yeah, we've got a lot to cover. We do. We're looking at this brand new biological breakthrough about the fly's eye, but we are holding it up against the history of artificial neural networks. We're looking at classic neuroscience from the 1950s, the famous Hubel and Wiesel experiments. Oh, those are wild. Yeah. We're bringing in DeepMind's research on meta-learning. And we are even exploring IBM's latest insights into the hardware of neuromorphic computing. I know that sounds like a massive wild leap for the listener. Connecting a housefly dodging a rolled-up magazine to the multi-billion dollar cutting edge of artificial intelligence. But there is a very specific reason we are taking this leap. Absolutely. The journey of Juusola's team, spending years painstakingly studying the microscopic brain of a fly, accidentally reveals the fundamental hard physical limits of modern AI. It exposes exactly why our current machine learning models hit a computational wall when they try to deal with the real, chaotic, unpredictable physical world. The fly offers a radical, completely new biological blueprint for the future of machines. Okay, so to understand why the fly's vision is such a paradox in the first place and why the AI folks are paying so much attention, we have to go back in time. We have to look at how the scientific establishment thought vision worked. Right.
And that story doesn't start with insects. It starts with cats. Specifically, it starts in the late 1950s and early 1960s with two neuroscientists, David Hubel and Torsten Wiesel. Their work is the bedrock of modern visual neuroscience. I mean, they eventually won a Nobel Prize for it. Yeah, they were at Johns Hopkins and later Harvard, right? Exactly. They were studying the visual cortex of cats. Yeah. Their goal was to map the exact electrical pathway of how a brain processes the light that hits the retina. I was reading up on the methodology of those original experiments, and it's fascinating, but also, you know, a bit intense. It was a different era of science. Yeah. They would anesthetize the cats, immobilize their heads so their eyes couldn't move at all, and then project very simple static images onto a screen in front of them. We're talking basic geometry. Right. Dots, black lines, sharp edges. Meanwhile, they had inserted microscopically thin electrodes directly into the individual neurons of the cat's visual cortex to listen for the electrical popping sound of a neuron firing. And the breakthrough actually happened by accident. They were struggling to get the neurons to fire by projecting little dots of light. As the story goes, they were slipping a glass slide into the projector, and the sharp shadow of the edge of the glass slide swept across the cat's visual field. Oh, wow. Suddenly the electrode went crazy. The neurons started firing rapidly. They realized the brain wasn't looking for dots of light. It was looking for edges. And from that happy accident, they discovered two basic types of visual cells. First, they found what they categorized as simple cells. These are neurons that are incredibly picky. Like, a specific simple cell will only fire when the cat is shown a straight edge or a line at a very specific orientation and only in a highly specific tiny location on the screen. It's extremely localized. Right.
If the line is perfectly vertical, the neuron fires. If you tilt that line even a fraction of a degree or move it an inch to the left, the neuron goes completely silent. It's a rigid lock and key mechanism. Building on that, they discovered a second layer of neurons they called complex cells. A complex cell also responds to a specific orientation, say a vertical edge, but it takes input from a whole cluster of those simple cells. Okay, so it's aggregating them. Exactly. Because it pools the data, it doesn't care where that vertical edge is within a broader receptive field. You can slide the vertical line left or right across the screen, and the complex cell keeps firing. It abstracts the concept of vertical line away from its exact spatial coordinates. So the simple cells are doing the hyperspecific, localized, pixel-by-pixel detail work. And the complex cells zoom out, pool that data, and say, yep, regardless of the exact coordinates, there is definitely a vertical line somewhere in this general area. That's it. Yeah. Hubel and Wiesel proposed this cascading hierarchical model, the idea that the brain builds an image layer by layer, taking static localized features and pooling them into larger and larger abstract patterns. And this specific cascading model is where the history of biology crashes head-on into the history of artificial intelligence, right? Oh, completely. Computer scientists were watching this space with intense interest. In 1980, a researcher named Kunihiko Fukushima looked at Hubel and Wiesel's cat data and decided to build it in silicon. He created a computer model called the neocognitron. Yes. He literally programmed artificial S layers, mimicking the simple cells, and C layers, mimicking the complex cells. He effectively tried to digitize the architecture of the cat's visual cortex. The S layers acted as rigid feature detectors, and the C layers pooled those features together to recognize a shape, even if it shifted slightly in the frame.
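To make the simple-cell and complex-cell idea above a bit more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Hubel and Wiesel's model or Fukushima's neocognitron: the 3x2 filter, the 8x8 test images, and every function name are hypothetical choices made for this example. The "simple cells" are one oriented filter applied at each position, and the "complex cell" simply takes the maximum over those positions.

```python
import numpy as np

# Hypothetical oriented filter: responds to a dark-to-bright vertical edge.
VERTICAL_EDGE = np.array([[-1.0, 1.0],
                          [-1.0, 1.0],
                          [-1.0, 1.0]])


def simple_cell_responses(image, kernel):
    """One 'simple cell' per position: local filter response, rectified at zero."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return np.maximum(out, 0.0)


def complex_cell_response(simple_map):
    """One 'complex cell': pools (max) over all simple cells of one orientation."""
    return simple_map.max()


def image_with_vertical_edge(col, size=8):
    """Dark on the left, bright from column `col` onward."""
    img = np.zeros((size, size))
    img[:, col:] = 1.0
    return img


def image_with_horizontal_edge(row, size=8):
    """Dark on top, bright from row `row` downward."""
    img = np.zeros((size, size))
    img[row:, :] = 1.0
    return img


if __name__ == "__main__":
    for col in (3, 5):
        s = simple_cell_responses(image_with_vertical_edge(col), VERTICAL_EDGE)
        # Which simple cells fire depends on where the edge is...
        print("edge at column", col, "-> active simple-cell columns:",
              np.unique(np.nonzero(s)[1]))
        # ...but the pooled complex-cell response does not.
        print("  complex cell response:", complex_cell_response(s))
    h = simple_cell_responses(image_with_horizontal_edge(4), VERTICAL_EDGE)
    print("horizontal edge -> complex cell response:", complex_cell_response(h))
```

Sliding the edge changes which simple cells are active but leaves the pooled response unchanged, and a horizontal edge produces no response at all, which is the position-tolerant, orientation-specific behaviour the hosts describe.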
And the neocognitron was the direct, undeniable ancestor to the convolutional neural networks, or CNNs, that Yann LeCun pioneered in the late 1980s and 90s. The zip code guy. Exactly. LeCun took this biological inspiration and used it to train computers to read handwritten zip codes on envelopes for the postal service. He built a system that could look at a sloppy handwritten number eight, break it down into its simple curves and edges, and pool them together to recognize the digit regardless of where it was positioned on the envelope. Which is the foundational architecture of almost all modern computer vision today. Facial recognition on your phone, self-driving cars trying to spot a stop sign, AI image generators, they all trace their evolutionary lineage back to Yann LeCun's zip codes, Fukushima's neocognitron, and ultimately Hubel and Wiesel's immobilized cats. An unbroken chain. But hold on, this is where I have to throw a flag on the play. You are jumping from a heavily sedated cat staring at a completely stationary dot on a wall. Right. To Yann LeCun training algorithms on flat, static, scanned photographs of zip codes. To a housefly pulling 5Gs of acceleration in my kitchen while doing midair acrobatics. It's quite a leap. The math on that simply does not scale. How does a model based entirely on stationary subjects looking at still images apply to a hyper-agile insect? That is the million-dollar question that haunted biology for half a century. Yeah. The uncomfortable truth is that scientists just assumed the basic principles scaled down. They extrapolated a static model onto a highly dynamic organism, and it created a massive, decades-long dead end in the research. Both neuroscience and AI built their foundational theories on a static model of vision. Let's break down exactly what that static assumption meant for the fly. A fly's compound eye is a marvel to look at under a microscope.
It's made of hundreds of little hexagonal lenses called ommatidia, arranged like a dome. It looks like little honeycombs. Yeah. Beneath each one of those tiny lenses is a cluster of photoreceptors, the actual biological cells that catch the incoming light. For decades, the dominant textbook theory was called the classic neural superposition model. And the core tenet of this model was that the primary photoreceptors, which scientists label R1 through R6, were essentially immobile, identical, fixed pixels. So the assumption was that the biological hardware was identical to a digital camera. A rigid, unmoving grid of light-sensitive buckets just waiting for photons to fall into them. Exactly. The receptive fields of these cells were mapped as static coordinates. But flies don't fly in straight, smooth lines. They fly in these incredibly violent, fast, jerky movements. They dart. They zigzag. In biology, these rapid shifts in gaze are called saccades. And if your eye is a rigid grid of pixels and you violently whip your head around, the physical math dictates a brutal consequence. Motion blur. Catastrophic motion blur. If you sweep a static grid of pixels across a highly textured environment at high speed, the photons bouncing off a single object, say the edge of your hand, smear across multiple different pixels in a fraction of a millisecond. So the image just gets completely washed out. The signal degrades. Spatially, the image loses all its sharp edges. So according to the established peer-reviewed science of the 20th century, the fly should be momentarily blind every single time it darts to the side. Which brings us right back to the swatting problem. Right. If it's blind while it's moving, how does it see my hand coming while it's already in flight? The theoretical model had to be fundamentally broken. The realization that the emperor had no clothes is exactly what kicked off the journey for Juusola and his team at the University of Sheffield.
They looked at this paradox, the mathematical prediction of a blind fly versus the physical reality of an uncatchable fly, and realized the foundational assumptions of their field were wrong. The static model could not explain reality. No. So if you are a researcher, where do you even begin to dismantle a theory that is printed in every textbook? You have to look at how the data writing those textbooks was gathered. Juusola's team started by auditing the historical methodology of the field. How were previous scientists actually testing fly vision in the laboratory? And it turns out, for decades, researchers were stimulating fly eyes using something called Gaussian white noise. I had to look this up. Gaussian white noise is basically the visual equivalent of the static on an old, untuned analog television set. But the Sheffield team pointed out a glaring, almost embarrassingly obvious issue. A fly darting through a garden on a sunny afternoon does not experience smooth mathematical white noise. Right. Imagine you are a housefly zooming through a rosebush. You are moving from the blinding, direct glare of the afternoon sun into the deep, pitch-black shadow of a leaf and back out into the glare in a fraction of a millisecond. It's not a gentle gradient of gray static. It is a violent, extreme, high-contrast bombardment of photons. Testing a fly's eye with gentle white noise is like testing a Formula One race car suspension by driving it at 5 miles an hour in a perfectly flat grocery store parking lot. You're not going to learn anything about what the car can actually do. Exactly. The machine isn't being pushed. You aren't seeing what the biological hardware was actually engineered by evolution to do. So the researchers realized they had to build a completely new test track. They had to invent a fundamentally new way to observe the eye, and they had to do it across multiple microscopic scales simultaneously.
And the methodology detailed in this Nature Communications paper is an absolute masterclass in experimental design. Yeah, first they needed a perfect map of the physical hardware. To get that, they used synchrotron X-ray imaging. Which is not your standard hospital X-ray. A synchrotron accelerates electrons to nearly the speed of light to generate incredibly brilliant focused beams of X-rays. Wow. They used this to take a non-destructive, hyper-detailed three-dimensional scan of the static optical layout of the fly's eye. They mapped the exact geometry of the lenses and the underlying cell structures. But they didn't stop there. They went deeper. They used electron microscopy to literally zoom in and count the individual photon-sampling units inside the eye. Right. Inside those R1-R6 photoreceptor cells are tiny finger-like structures called microvilli. And these are the actual biological tubes packed with rhodopsin, the light-sensitive protein that catches the photon? You got it. And just by counting these structures, they found something deeply revealing about how evolutionary pressure shapes hardware. They compared two different species. Okay, what were they? They looked at the Drosophila fruit fly, the sluggish little guys that hover around your overripe bananas. Yeah, those are easy to swat. Very easy. A Drosophila photoreceptor has about 30,000 of these microvilli. Then they looked at the Musca housefly, the hyper-agile acrobat that dodges your swat. The housefly has roughly 54,000 microvilli per photoreceptor. The agile fly has nearly double the photon-catching hardware squeezed into the same cellular space. It's the biological equivalent of upgrading the megapixel count on a camera sensor specifically because you know you're going to be shooting high-speed dynamic sports photography. Yeah. You need more buckets to catch the light faster. That makes total sense. The static anatomy was revealing, but it wasn't the breakthrough, was it? No.
The real magic happened when they designed a rig to capture the dynamics of the eye in a living, breathing, fully intact fly. They are taking a living fly, they are inserting a microscopic glass wire into a single specific cell deep inside its head to measure the electrical current. It's insanely precise. At the exact same time, they are aiming a high-speed infrared camera through the lens of the eye to record microscopic cellular movement. And while recording all of this, they are projecting a high-speed, high-contrast, erratic action movie directly into the fly's field of vision to simulate the violent chaos of actual flight. It is an astonishing level of experimental precision. They managed to create a setup where they could observe exactly what the biological hardware did physically and exactly what it reported electrically at the precise millisecond a high-speed visual event occurred. So they finally have the rig, they secure the fly, they turn on the high-contrast action movie, and what do they see? What they captured on that infrared camera completely shattered the 50-year-old classical static pixel assumption. When the high-contrast light hit the photoreceptors, the cells did not just passively sit there and absorb the photons. What did they do? They physically moved. Wait, the actual physical cells twitched? Yes. They performed what the researchers categorized as ultra-fast photomechanical microsaccades inside the eye. The very act of absorbing a photon causes a mechanical contraction in the cell. That is wild. The biological pixels violently jitter and dart around beneath the lens. Okay, I want to make sure I understand the mechanism here because this is crazy. It's a photomechanical response. Yeah. A photon of light comes through the lens, it travels down into that microvillus tube we talked about. It hits a molecule of rhodopsin.
That impact triggers an enzymatic chemical cascade inside the cell, which alters the calcium concentration, which physically pulls on the actin cytoskeleton of the cell itself. So... Like the light literally forces the cell to flex like a muscle. That is the exact mechanism. The photon's energy is converted into a rapid mechanical force and they don't just twitch randomly. The authors coined a new term for this phenomenon, morphodynamic neural superposition. Morphodynamic. Right. The movement of the photoreceptors alters their morphology, their shape and position. As they twitch, they dynamically shift and actually narrow their receptive fields. Okay, but my brain immediately trips over the physics of this. If I am running with a video camera, the footage is going to be shaky and blurred. Sure. If I start violently shaking the camera back and forth in my hands while I'm running, wouldn't that physical jitter make the motion blur infinitely worse? How does jittering fix the problem of a smearing image? It feels entirely counterintuitive until you look at the geometry of what they're doing. It's about translating spatial blur into temporal sharpness. Think of it as the slit effect. The slit effect. Imagine you're standing inside a dark room, looking out through a large, wide window at a busy highway. A race car zooms by at 200 miles an hour. Because the window is wide, your eye tries to track it, but the image smears across your retina. It's just a blurry streak of color. Right. I can picture that. Now imagine you pull the blinds down so there is only a tiny vertical half-inch slit of light coming through the glass. The car drives by again. For the fraction of a millisecond that the car passes behind that tiny slit, the image of the car is crystal clear. Because there's no room for it to smear. Exactly. There's no room for spatial smearing. 
If you could somehow mechanically move that slit back and forth incredibly fast, tracking opposite to the direction of the car, you could capture a rapid series of perfectly sharp, unblurred slices of the car. Ugh. By physically moving and narrowing the receptive fields, the fly's photoreceptors are creating a dynamic slit. They are actively fighting motion blur with opposing motion. They are taking a spatial smear, the blur dragging across the eye, and chopping it up into highly structured, high-resolution temporal slices. To a normal camera, that is a useless blur. How well could the fly track them? Using this morphodynamic jittering eye, the fly's neural signals proved it could clearly resolve the two separate dots even when they were separated by just 0.7 degrees of visual angle. Let's pause and unpack that number, because 0.7 degrees of resolution isn't just good eyesight. It breaks the laws of classical physics, doesn't it? It violates the diffraction limit. It absolutely does. Light travels as a wave. When a wave of light passes through a circular aperture, like the tiny lens of a fly's ommatidium, it doesn't land perfectly as a single, infinitely small point. It spreads out. Right. The wave spreads out, diffracts, and creates a blurry circle called an Airy disk. And the Rayleigh criterion in physics dictates that if two points of light are too close together, their Airy disks overlap so much that no sensor in the universe can tell them apart. They merge into one blob. Precisely. And the average ommatidial lens on a housefly has a hard mathematical diffraction limit of about 1.1 degrees. A static sensor sitting behind that lens could never, ever resolve two objects that are only 0.7 degrees apart. A static sensor couldn't. But the fly's sensor isn't static. No, it's not.
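For listeners who want the arithmetic behind the Rayleigh criterion mentioned above, here is a small Python sketch using the standard small-angle formula for a circular aperture, angle ≈ 1.22 × wavelength / aperture diameter. The 500 nm (green) and 350 nm (ultraviolet) wavelengths are assumptions picked for illustration, and the function names are hypothetical. The lens diameter it prints is only what this formula implies for a 1.1 degree limit, not a measurement from the paper, which may define its limit differently.

```python
import math


def rayleigh_limit_deg(wavelength_m, aperture_m):
    """Rayleigh angular resolution limit (small-angle form), in degrees."""
    return math.degrees(1.22 * wavelength_m / aperture_m)


def aperture_for_limit_m(wavelength_m, limit_deg):
    """Aperture diameter that would give the requested Rayleigh limit."""
    return 1.22 * wavelength_m / math.radians(limit_deg)


if __name__ == "__main__":
    green_m = 500e-9            # assumed wavelength, not from the episode
    quoted_limit_deg = 1.1      # diffraction limit quoted in the episode
    resolved_deg = 0.7          # separation the fly resolved, per the episode

    d = aperture_for_limit_m(green_m, quoted_limit_deg)
    print(f"implied lens diameter: {d * 1e6:.1f} micrometres")
    print(f"limit at 350 nm with that lens: {rayleigh_limit_deg(350e-9, d):.2f} degrees")
    print("0.7 degrees is below the static limit:", resolved_deg < quoted_limit_deg)
```

Under these assumptions even a shorter wavelength does not bring the static limit down to 0.7 degrees, which is consistent with the hosts' point that a motionless sensor behind the same lens could not separate the two dots.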
Because the photoreceptor is physically sweeping back and forth, it captures the peak of the light wave from the first dot, and then moves to capture the peak of the light wave from the second dot, before the overlapping blurry edges of those waves have time to integrate and confuse the signal. The temporal slicing beats the spatial blur. The fly achieves what scientists call hyperacuity. It literally beats the physical limitations of its own optical hardware through mechanical movement. The hardware is not a static camera. It is a dynamic, dancing, active sensor. It's beautiful, isn't it, honestly. It really is. But as incredible as the photomechanical jitter is, it only solves half the problem. True. The eye is capturing incredibly sharp, fast slices of data, but the fly still has to process that data. The eye has to send those signals to the brain. And biological brains are wet, squishy chemical machines. Chemical transmission takes time. It does. If the eye is taking pictures at 1,000 frames a second, but the brain takes 50 milliseconds to process the chemistry, the fly still gets swatted. How does the fly's brain keep up with its own eye? This is the second massive discovery in the paper, and it brings us to the actual title of the research, Synaptic High-Frequency Jumping. The researchers looked at the bottleneck. Okay. The R1 to R6 photoreceptors catch the light, they jitter, and now they need to pass this information deeper into the brain to a specific layer of neurons called large monopolar cells, or LMCs. Okay. And they communicate across the synaptic cleft using a neurotransmitter. Specifically, the fly uses histamine. Yes. The photoreceptor releases histamine into the tiny gap between the cells, and the LMC detects it. Historically, neuroscientists viewed this chemical release as a relatively slow, smooth analog process, a steady trickle of information.
Let's explain the mechanism of a biological synapse, because it's vital to understanding how radical this discovery is. A cell doesn't just spray loose histamine like a garden hose. No, it doesn't. It packages the chemical into tiny microscopic bubbles called vesicles. This is known as quantal release. Each bubble is a quantum, a specific packet of data. The bubble travels to the edge of the cell membrane, merges with it, pops open, and dumps its payload of histamine into the gap. And then the receiving cell has receptors that catch those molecules and convert them back into an electrical voltage. Right. So for a long time, the assumption was that the speed at which these vesicles could be mobilized, moved to the membrane, and released had a hard biological speed limit. But the Sheffield team discovered a completely hidden gear in the fly's synaptic machinery. When the fly is in a resting state looking at a slow, static scene, the synapse fires at a normal baseline frequency. The vesicles trickle out. But when the fly takes off, when it enters that chaotic, high-contrast, high-speed environment, the synapses dynamically shift their entire transmission into much higher frequency bands. They jump. They jump. Instead of a steady trickle of individual vesicles, the synapse mobilizes massive pools of vesicles and releases them in tightly synchronized, incredibly fast microbursts, perfectly timed to the high-contrast flashes of light the eye is experiencing. It's less like a garden hose and more like a dam operator who realizes a flash flood is coming. Instead of opening the spillway a crack and letting the water flow out smoothly, they start violently slamming the floodgates open and closed, sending rapid-fire, high-pressure Morse code waves down the river. That is a perfect analogy. The receiving cells deeper in the brain were processing data at a staggering 4,100 bits per second. Wait, how does the receiving cell have a higher bit rate than the cell sending the data?
How does it gain information? Because the architecture is convergent. Six different photoreceptors, R1 through R6, all wire into a single LMC. It is pooling the data from six different jittering angles. Ah. But because the synapses are jumping at these high frequencies, they aren't blurring the signals together, they are multiplying the information density. The operational bandwidth of this entire transmission extends up to about a thousand hertz. A thousand hertz, meaning the fly's brain is distinguishing a thousand distinct sensory events every single second. Exactly. To put into perspective how insane that is, classical textbook science believed the absolute ceiling for fly vision was something called the flicker fusion limit. Right, the old speed limit. Yeah. If a light flickered faster than 230 times a second, 230 hertz, scientists confidently stated that the fly's brain couldn't process the gaps. It would just see a solid, continuous beam of light. The discovery of synaptic high-frequency jumping quadruples the known biological speed limit of the fly's brain. It completely rewrites the textbook on neural efficiency. Let's connect the chemistry back to the macro scale. Let's connect this back to me sitting at the kitchen table swinging my hand at this fly. Right. Because the synapses are utilizing this high-frequency synchronized burst mechanism, the inherent delays of chemical transmission, the time it takes for the vesicle to pop and the histamine to cross the gap, are effectively neutralized. The LMC cell detects the sudden violent influx of histamine and begins generating its own electrical response almost instantly. In fact, the electrical voltage in the receiving LMC cell reaches its absolute peak response in just 7.6 milliseconds. 7.6 milliseconds from the moment the light hit the eye. And here's the fact that genuinely sounds like science fiction. The receiving LMC cell peaks at 7.6 milliseconds.
But the primary photoreceptor, the cell that actually caught the light and sent the signal, doesn't hit its own maximum peak until 11.6 milliseconds. The receiving cell finishes reacting before the sending cell is finished sending. Yes. The receiver knows the answer before the sender finishes asking the question. It is entirely predictive. The synapse is shifting to such a high frequency and tracking the rate of change in the light contrast so aggressively that the LMC doesn't wait for the full signal to arrive. It extrapolates the trajectory. Exactly. It feels the sudden, massive spike of histamine, calculates the velocity of the changing contrast, predicts exactly where the signal is going to peak, and fires the alarm early. Predictive coding at the cellular level. It's not waiting to see the fully rendered image of my hand approaching. It registers a high-velocity shadow sweeping across its jittering photoreceptors, calculates the math of the acceleration instantly, and triggers the behavioral response. Which is why the fly can initiate a physical evasive maneuver, altering the pitch of its wings to dodge your swat, in a mere 13 to 20 milliseconds. That is unbelievably fast. Before the sensory cells at the surface of its eye have even finished fully processing the visual image of your hand, the fly's motor system has already received the predictive signal and initiated the escape. So summarizing the biological side of this deep dive, the fly's brain isn't just passively recording the world like a security camera bolted to a wall. Not at all. It is a profoundly active participant in its own perceptions. It is physically vibrating its own sensory hardware to slice through spatial motion blur, and it is shifting its chemical synapses into high-frequency overdrive to predict the future and process data at over 4,000 bits per second. It is a dynamic, embodied system. Which brings us to the pivot point of today's deep dive.
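The timing figures quoted in the biology segment above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the numbers are the ones spoken in the episode (230 hertz, about 1,000 hertz, 7.6 and 11.6 milliseconds, 13 to 20 milliseconds), and the constant and function names are hypothetical.

```python
# Figures as quoted in the episode; names are illustrative only.
FLICKER_FUSION_HZ = 230.0
SYNAPTIC_BANDWIDTH_HZ = 1000.0
LMC_PEAK_MS = 7.6
PHOTORECEPTOR_PEAK_MS = 11.6
ESCAPE_WINDOW_MS = (13.0, 20.0)


def period_ms(freq_hz):
    """Duration of one cycle at the given frequency, in milliseconds."""
    return 1000.0 / freq_hz


def bandwidth_ratio():
    """How many times higher the new bandwidth is than the old flicker limit."""
    return SYNAPTIC_BANDWIDTH_HZ / FLICKER_FUSION_HZ


def lmc_lead_ms():
    """How much earlier the LMC response peaks than the photoreceptor response."""
    return PHOTORECEPTOR_PEAK_MS - LMC_PEAK_MS


if __name__ == "__main__":
    print(f"bandwidth ratio: {bandwidth_ratio():.2f}x")
    print(f"cycle at 230 Hz: {period_ms(FLICKER_FUSION_HZ):.2f} ms")
    print(f"cycle at 1000 Hz: {period_ms(SYNAPTIC_BANDWIDTH_HZ):.2f} ms")
    print(f"LMC peaks {lmc_lead_ms():.1f} ms before the photoreceptor")
    print(f"time left after the LMC peak before the earliest escape: "
          f"{ESCAPE_WINDOW_MS[0] - LMC_PEAK_MS:.1f} ms")
```

The ratio comes out a little above four, which matches the "quadruples" wording, and the LMC peak leaves several milliseconds to spare before the fastest quoted escape onset.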
Because if a biological brain the size of a pinhead can seamlessly process a chaotic, high-speed, wildly unpredictable physical world without lagging, without motion blur, and with predictive hyperacuity, why does our most advanced, multi-billion dollar artificial intelligence struggle so heavily to do the exact same thing? It's a great question. Why does an autonomous drone get confused by a shadow when a fly navigates a dense forest at 60 miles an hour effortlessly? The uncomfortable answer, which we find when we cross-reference the biology with the history of artificial neural networks, is that AI is still fundamentally stuck in the 1950s. Still back with the cats. Yeah. The core architecture of machine learning is still mimicking Hubel and Wiesel's immobilized cats staring at static lines on a wall. I was looking at the structure of modern AI, particularly computer vision models. Almost all of them are built on convolutional neural networks, or CNNs. No, they are processing reality as a flipbook, not a continuous dynamic physical flow. And because they do this, they suffer from massive computational bloat. A standard CNN requires full connectivity between its layers. If you feed it a high-resolution image, you're talking about hundreds of millions, sometimes billions, of numerical weights, parameters that need to be held in memory, calculated, and updated for every single artificial neuron. It is a brute force mathematical approach to vision. You feed a static image of a dog into the network. The network runs the data forward through its layers of weights and makes a guess. Let's say it guesses it's a cat. The system then calculates the loss, the exact mathematical difference between its wrong guess and the right answer. Okay.
It then pauses the entire network and propagates that error signal backward through every single layer, using complex calculus and derivatives to adjust every single one of those millions of weights just a tiny bit so that next time it is slightly more likely to guess dog. It is mathematically brilliant, but as neuroscientists point out, it creates a massive biological plausibility gap. Backpropagation is biologically impossible. Real, wet, physical brains simply do not work that way. What specifically makes it impossible for the fly's brain to do backpropagation? Well, there are a few hard physical barriers. First, backpropagation requires a global error signal. Every single artificial neuron in the network instantly knows exactly how wrong the final output was. Biology doesn't have a global PA system like that. Neurons only know what their immediate neighbors tell them. Second, and more importantly, is what scientists call the weight transport problem. I read about the weight transport problem. In a digital computer, data can flow both ways easily. The algorithm calculating the backward error pass can instantly read the exact numerical value of the forward-facing connection. It knows exactly how strong the connection is and adjusts it. But in a wet biological brain, synapses are one-way streets. The histamine only flows from the photoreceptor to the LMC. The LMC cannot send a signal backward up the exact same channel. Ah, okay. Furthermore, the weight of a biological synapse isn't a number in a spreadsheet. It's a physical structural reality. It's the surface area of the membrane, the number of vesicles, the density of receptors. A neuron cannot look backward and perfectly read the structural weight of the synapse that just fired at it. It is like trying to send a return letter through the mail to someone whose address scrambles itself every time they send a letter.
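To make the two barriers just described concrete, here is a minimal sketch of backpropagation on the smallest possible network: one input, one hidden unit, one output. Everything in it (the function names, the starting weights of 0.3, the learning rate, the target of 0.5) is an assumption chosen for illustration, not taken from any of the sources discussed. The comments mark where the global error signal appears and where the backward pass has to read a forward weight.

```python
import math

def forward(w1, w2, x):
    """Forward pass: input -> hidden unit -> output guess."""
    h = math.tanh(w1 * x)
    y = w2 * h
    return h, y

def backward(w2, x, h, y, target):
    """Backward pass: push the output error back through both layers."""
    err = y - target                      # the single global error signal
    grad_w2 = err * h
    grad_h = err * w2                     # weight transport: reads the forward weight w2
    grad_w1 = grad_h * (1.0 - h * h) * x  # chain rule through tanh
    return grad_w1, grad_w2

def train(x=1.0, target=0.5, w1=0.3, w2=0.3, lr=0.5, steps=200):
    """Repeat forward pass, loss, backward pass, and a small weight nudge."""
    losses = []
    for _ in range(steps):
        h, y = forward(w1, w2, x)
        losses.append(0.5 * (y - target) ** 2)
        grad_w1, grad_w2 = backward(w2, x, h, y, target)
        w1 -= lr * grad_w1
        w2 -= lr * grad_w2
    return w1, w2, losses

w1, w2, losses = train()
print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.2e}")
```

The line that computes grad_h is the one with no obvious biological counterpart: the update for the first weight needs the exact numerical value of the second weight, which is precisely the information a one-way synapse cannot send back.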
A real brain does not freeze time, calculate its global errors, and run calculus backward through its own architecture. It learns continuously in a forward direction through physical adaptation. Exactly. Now, a listener might argue, you know, does that actually matter? If the AI gets the right answer eventually using backpropagation, who cares if it learns differently than a fly? Results are results. It matters intensely the moment we take these systems out of the pristine controlled laboratory and put them into the messy, unpredictable real world. Let's pull in the 2025 paper by Ben Lee on the artificial visual system. Oh, this paper is great. Lee's team tried to build a stereo orientation model using those classic Hubel and Wiesel principles. And Lee makes a crucial, damning observation about deep learning models. He notes that if you give a deep learning model massive amounts of perfectly clean, labeled, static training data, it develops remarkable capabilities. It memorizes the patterns flawlessly. But the moment you introduce noise or dynamic clutter, these models fail spectacularly at robustness. Robustness is the Achilles heel of modern AI. Deep models are incredibly easily distracted by potential spatial features that are actually just background clutter, shadows, or sensor noise. Because they don't have the active physical filtering mechanisms of a biological brain. Right. It's like the fly's dynamic receptive fields that physically jitter to slice through the noise. The AI has incredibly weak attention mechanisms. The fly uses its physical interaction with the world to dynamically filter the chaos in real time. The AI just passively absorbs the entire static frame, noise, shadows, and all, and gets hopelessly confused. It's overwhelmed. It's like the AI is a student studying perfectly printed, highly curated flashcards in a completely silent, brightly lit library. It becomes an absolute genius at recognizing the flashcards. But the fly's brain isn't in a library.
It is playing a chaotic, high-speed, virtual reality video game where the environment is constantly shifting, the lighting is strobing, and the physical game controller is literally changing shape in its hands as it plays. Yeah. The flashcards versus the shape-shifting controller is the exact dichotomy. The fly uses the chaos, the mechanical jitter, the rapid saccades, the physical movement to actually improve its signal. The physical movement creates the hyperacuity. While the AI treats chaos and movement as an error that must be minimized by looking at more static flashcards. So how do we bridge this massive gap? If we want to build artificial intelligence that can navigate a drone through a dense forest canopy at 60 miles an hour without crashing, a task a housefly manages with a brain the size of a grain of salt and a fraction of a milliwatt of power, we clearly can't just keep adding more layers to a static CNN and feeding it billions of new flashcards. No, the brute force approach is hitting a wall. So what's next? We have to abandon the legacy architecture. And this is driving the explosion in a frontier field known as neuromorphic computing. And our source from IBM dives incredibly deeply into the mechanics of this shift. Neuromorphic computing abandons the classic von Neumann computer architecture, the strict, rigid separation of the memory unit and the processing unit, and tries to build silicon hardware that physically mimics the wet biology of neurons and synapses. We're talking about building computer chips where the hardware is the algorithm. Tell me about the learning side of this. If neuromorphic chips aren't using backpropagation and calculus, how do they learn? They're exploring biological principles like dendritic learning and short-term Hebbian plasticity. Hebbian plasticity. Yeah, the core rule of Hebbian plasticity, dating back to psychologist Donald Hebb in 1949, is simple. Neurons that fire together wire together.
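As a rough illustration of the rule just stated, the sketch below strengthens a connection only when the sending input and the receiving cell are active on the same trial. It is the textbook form of a Hebbian update, not the specific rule of any neuromorphic chip mentioned here. The function name hebbian_update, the learning rate of 0.1, and the four made-up trials are assumptions for this example.

```python
def hebbian_update(weights, pre, post, lr=0.1):
    """Strengthen each connection in proportion to pre-activity times post-activity."""
    return [w + lr * x * post for w, x in zip(weights, pre)]

weights = [0.0, 0.0]

# Hypothetical trials: (activity of the two inputs, activity of the receiving cell).
# Input 0 fires together with the receiving cell; input 1 fires only when it is silent.
trials = [
    ([1, 0], 1),
    ([1, 0], 1),
    ([0, 1], 0),
    ([1, 0], 1),
]

for pre, post in trials:
    weights = hebbian_update(weights, pre, post)

print(weights)  # the first connection has grown, the second has not
```

Note that each update uses only what is available at that one connection: the activity on either side of it. No error is computed and nothing travels backward, which is the contrast with backpropagation that the conversation is drawing.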
If two cells are constantly activated by the same stimulus, the physical connection between them strengthens. Okay, that makes sense. In modern neuromorphic models, researchers are focusing heavily on the dendrites, the sprawling branch-like structures of neurons that receive incoming signals. I was reading about these new models mentioned in the research, specifically the clusteron and the G-clusteron models, and this is where the AI research perfectly mirrors the fly biology. Oh, golly. In a traditional artificial neural network, learning simply means changing a numerical weight in a matrix. But in the clusteron model, the artificial synapses actually physically reorganize themselves. It is structural plasticity applied to silicon. If two artificial synapses on a neuromorphic dendrite are firing together because they detect a correlated feature in the environment, they will mathematically move along the dendritic branch to physically cluster closer to each other. It needs to mimic this physical spatial reorganization. You cannot separate the physical architecture from the learning algorithm in biology. And we are realizing we cannot separate them in advanced AI either. No. And we actually see this temporal dynamic principle scaling up even to the most famous advanced AI models we have today, transformers. Transformers. The architecture behind ChatGPT, Claude, and all the modern large language models. How does a jittering fly eye or a clustered dendrite connect to the architecture of ChatGPT? Well, I have a fascinating source paper here on the neuroscience of transformers. Researchers are looking at the human cerebral cortex and they're realizing that the brain actually mimics the computational logic of modern AI transformers. Wait, really? But it does so temporally through rhythmic physical oscillations rather than just spatially. Break that down. What does a transformer actually do and how does the brain mimic it?
At its core, a transformer relies on two main components: an encoder, which takes in the massive stream of data and maps the relationships between the words or pixels, and a decoder, which takes that map and predicts the next piece of data. Okay. In the human brain, researchers theorize that the superficial layers of the cortex, layers 2 and 3, act as the encoders. And here is the temporal part. They run on very fast high-frequency brainwaves called gamma bands. Fast high-frequency. Just like the fly's synapses jumping to high frequencies to encode the fast saccadic bursts of light. The exact same biological principle of temporal segregation. Meanwhile, the deeper layers of the human cortex, layer 5, act as the decoders, making the predictions. And they operate on slower, entirely different rhythmic frequencies, alpha and beta bands. The biological brain is using physical location, different depths of cortical tissue, and temporal frequencies, fast versus slow oscillating brainwaves, to segregate the encoder and decoder. It prevents the signals from interfering with each other without needing billions of discrete isolated digital circuits. Precisely. So the biological brain is acting like a transformer, but it is a physical, oscillating, vibrating machine, not just a static block of code running on a server. And I assume this physical dynamic architecture allows for a completely different, more adaptable kind of learning. A profoundly more flexible kind of learning. This brings in our DeepMind source detailing the famous Harlow experiment. I love this study. The monkey experiment from the 1940s. Harry Harlow was a psychologist who tested macaque monkeys with a very simple setup. He placed two random objects in front of them, like a red block and a blue sphere. Only one of the objects had a piece of food hidden under it. And the placement was random, right? The left to right placement was totally random, so the monkey couldn't just memorize "always pick left."
Over hundreds of trials with different objects, the monkeys didn't just blindly memorize which specific object was correct. They developed a dynamic strategy. They internalized the abstract rule of the game. They would pick an object randomly the first time, observe the result, and instantly apply the rule to the next choice. It adapted dynamically to the flow of information. It learned how to adapt to a constantly changing reality rather than just mapping the coordinates of a static reality. Which brings us all the way back to our housefly. The fly is the ultimate embodied meta-learner in its environment. Oh, absolutely. It uses its physical movement, the photomechanical jitter of the eye, the violent saccades of its flight, to constantly ping the environment, generate sensory feedback, and predict the exact next millisecond of reality. The fly uses physical action to inform its perception, and it uses perception to immediately drive action in a continuous, high-frequency, unbroken loop. It's a unified system. So if the science is pointing in this direction, if neuromorphic computing and these dynamic, physically embodied models are clearly the answer to AI's robustness problem, why isn't all AI built this way? Why are we still pouring billions of dollars into building giant data centers to train static CNNs and massive transformers? Because as the IBM researchers explicitly warn in their paper, neuromorphic computing is incredibly, excruciatingly difficult to build. What are the physical hurdles of building a brain on a chip? Well, first, the interdisciplinary learning curve is massive. To build a neuromorphic chip, you need neuroscientists, physicists, materials engineers, and computer scientists all speaking the exact same language. But the biggest hurdle is the hardware itself. The actual silicon. Yeah.
When engineers try to take our highly accurate static deep neural networks and convert them to run on these dynamic spiking neuromorphic chips, there is often a massive, unacceptable drop in accuracy. Why does the accuracy drop? Because the hardware is fundamentally noisy. Neuromorphic chips often use components like memristors, resistors that remember their past electrical state, to act as artificial synapses. But memristors suffer from cycle-to-cycle variation. What does that mean? If you run the exact same voltage through a memristor twice, you might get a slightly different output the second time. The physics of the materials are messy. The hardware is messy and unpredictable, just like actual biology. It is exactly like biology. Yeah. But we don't know how to program for messy hardware yet. The tools just don't match the theory yet. Precisely. We are in the awkward transitional phase. But despite the massive engineering hurdles IBM and others are facing, the fundamental biological research like Juusola's work on the housefly proves that the destination is absolutely worth the journey. Yeah. The fly is living proof that hyper-efficient, highly predictive, incredibly fast sensory processing is physically possible, and it can run on a fraction of a milliwatt of biological power. But to get to that destination, to achieve that level of efficiency, we need to fundamentally abandon the static flashcards. We need artificial intelligence that doesn't just calculate pixels in a vacuum. We need machines that structurally, temporally, and dynamically adapt to their environment in real time. It's a huge paradigm shift. It really is. Well, we have covered a massive amount of ground today, bridging biology, physics, and computer science. Let's try to summarize this incredible journey for the listener. We started with the simple, everyday annoyance of a swatted housefly.
But by looking closer at that failed swat, we uncovered an absolute masterclass in evolutionary engineering. We really did. We learned that the classical textbook models of vision were fundamentally wrong: flies do not have static pixel sensors that suffer from blinding motion blur. Instead, they utilize a brilliant hack called morphodynamic neural superposition. Right. They literally vibrate their photoreceptor cells using ultra-fast photomechanical microsaccades. They actively create a dynamic slip, slicing through spatial motion blur to create hyperacuity, allowing them to see sharp details that are mathematically smaller than the physical diffraction limits of their own lenses. And to process that hyper-fast visual data without lagging behind reality, they use synaptic high-frequency jumping. When the flight gets chaotic and the contrast spikes, their synapses shift into overdrive. Which is incredible. They abandon the steady trickle of neurotransmitters and release massive synchronized pulses of histamine, achieving record-breaking processing speeds of over 4,000 bits per second. The speed allows the receiving cells in their brain to extrapolate the math and predict the future, initiating an evasive dodge in milliseconds, before the sensory image has even fully rendered. It is a deeply dynamic, active, embodied system that puts our most powerful static artificial intelligence to shame. It forces us to completely rethink what true intelligence actually looks like at the hardware level. It isn't just about massive calculation; it is about dynamic interaction. It really is. Which leaves me with a final lingering thought for you, the listener, to mull over as we end this deep dive. We have seen today that the fly's tiny brain relies entirely on physical mechanical movement, which we called morphodynamic jitter, to process information accurately. Right.
The fly literally uses the physical mass of its body's interaction with the physical world to chop up time, filter out noise, and understand space. It doesn't just passively observe the world from a distance; it physically wrestles with the chaos of the environment to make sense of it. It has to. So, if true biological intelligence, the kind of intelligence that effortlessly navigates a messy, chaotic, real-world environment, requires this physical, mechanical jitter and constant interaction with a changing reality... will an artificial intelligence trapped in a static, motionless server rack in a sterile data center ever achieve true real-world comprehension? Or does a mind actually need a physical body to truly see? It is a profound architectural question that the field of AI is going to have to answer very soon. It is definitely something to think about the next time a tiny housefly outsmarts you in your kitchen. Thank you for joining us on this journey today. Keep questioning the obvious, keep looking closer, and keep diving deep.
