
Heliox: Where Evidence Meets Empathy 🇨🇦‬
Join our hosts as they break down complex data into understandable insights, providing you with the knowledge to navigate our rapidly changing world. Tune in for a thoughtful, evidence-based discussion that bridges expert analysis with real-world implications. An SCZoomers podcast.
Independent, moderated, timely, deep, gentle, clinical, global, and community conversations about things that matter. Breathe Easy, we go deep and lightly surface the big ideas.
Curated, independent, moderated, timely, deep, gentle, evidence-based, clinical and community information regarding COVID-19. Running since 2017 and focused on COVID-19 since February 2020, with multiple stories per day, it has built a sizeable searchable base of stories to date: more than 4,000 on COVID-19 alone and hundreds on climate change.
Zoomers of the Sunshine Coast is a news organization with the advantages of deeply rooted connections within our local community, combined with a provincial, national and global following and exposure. In written form, audio, and video, we provide evidence-based and referenced stories interspersed with curated commentary, satire and humour. We reference where our stories come from and who wrote, published, and even inspired them. Using a social media platform means we have a much higher degree of interaction with our readers than conventional media, and it provides a significant, positive amplification effect. We expect the same courtesy of other media referencing our stories.
Heliox: Where Evidence Meets Empathy 🇨🇦‬
🧬 The Future Of Discovery: What AlphaEvolve Tells Us About the Future of Human Knowledge
Go deeper with this episode's Substack. There is also a comic.
There's something deeply unsettling about watching a machine solve problems that have stumped humanity's brightest minds for over half a century. Not because it threatens our ego—though it certainly does that—but because it forces us to confront uncomfortable truths about the nature of knowledge, discovery, and what it means to be human in an age of artificial intelligence.
Google DeepMind's AlphaEvolve just broke a 56-year-old mathematical record. Not improved upon. Not incrementally advanced. Broke. The kind of breakthrough that makes you wonder what else we've been missing, what other solutions have been hiding in plain sight, waiting for the right kind of intelligence to find them.
AlphaEvolve: A coding agent for scientific and algorithmic discovery
AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms
This is Heliox: Where Evidence Meets Empathy
Independent, moderated, timely, deep, gentle, clinical, global, and community conversations about things that matter. Breathe Easy, we go deep and lightly surface the big ideas.
Thanks for listening today!
Four recurring narratives underlie every episode: boundary dissolution, adaptive complexity, embodied knowledge, and quantum-like uncertainty. These aren’t just philosophical musings but frameworks for understanding our modern world.
We hope you continue exploring our other podcasts, responding to the content, and checking out our related articles on the Heliox Podcast on Substack.
About SCZoomers:
https://www.facebook.com/groups/1632045180447285
https://x.com/SCZoomers
https://mstdn.ca/@SCZoomers
https://bsky.app/profile/safety.bsky.app
Spoken word, short and sweet, with rhythm and a catchy beat.
http://tinyurl.com/stonefolksongs
Welcome to the deep dive. Today we're cracking open a really fascinating piece of research from Google DeepMind. The paper is called AlphaEvolve: A coding agent for scientific and algorithmic discovery, and we're going to take a deep dive into the system, AlphaEvolve. The core idea here is pretty wild, actually. It's an AI that can discover brand new algorithms. And not just that, but make actual scientific and practical breakthroughs by writing and evolving code. Just think about that for a second. An AI that finds new math and, you know, significantly improves these massive real-world computer systems. So our mission for this deep dive: let's unpack exactly what AlphaEvolve is. We'll lift the hood, see how it uses this blend of AI and evolutionary ideas to actually do this. And maybe the most exciting part, look at some really surprising breakthroughs it's already notched up. We're talking breaking a 56-year-old record in mathematics, optimizing key parts of Google's own infrastructure. Get ready, because combining these ideas leads to some serious aha moments. OK, so let's dig into this AlphaEvolve system. The paper calls it an evolutionary coding agent. It sounds like a mashup of computing ideas. At its core, what's it actually doing? Right. What's really interesting here is that it's not just, you know, one big AI trying to solve a problem in a single go. AlphaEvolve is really more like an autonomous pipeline. It orchestrates large language models, LLMs. And its main job, fundamentally, is to take some existing code, maybe an algorithm or a function, and just iteratively improve it, making direct changes to the code itself. Oh, okay. You mentioned the key mechanism. It uses an evolutionary approach. So it generates variations of the code, right? Yeah. Then it evaluates how well they perform against a specific goal you set. And then it uses that feedback to guide the next round, generating hopefully even better code variations. It's this continuous cycle: propose, test, refine. So it's not just like asking an LLM, hey, write me the best sorting algorithm. It's more like, here's a sorting algorithm. Now keep tweaking it, keep testing it until it gets faster or more efficient. Exactly. Yeah, that's the crucial difference. It's grounded in actual code execution and automated evaluation. This isn't relying on the LLM just sort of knowing what might be a good change based on its training data. No, it proposes changes, runs the actual code, measures the performance objectively, and gets that hard machine-grade feedback. That loop helps it explore ideas that are maybe non-obvious to a human, maybe even counterintuitive. And critically, it catches errors or bad suggestions the LLM might make, which would slip through if nothing were being rigorously tested. Okay, that makes sense. The testing is key. Absolutely. If you look at the diagrams in the paper, the flow starts with the human. The human defines the what. That's the problem you want to solve, right? The criteria for evaluating how good a solution is, and maybe some initial code to start with. And here's a really critical requirement: you must be able to automate the evaluation. You need a function, some code that can automatically score any candidate program AlphaEvolve comes up with. Got it. So the human sets the stage. Sets the stage, defines the finish line. And once that's set up, AlphaEvolve takes over to figure out the how, finding the improved solution through this iterative process. And that pipeline you mentioned, what are the pieces?
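To make that setup concrete, here's a minimal sketch in Python of the two pieces the human supplies before an AlphaEvolve-style search takes over: a starting program and an automated evaluator that returns machine-gradable scores. The task (sorting), the function names, and the metrics are all invented for illustration; they are not from the paper.

```python
# Hypothetical example: the user-supplied pieces an AlphaEvolve-style run needs.
# Names, task, and metrics are illustrative only.
import random
import time


def initial_sort(items: list[int]) -> list[int]:
    """Starting-point program; the evolutionary search would mutate this body."""
    return sorted(items)


def evaluate(candidate_sort, trials: int = 20, size: int = 10_000) -> dict:
    """Automated evaluator: returns machine-gradable scores for a candidate."""
    total_time = 0.0
    for _ in range(trials):
        data = [random.randint(0, 1_000_000) for _ in range(size)]
        start = time.perf_counter()
        result = candidate_sort(list(data))
        total_time += time.perf_counter() - start
        if result != sorted(data):
            return {"correct": 0.0, "speed": 0.0}   # wrong answers score zero
    return {"correct": 1.0, "speed": 1.0 / (total_time / trials)}


if __name__ == "__main__":
    print(evaluate(initial_sort))   # e.g. {'correct': 1.0, 'speed': ...}
```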
Yeah, so the pipeline itself has several key parts working together. There's a program database that stores all the attempts and their scores, and learns from what worked and what didn't. Then you have prompt samplers that build these detailed instructions, prompts for the LLMs, pulling in context from the database, like past good ideas. Then there's an ensemble of LLMs. They generate the actual code changes. Ensemble, meaning more than one? Yeah, often using a mix, maybe one that's fast and generates lots of diverse ideas, another that's maybe a bit slower but generates higher quality suggestions. Okay. And finally, the evaluators. They run and score the new programs, and the promising ones go back into the database to inform the next generation of code. That iterative loop, it sounds almost Darwinian, like survival of the fittest code. That's a great way to put it. But you know, with code, you could change almost anything. How does AlphaEvolve know where it's allowed to experiment? Which parts of the code can it actually evolve? That's handled pretty cleverly, actually, through a specific structure, an API. The user explicitly marks sections of the code using special comment tags, things like # EVOLVE-BLOCK-START and # EVOLVE-BLOCK-END. So only the code between those tags is fair game for evolution. The rest of the code acts like a stable framework, a skeleton, that AlphaEvolve won't touch. Oh, so you constrain the search. Makes sense. You point it at the function you think needs optimizing. Precisely. It lets the human expert guide the search space. And when the LLMs generate changes, they get context about the problem, about previous attempts. They're often asked to propose changes in a diff format. Like you see in software development, showing additions and deletions. Exactly that format. Search, replace. Very standard. Though sometimes it might propose a whole rewrite of the block. And as we touched on, they use that ensemble of LLMs: Gemini 2.0 Flash, maybe for throughput and diversity, Gemini 2.0 Pro, perhaps for higher quality ideas. It's about balancing exploration with exploitation. Right. Trying lots of things, but also digging deeper on promising paths. Yeah. And there's flexibility too. Depending on the problem, you might evolve the core algorithm code directly, or maybe you evolve a helper function whose job is to construct the final solution. Or, and this is quite meta, sometimes you even evolve a custom search algorithm that's specifically designed to find the solution to your main problem. Different approaches work better for different tasks. Wow. Okay. Evolving the search strategy itself. And the evaluation part is absolutely central. You, the user, provide a function, usually in Python, that takes a candidate program and spits out a numerical score, or maybe multiple scores if you have several objectives. Like speed and memory usage. Exactly. It can optimize for multiple objectives simultaneously. Yep. And they use techniques like evaluation cascades. Cascades? Yeah. So maybe initial tests are cheap and quick, just filtering out the obvious failures. Only the programs that pass the easy tests move on to more rigorous, maybe slower, more expensive evaluations. Makes the whole thing much more efficient. Okay. So it sounds like setting up that initial problem, defining the goal, building that automated evaluator, that's where the human expertise is crucial. Right. But once it's running, this rigorous evolutionary search takes over.
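Here's a small, self-contained sketch of two of those mechanics: comment tags that fence off the only region the search is allowed to rewrite, and a cheap-then-expensive evaluation cascade. The marker wording follows the hosts' description; the toy packing heuristic, the smoke test, and the benchmark are all hypothetical.

```python
# Toy illustration: evolve-block markers plus a two-stage evaluation cascade.
# All function names and the packing heuristic are hypothetical.
import random


def pack_count(weights, capacity):
    """How many items a simple heuristic fits into one bin of given capacity."""
    order = sorted(range(len(weights)), key=lambda i: weights[i])

    # EVOLVE-BLOCK-START
    # Only this region would be rewritten by the search;
    # the skeleton above and below stays fixed.
    packed, used = 0, 0
    for i in order:
        if used + weights[i] <= capacity:
            used += weights[i]
            packed += 1
    # EVOLVE-BLOCK-END

    return packed


def quick_check(fn) -> bool:
    """Stage 1: cheap sanity test that filters out obviously broken candidates."""
    return fn([3, 1, 2], capacity=4) == 2


def full_benchmark(fn, trials=200) -> float:
    """Stage 2: slower, higher-fidelity scoring, run only for stage-1 survivors."""
    rng = random.Random(0)
    score = 0
    for _ in range(trials):
        weights = [rng.randint(1, 10) for _ in range(20)]
        score += fn(weights, capacity=30)
    return score / trials


def cascade_evaluate(fn):
    return full_benchmark(fn) if quick_check(fn) else None


if __name__ == "__main__":
    print(cascade_evaluate(pack_count))
```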
Has this process actually led to anything truly surprising or groundbreaking? This is where it gets really interesting, I think. What kind of new knowledge has this system actually discovered? Maybe start with pure math and algorithms? Absolutely. And this is where it goes from being just an interesting system design to showing real tangible impact. One of the most striking examples comes from a really fundamental area, matrix multiplication. This operation is just everywhere. Graphics, simulations, AI, deep learning. Finding faster ways to multiply matrices is a classic hard problem, especially finding efficient low-rank decompositions of the related tensors, even for small matrices. It's tough. Right. It's one of those building blocks of computing. Exactly. And AlphaEvolve made a pretty significant breakthrough here. For multiplying two 4x4 complex-valued matrices, a specific but important case, it discovered an algorithm that needs only 48 scalar multiplications. Okay, 48. Is that good? It's incredibly good, because this is the first improvement in 56 years over Strassen's seminal algorithm from 1969 for this specific configuration. 56 years? 56 years. It didn't just match the record. It beat a record that had stood for over half a century. And the paper notes it also improved the state of the art for, I think, 14 other known matrix multiplication algorithm variants too. Wow, 56 years. That number is staggering. It really puts into perspective how tough these problems are, how optimized these things already were. What does that tell us about what AlphaEvolve is capable of, something that, you know, traditional human efforts or even older computational searches couldn't find? Yeah, I think it highlights the power of being able to just systematically explore a truly vast space of possibilities and rigorously evaluate them, without human intuition or biases limiting the search. I mean, human mathematicians, computer scientists, they've worked on this for decades, right? Using incredibly clever techniques. AlphaEvolve's strength seems to come from the systematic, automated exploration guided purely by the objective evaluation function. It can find these non-obvious combinations, maybe structures that a human might overlook or just dismiss as unlikely. It's like brute-force exploration combined with intelligent generation and that crucial, relentless testing. Exactly. And it wasn't just matrix multiplication. They applied AlphaEvolve to, I think the paper said, over 50 open mathematical problems, across different fields too. Combinatorics, geometry, number theory, analysis. Now, it didn't break records on all of them, obviously. It rediscovered the best-known solutions on many. But it did surpass the state of the art on about 20% of those it tackled. 20% is still pretty impressive on open problems. Any other cool examples? Yeah, some specific ones really paint a picture. It improved the upper bound for the Erdős minimum overlap problem in combinatorics. In geometry, there's the kissing number problem. Wait, the kissing number problem? That's literally about how many spheres can touch another central sphere without overlapping. That's the one. Yeah. And in 11 dimensions, it improved the known lower bound from 592 to 593. It found a way to pack one more sphere in 11D space. That's amazing. Just finding one more sphere in 11 dimensions sounds ridiculously hard. It is.
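For a concrete sense of what "counting scalar multiplications" means, here is the classic baseline being referenced: Strassen's 1969 scheme multiplies two 2x2 matrices with 7 scalar multiplications instead of the naive 8, and applied recursively to 4x4 matrices it costs 7 × 7 = 49, the count that AlphaEvolve's reported 48-multiplication algorithm for complex-valued 4x4 matrices improved on. The new algorithm itself is not reproduced here; this sketch only shows the kind of object being optimized.

```python
# Strassen's 1969 scheme: 7 scalar multiplications (m1..m7) instead of 8
# for a 2x2 matrix product. This is the long-standing baseline, not the
# new AlphaEvolve construction.

def strassen_2x2(A, B):
    """Multiply 2x2 matrices A and B using 7 scalar multiplications."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B

    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)

    return [
        [m1 + m4 - m5 + m7, m3 + m5],
        [m2 + m4, m1 - m2 + m3 + m6],
    ]


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    assert strassen_2x2(A, B) == [[19, 22], [43, 50]]   # matches the naive product
```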
And it also improved bounds on packing other shapes, like packing hexagons into a larger hexagon, or packing circles into a square, finding configurations that let you pack more stuff in, basically. And also variants of the Heilbronn problem, arranging points to maximize the minimum triangle area they form. It improved bounds there, too. So it's tackling these really diverse geometric and combinatorial puzzles. And what's really interesting, again, about how it got some of these math breakthroughs, remember we talked about flexibility. Yeah, evolving different things. Right. So often it wasn't directly evolving the final mathematical construction itself. Instead, it evolved heuristic search algorithms. It found better strategies for a computer search process, and those improved search strategies were then able to discover the better constructions. So it found a better answer by finding a better way to search for the answer. That feels very meta and really powerful. It is. It's like learning how to learn, but for discovering mathematical objects. And the paper mentions collaborations with mathematicians like Terence Tao. Yes, absolutely. Setting up these complex problems for AlphaEvolve, defining that search space, creating that critical automated evaluation function, that takes deep domain expertise. And then interpreting the results, proving they're correct, that often needs human mathematical insight too. These breakthroughs were partnerships between the AI and human experts. That makes sense. It's a tool, a very powerful one, but still working with experts. Yeah. Okay, so it's clear it can make these fundamental scientific advances, which is incredible. But what about the real world? You know, beyond pure science, the outline mentioned it's already making a difference where the rubber meets the road, specifically within Google's huge computing infrastructure. Tell us about that. Yeah, this is where the practical impact becomes very, very clear. One big area is data center scheduling. Think about Google's data centers: millions of computers running millions of jobs. Packing these jobs onto machines efficiently is this massive, complex problem. It's basically vector bin packing. Bin packing, like fitting items into boxes? But with computational jobs. Exactly. And if you don't schedule it well, you get what they call stranded resources. Machines that have, say, plenty of CPU free or lots of memory, but not the right combination to fit the next waiting job. So that capacity sits idle. Wasted resources. Wasted resources. So AlphaEvolve was tasked with evolving heuristic functions for the scheduler that assigns jobs to machines. And it found a surprisingly simple, interpretable heuristic, just a rule based on ratios of resources needed versus resources available. It wasn't some complex black box. When they deployed this AlphaEvolve-discovered heuristic across the entire fleet, it recovered, on average, a remarkable 0.7% of Google's total fleet-wide compute resources that were previously stranded. 0.7% of Google's entire compute fleet. Can you put that into perspective? What does that actually mean? Is that like saving a bit of energy, or? It's massive. Given the scale we're talking about, recovering 0.7% of fleet-wide compute is like adding tens of thousands of machines' worth of capacity just through smarter software. It translates directly into huge cost savings, energy savings, and more capacity to run workloads without building new physical data centers.
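As a rough illustration of what a "simple, interpretable" scheduling heuristic can look like, here's a hypothetical ratio-based scoring rule for placing a job on a machine. The actual heuristic AlphaEvolve found isn't reproduced in this episode, so the formula below is an invented stand-in that only captures the flavor: compare what the job needs against what each machine has free, and prefer balanced fits that strand less capacity.

```python
# Invented stand-in for a ratio-based placement heuristic; not the deployed rule.

def score_machine(job_cpu, job_mem, free_cpu, free_mem):
    """Higher is better: prefer machines the job fills evenly, leaving less
    stranded CPU-without-memory or memory-without-CPU."""
    if job_cpu > free_cpu or job_mem > free_mem:
        return float("-inf")                 # job does not fit at all
    cpu_ratio = job_cpu / free_cpu
    mem_ratio = job_mem / free_mem
    # Reward balanced use of both dimensions; penalize lopsided fits.
    return cpu_ratio + mem_ratio - abs(cpu_ratio - mem_ratio)


def place(job, machines):
    """Pick the machine with the best heuristic score for this job."""
    return max(machines, key=lambda m: score_machine(job["cpu"], job["mem"],
                                                     m["free_cpu"], m["free_mem"]))


if __name__ == "__main__":
    machines = [
        {"name": "a", "free_cpu": 8.0, "free_mem": 64.0},
        {"name": "b", "free_cpu": 4.0, "free_mem": 8.0},
    ]
    job = {"cpu": 2.0, "mem": 4.0}
    print(place(job, machines)["name"])      # picks the balanced fit, "b"
```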
And the fact that it was simple and interpretable, that was a bonus. A huge bonus. Engineers can understand it, trust it, maintain it, which is much harder with a complex, opaque AI model sometimes. Okay, that's a massive real-world win. What else? Another critical area: Gemini kernel engineering. Training huge models like Gemini requires incredibly optimized low-level code, these kernels, that run efficiently on hardware accelerators, like Google's TPUs or GPUs. Optimizing these kernels, especially things like how data is arranged and moved, the tiling, for operations like matrix multiplication, is super challenging work, usually done by expert engineers over long periods. Right. Getting every last drop of performance out of the hardware. Exactly. So AlphaEvolve was used to optimize these tiling heuristics specifically for Gemini's TPU kernels. The results? An average 23% speedup just on the specific kernels it optimized. 23%? Yeah. Which translated into a 1% reduction in the overall Gemini training time, fleet-wide. 1% off training Gemini. Again, at Google scale, that must be enormous in terms of saved compute time, energy, cost. Absolutely enormous. And the paper also mentions it slashed the time needed to find these optimizations from potentially months of engineering effort down to days of AlphaEvolve running. So faster improvements, too. It's like Gemini, through AlphaEvolve, is literally helping to optimize its own training process. That's a fascinating feedback loop. It really is. And it went even lower level. Hardware circuit design for TPUs. This is optimizing the actual register transfer level code, the RTL, that describes the physical circuits. Stuff written in languages like Verilog. Whoa, designing the actual chip logic. Well, optimizing a piece of it. It optimized a key arithmetic circuit within the TPU's matrix multiplication unit. It found a relatively simple code change in the RTL that reduced the physical area the circuit takes up on the chip and its power consumption. So it made the hardware itself more efficient. Yes. And this wasn't just a theoretical finding. It was validated by the human TPU designers. Right. They checked it, confirmed the benefits, and it's planned for integration into future TPU generations. That's incredible. The paper highlights it as Gemini's first direct contribution, via AlphaEvolve, to the actual TPU arithmetic circuits. Mind-blowing. Okay, any more? One more big one. Optimizing code that's already been heavily optimized by compilers. Compilers like XLA take high-level code and translate it into these super-optimized intermediate representations, IRs, tailored for specific hardware. You'd think there's not much juice left to squeeze there. Yeah, compilers are supposed to be good at this stuff. Right. But they applied AlphaEvolve to the XLA-generated IR for a FlashAttention kernel, a key component in transformer models running on GPUs, and still found significant improvements: a 32% speedup for the kernel itself, and a 15% speedup in the surrounding code block. Even after the compiler did its best pass. Even after the compiler. It shows this evolutionary approach can find optimizations that even sophisticated rule-based compiler heuristics might miss. It's genuinely incredible. The range is just staggering.
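To give a feel for the tiling decision being described, here's a toy sketch of choosing a tile size for a matrix-multiplication kernel so the working set fits in fast on-chip memory. The memory budget, candidate sizes, and scoring rule are all assumptions for illustration; real TPU kernel heuristics are far more involved.

```python
# Toy tiling chooser. The budget, candidates, and "bigger is better" scoring
# are illustrative assumptions, not Google's heuristics.

FAST_MEMORY_BYTES = 16 * 1024 * 1024   # assumed on-chip budget
BYTES_PER_ELEMENT = 4                  # float32


def tile_working_set(tile_m, tile_n, tile_k):
    """Bytes needed for one tile of A (m x k), B (k x n), and C (m x n)."""
    return BYTES_PER_ELEMENT * (tile_m * tile_k + tile_k * tile_n + tile_m * tile_n)


def choose_tile(m, n, k, candidates=(64, 128, 256, 512)):
    """Pick the largest square tile that divides the problem and fits the budget."""
    best, best_score = None, -1
    for t in candidates:
        if m % t or n % t or k % t:
            continue                                     # require even division, for simplicity
        if tile_working_set(t, t, t) > FAST_MEMORY_BYTES:
            continue                                     # would spill out of fast memory
        if t > best_score:                               # bigger tiles -> fewer memory round-trips
            best, best_score = (t, t, t), t
    return best


if __name__ == "__main__":
    print(choose_tile(4096, 4096, 4096))                 # (512, 512, 512) under these assumptions
```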
From pure math records that stood for decades down to tweaking hardware logic and optimizing compiler output, it really does feel like a tool that extends human capability, pushing into complexity or search spaces that are just too vast or frankly too tedious for people to explore effectively alone. It's not replacing the experts, like you said, but it's giving them this super tool. That seems to be exactly the takeaway. And the paper backs this up with ablation studies. They experimented by turning off different parts of AlphaEvolve. And they found that, yeah, you really needed the combination: the evolutionary search, the power of the modern LLMs proposing diverse ideas, giving them the right context in the prompts, and being able to evolve reasonably sized chunks of code. It was the synergy of all those pieces that led to these results. It all had to work together. Exactly. So zooming out, why should you, our listener, care about an AI improving matrix multiplication or finding slightly better ways to pack shapes? Well, as we've seen, these things that sound abstract or low-level are absolutely foundational. Faster algorithms and more efficient computation underpin basically all the tech you use every single day. Right. The speed of your phone, how fast websites load, search results, online services, the capability and speed of the AI models being developed right now. More efficient computing isn't just about saving money for big companies. It unlocks the ability to tackle more complex scientific challenges, to engineer more sophisticated systems, maybe solve problems that are currently just out of reach because they need too much compute power, or we just don't have algorithms efficient enough yet. So, putting it all together then, what does this AlphaEvolve system really represent? We've seen an AI system that can discover the very methods for solving problems by autonomously exploring and evolving code. It's achieved these provable mathematical breakthroughs, things that stumped experts for decades, and delivered really significant, measurable performance gains in complex, real-world engineering systems at Google's massive scale. I think it powerfully demonstrates the potential you get when you combine the broad, sort of creative capabilities of modern LLMs, their knack for generating code and proposing variations, with the rigorous, objective, sustained search of an automated evolutionary framework. A framework guided by that relentless machine-grade feedback. The LLM suggests ideas, maybe things a human wouldn't think of. The system tests them rigorously, keeps the good ones, learns, and repeats. The LLM brainstorms. The evolutionary system validates and refines. That's a good way to put it. Now, the main limitation, we should mention it again, is that need for an automated evaluator. If you can't define a function, some code, that automatically scores how good a candidate program is for your specific problem, well, then AlphaEvolve isn't the right tool for that job. And building that evaluation function can itself be a really hard problem sometimes. That makes sense. You need a clear target and a way to measure progress automatically. Right. Looking ahead, though, there are some really intriguing possibilities. Like, can you take the insights, the optimized code patterns discovered by AlphaEvolve, and somehow distill them back into future versions of the LLMs themselves? Making the base models inherently smarter about good code or efficient algorithms. Exactly.
Or maybe integrating this kind of AI-driven code evolution directly into the tools and workflows that scientists and engineers use every day, making it a standard part of the discovery and optimization process. Hmm. This deep dive into AlphaEvolve certainly leaves us with a lot to think about. Here's a final thought for you to mull over. Consider this idea: an AI system that doesn't just follow instructions to solve a problem, but can actually discover the fundamental methods and algorithms for solving problems by writing and evolving complex code itself. What previously intractable scientific questions or maybe engineering challenges might suddenly become solvable when human experts are partnered with systems like this? Systems that can autonomously explore these vast spaces of potential solutions, finding efficiency gains, or maybe entirely novel approaches that we might never have conceived of on our own.