Heliox: Where Evidence Meets Empathy 🇨🇦

🕸️ Why AI's Next Breakthrough Isn't About More Data: Multifidelity Kolmogorov–Arnold Networks

• by SC Zoomers • Season 5 • Episode 25

Send us a text

Please see the corresponding Substack episode

🧠 Just discovered AI that learns from "imperfect" data—like us humans do. Turns out the future isn't about perfect info, but smart partnerships. 🤖✨

The future of scientific AI isn't about feeding machines more perfect information. It's about teaching them to be better partners in the messy, uncertain, gloriously imperfect process of understanding our world. And that future, thankfully, doesn't require us to wait for perfect data that may never come.

Sometimes the most profound revolutions are the quiet ones—the ones that work with what we have rather than demanding what we wish we had.

Multifidelity Kolmogorov–Arnold networks

Amanda A Howard*, Bruno Jacob and Panos Stinis

© 2025 Battelle Memorial Institute 

This is Heliox: Where Evidence Meets Empathy

Independent, moderated, timely, deep, gentle, clinical, global, and community conversations about things that matter. Breathe easy: we go deep and lightly surface the big ideas.

Thanks for listening today!

Four recurring narratives underlie every episode: boundary dissolution, adaptive complexity, embodied knowledge, and quantum-like uncertainty. These aren’t just philosophical musings but frameworks for understanding our modern world. 

We hope you continue exploring our other podcasts, responding to the content, and checking out our related articles on the Heliox Podcast on Substack.

Support the show

About SCZoomers:

https://www.facebook.com/groups/1632045180447285
https://x.com/SCZoomers
https://mstdn.ca/@SCZoomers
https://bsky.app/profile/safety.bsky.app


Spoken word, short and sweet, with rhythm and a catchy beat.
http://tinyurl.com/stonefolksongs

Curated, independent, moderated, timely, deep, gentle, evidence-based, clinical, and community information regarding COVID-19. Running since 2017 and focused on COVID since February 2020, with multiple stories per day, it has built a large, searchable base of stories: more than 4,000 on COVID-19 alone, plus hundreds on climate change.

Zoomers of the Sunshine Coast is a news organization with the advantages of deeply rooted connections within our local community, combined with a provincial, national, and global following and exposure. In written form, audio, and video, we provide evidence-based and referenced stories interspersed with curated commentary, satire, and humour. We reference where our stories come from and who wrote, published, and even inspired them. Using a social media platform means we have a much higher degree of interaction with our readers than conventional media, and it provides a significant, positive amplification effect. We expect the same courtesy of other media referencing our stories.


Welcome to the Deep Dive, the show where we cut through the noise and get straight to the insights that matter. Today, we're diving into a challenge that, well, plagues nearly every scientific discipline: working with data. Oh, yeah, definitely. Sometimes you have oceans of it, right? But it's low quality, maybe fuzzy or inconsistent. Other times, the high-quality data you desperately need is like gold dust: incredibly sparse, prohibitively expensive, or just plain slow to acquire. Yep. The good stuff is hard to come by. It's a constant tightrope walk, isn't it, between quantity and quality? You're often left wondering, how do you possibly make robust, accurate predictions without that perfect, complete data set? It's a huge bottleneck in so many fields. That's the core problem we're tackling today. We're exploring a pretty groundbreaking development that promises to be a kind of shortcut, maybe, to robust, accurate predictions, even with that tricky, expensive data. It's an approach that combines a relatively new, highly interpretable type of neural network with a very smart strategy for handling diverse data sources. Our mission today is really to unpack how these pieces fit together, how they solve some of the most persistent hurdles in scientific modeling. And this isn't just about making models a little bit better. I think that's important to stress. Okay. What we're about to discuss could fundamentally reshape how we approach scientific discovery itself. Wow. Imagine revolutionizing fields from understanding the intricate dance of fluid dynamics to precisely predicting the behavior of novel materials, and even pushing the very boundaries of what AI can achieve without those massive, perfect data sets that, you know, have historically been the backbone of deep learning. Right. The huge data sets we always hear about. Exactly. This development truly has the potential to unlock new insights and dramatically accelerate research across countless disciplines.

Okay, let's unpack this core problem then, because I think so many of you listening, especially if you work in scientific machine learning, or SciML, have probably encountered this. Almost certainly. Traditional SciML models are often built on what we call multilayer perceptrons, or MLPs. That's pretty much your standard neural network. Yeah, the workhorse. These demand large amounts of high-quality data to perform well. But, as we just touched on, what happens when that data is astronomically expensive to acquire? Or incredibly difficult to measure? Or, let's be honest, just plain noisy? Which is often the case in the real world. Totally. Think about trying to simulate a new material's properties at extreme conditions. Or model a complex biological system where every single observation costs a fortune. Getting perfect data is, well, it's a luxury, not a given. Absolutely.

Now, into this challenging landscape, a fascinating alternative has recently emerged. Yeah. Kolmogorov-Arnold networks, or KANs. KANs, yeah. They've been making waves. They're inspired by the Kolmogorov-Arnold theorem itself, which is a pretty profound piece of mathematics. And they promise several compelling advantages. For one, they offer better interpretability. That's a big one. Because they represent functions not through those abstract weights and biases like traditional MLPs. The black box problem. Exactly, but through polynomial splines. Think of splines like flexible, connectable curve segments, almost like a sophisticated digital French curve you can use to draw any shape smoothly. That's a good analogy. This allows KANs to model complex relationships in a way that's much more intuitive to visualize and understand than a traditional neural network's internal workings. You can actually see the function taking shape.
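To make that concrete, here is a minimal, purely illustrative sketch of the core KAN idea: instead of a weight on every edge plus a fixed activation, every edge carries its own small learnable univariate function. The class and variable names are invented for this sketch, and a simple polynomial stands in for the B-spline basis a real KAN would use.

```python
import numpy as np

class SimpleKANLayer:
    """Toy KAN-style layer: each edge (input i -> output j) has its own
    learnable univariate function, here a low-degree polynomial standing in
    for a spline. Each output is the sum of its incoming edge functions."""

    def __init__(self, n_in, n_out, degree=3, seed=0):
        rng = np.random.default_rng(seed)
        # coeffs[j, i, k]: k-th coefficient of the edge function phi_{j,i}.
        self.coeffs = 0.1 * rng.standard_normal((n_out, n_in, degree + 1))

    def __call__(self, x):
        # x has shape (batch, n_in); build powers x**0 .. x**degree per input.
        powers = np.stack([x**k for k in range(self.coeffs.shape[-1])], axis=-1)
        # Evaluate every edge function, then sum contributions over inputs.
        edge_vals = np.einsum("bik,jik->bji", powers, self.coeffs)
        return edge_vals.sum(axis=-1)          # shape (batch, n_out)

# Stacking layers gives a tiny "network" whose per-edge curves can be plotted.
layer1, layer2 = SimpleKANLayer(2, 5), SimpleKANLayer(5, 1)
y = layer2(layer1(np.random.rand(8, 2)))
```

Real KAN implementations replace the toy polynomial with B-spline bases on trainable grids, plus a base activation, but the point survives: every learned relationship is a one-dimensional curve you can plot and inspect.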
Right. And this structure can even lead to what's called symbolic regression, essentially. The network doesn't just predict an outcome; it could potentially discover and output the underlying mathematical equations that govern a system. Which is huge. Imagine an AI not just predicting a chemical reaction rate, but actually giving you the new rate law equation. Or a physicist seeing their complex simulation distilled down to a fundamental new formula. That's the truly revolutionary insight symbolic regression offers. That would be game changing. Totally. Plus, KANs also boast impressive expressivity, and in some cases can achieve higher accuracy with significantly smaller network sizes compared to MLPs. So potentially more efficient, too. Potentially, yeah.

What's truly fascinating here, though, is that we're looking at a fundamental limitation in how we model the physical world. Consider trying to predict something as intricate as, say, rapidly changing weather patterns. Or the stress distribution in a novel material when you only have a handful of reliable observations. Right. Very sparse data. Traditional models in such scenarios often either overfit to that limited data, making them unreliable. They learn the noise, basically. Or they simply fail to generalize to new, unseen conditions. So this raises an important question for you, our listener. How do we build robust, accurate models that can generalize effectively when the perfect or complete data set is simply out of reach? Yeah. How do we stop that frustration of knowing there's a pattern but just not having enough dots to connect? This is the core challenge MFKANs aim to address. Exactly. How do you bridge that gap? The answer, it turns out, isn't always just to brute-force your way to more high-quality data. Sometimes it's about being much smarter with the data you do have. Working smarter, not harder. Precisely.

This brings us to the concept of multifidelity machine learning. Okay. At its heart, multifidelity means intelligently combining two or more data sets of different qualities, different levels of fidelity, to train a network more accurately than you could by using any single data set alone. Right. Leveraging everything you've got. Exactly. We distinguish between two main types of data here. First, there's low-fidelity, or LF, data. This is the cheaper stuff, easier or faster to generate. Like from a simplified simulation model, maybe? Perfect example. Or measurements from a coarser sensor mesh. You can typically get a lot of this kind of data, but it's inherently less accurate. Got it. Lots of it, but fuzzy. Then you have high-fidelity, or HF, data. This is your gold standard. Highly accurate, but... Expensive. Yeah, expensive, slow, difficult to obtain. So you usually have much smaller quantities of it. Makes sense. Few precise points. Now, the inspiration for this new development comes from a pretty robust framework called composite multifidelity neural networks. Okay, so that's the foundation. Yeah. This approach trains separate networks for both the low- and high-fidelity data.
But crucially, the high-fidelity network doesn't just start from scratch. It leverages the output of the low-fidelity network as a kind of starting point, or a prior guess. Ah, okay. So it gets a hint from the cheaper data. Exactly. It's almost like giving a student a good summary from a simpler textbook before asking them to tackle the advanced material. Right, they have a foundation. They already have a foundational understanding, so they can focus on refining the details. It explicitly models the correlation between the low- and high-fidelity data, assuming there's a strong underlying relationship. And importantly, it doesn't even require the data sets to be, like, perfectly nested, right? The HF points don't need to be a subset of the LF points. That's a great point. Yeah, it adds flexibility. If we connect this to the bigger picture, this is really about intelligently leveraging all available information rather than just discarding data because it's not perfect. Right, don't throw away the fuzzy picture. Exactly. It's like having a rough but comprehensive sketch, that's your low-fidelity data, combined with a few extremely precise details representing your high-fidelity observations. So finding the simple connection first, then the complex bits. Pretty much. It's about extracting maximum value from every piece of information, seeing the wisdom in the good-enough data.

And that concept, intelligently leveraging all available data, disentangling relationships, that's the foundation. But what happens when you combine that brilliant strategy with the unique interpretability and efficiency of Kolmogorov-Arnold networks? Ah, now we get to the MFKANs. Exactly. That's the breakthrough researchers have achieved with multifidelity Kolmogorov-Arnold networks, or MFKANs. This is where the innovation truly shines. By integrating the unique properties of KANs into that powerful composite multifidelity framework, they've managed a significant reduction in the amount of expensive high-fidelity data needed, while still delivering exceptionally accurate and robust predictions. So less expensive data, but still good results. That's the promise. That's the core promise.

The MFKAN architecture itself consists of three key blocks, each essentially a specialized KAN, working together in concert. Right, a modular approach. Yeah. First, there's a low-fidelity KAN block. This block is either pre-trained using the low-fidelity data, or, and this is cool, it can even directly use an existing numerical model to supply that low-fidelity information. Right. Oh, interesting. So it doesn't have to be a trained KAN? Nope. Once its knowledge is established, its parameters are largely fixed, frozen. Got it. The foundation layer. Then we have a linear KAN block. This block's job is to learn the linear relationship between the output of that low-fidelity block and the actual high-fidelity data. Just the linear part. Just the linear part. And it's deliberately kept very simple. Think polynomial degree one, just two grid points, no hidden layers. It's designed to find only the most straightforward, direct linear connection. Okay, why so simple? The idea is to intentionally keep it minimalist so it doesn't get distracted by noise or try to invent complex patterns when it only has a few valuable high-fidelity data points to learn from. Ah, okay. Avoids overfitting the sparse HF data. Makes sense. Exactly. Finally, there's a nonlinear KAN block.
This block then learns the nonlinear corrections to that primary linear relationship. So this captures the more complex stuff the linear one missed. Precisely. It accounts for the more complex discrepancies, or nuances, between the low- and high-fidelity data that a simple linear model just can't capture. The final high-fidelity prediction is then a clever combination of the outputs from both the linear and nonlinear blocks. There's also a smart penalization term that essentially nudges the model to first find the most straightforward linear relationship. Right. Prioritize the simple explanation. Yeah, it's vital for preventing overfitting, especially with very sparse high-fidelity data. It's like telling the model, "Look, don't try to invent complex explanations unless you have overwhelming evidence. Start with the simplest connection first."

What's truly ingenious here is the synergy created by this modular design. By breaking the problem down into learning a foundational low-fidelity model and then explicitly disentangling the linear and nonlinear correlations, MFKANs gain several distinct advantages. Well, compared to traditional multifidelity MLPs, MFKANs offer enhanced interpretability, as we discussed before. Right, the splines. Exactly. Because their activation functions are based on splines, they have the potential to learn symbolic representations of these complex relationships. The equations again. Yep. This isn't just about predicting an outcome. It can lead to a much deeper understanding of the underlying scientific processes at play, moving beyond a mere black box. So it's not just prediction, it's potential understanding. Exactly. Imagine an AI model not just forecasting a flood, but actually helping you derive the underlying hydrological equations for that specific river basin. Furthermore, MFKANs can often achieve comparable or even superior accuracy with significantly fewer trainable parameters. Fewer parameters? That's a big deal for efficiency. A huge benefit, particularly when dealing with noisy or extremely sparse data. It reduces computational cost and makes them more deployable in practice.

That synergy sounds incredibly compelling, almost like a silver bullet for these data challenges. But for a listener who's maybe struggled with training KANs in the past, or even physics-informed KANs, which can be... Yeah, yeah, they can be tricky. What's the biggest gotcha with MFKANs? Is there a new challenge that comes with this modularity, or are there scenarios where maybe a traditional multifidelity MLP might still be preferred? That's a very fair question. While MFKANs offer significant advantages, I'd say a common pitfall is ensuring that the low-fidelity KAN, the foundation block, is sufficiently well trained, or accurate enough if it's a direct numerical model. So the foundation needs to be solid. Exactly. If your initial rough sketch is wildly inaccurate, it can make it much harder for the linear and nonlinear KANs to make the necessary corrections efficiently. You're asking them to bridge too big a gap. Right. So investing in a reasonably robust low-fidelity model remains critical. Also, while KANs are generally great for smooth functions, very sharp, highly discontinuous data, like abrupt jumps, can still pose challenges, even with splines. Even with MFKANs? Well, MFKANs handle it better than single-fidelity KANs, as we'll see, but it's not a complete magic wand for every single pathological data set out there. Sometimes, depending on the exact nature of the nonlinearity or discontinuity, a well-tuned MLP might still compete. It's always about finding the right tool for the specific problem.
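Here is a rough code sketch of that three-block forward pass and the penalized training loss, just to pin down the structure described above. The function names, the simple additive combination, and the L2-style penalty are illustrative assumptions for this sketch, not the authors' exact formulation; in the paper each block would be a KAN (or, for the low-fidelity block, possibly a numerical model).

```python
import numpy as np

def mfkan_predict(x, lf_model, linear_block, nonlinear_block):
    """Composite multifidelity prediction (sketch):
    y_hf(x) ~ linear_block([x, y_lf(x)]) + nonlinear_block([x, y_lf(x)]),
    where lf_model is the frozen low-fidelity block (a pre-trained KAN or a
    cheap numerical model) and the two correction blocks are trainable."""
    y_lf = lf_model(x)                             # frozen low-fidelity guess
    feats = np.concatenate([x, y_lf], axis=-1)     # corrections see x and y_lf
    return linear_block(feats) + nonlinear_block(feats)

def mfkan_loss(y_pred, y_hf, nonlinear_params, lam=1e-2):
    """Data misfit on the sparse high-fidelity points, plus a penalty on the
    nonlinear block so the model prefers the simple linear explanation unless
    the data demand more."""
    mse = np.mean((y_pred - y_hf) ** 2)
    return mse + lam * np.sum(np.square(nonlinear_params))

# Toy usage with made-up stand-ins (not the paper's setup):
x = np.linspace(0.0, 1.0, 5).reshape(-1, 1)        # five sparse HF locations
lf = lambda x: np.sin(8.0 * x)                      # cheap low-fidelity model
lin = lambda f: 0.9 * f[:, 1:2]                     # linear correction stub
nonlin = lambda f: 0.05 * f[:, 1:2] ** 2            # nonlinear correction stub
y_hat = mfkan_predict(x, lf, lin, nonlin)
```

The design choice to keep the linear block tiny and to penalize the nonlinear one is what lets a handful of high-fidelity points steer the correction without being memorized.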
Okay, that makes perfect sense. Garbage in, garbage out still applies, at least to the low-fidelity side. So, with that context, how do MFKANs actually perform in practice? The research paper details several test cases that really highlight their power and versatility, directly addressing those tricky data problems we talked about earlier. Yeah, the examples are quite compelling. Let's look at some.

In test one, the jump function, they looked at predicting a function with a sharp jump using incredibly sparse high-fidelity data, just five data points. Five points! That's almost nothing. I know. A single-fidelity KAN, or MLP, really struggles here, leading to huge errors, because they simply don't have enough information to map that sudden change. They just smooth over it, presumably? Pretty much. The MFKAN, however, despite the inherent challenge jump functions pose for spline-based KANs, was able to accurately capture that jump. Wow, even with only five HF points. Yep. It significantly outperformed a single-fidelity KAN, and adding more low-fidelity data further reduced the error. This powerfully demonstrates how MFKANs dramatically mitigate the impact of limited high-fidelity data. That's impressive. It shows the LF data really guiding it. Exactly.

Then there's test four, assessing robustness to noise in high dimensions. Here, they predicted a complex four-dimensional function, and they intentionally added Gaussian white noise to both the low- and high-fidelity training data. Okay, so real-world messiness. Definitely. Single-fidelity models, whether KAN or MLP, had very large relative errors and overfit the sparse high-fidelity data badly, basically learning the noise instead of the actual signal. As you'd expect with noisy, sparse data. Right. The MFKAN, in stark contrast, produced predictions that were much more accurate, even with all that noise present. This showcased its superior robustness. And what's more, when they compared it to multifidelity MLPs with a similar number of trainable parameters, MFKANs performed significantly better when faced with noisy data. That's a common nemesis in real-world applications. That is significant. Noise robustness is key.

Then test five focused on unlocking physics-informed training. We mentioned physics-informed KANs, PIKANs, can be tough to train accurately. Notoriously so, yeah. Especially for complex PDEs. Right, things like the Poisson equation. Yeah. They often require really careful hyperparameter tuning. The MFKAN solution involved using physics-informed training, enforcing the physical laws at both the low- and high-fidelity levels. Oh, okay. Applying the physics constraints to both data qualities. Exactly. This dramatically improved accuracy compared to a single-fidelity PIKAN. It achieved comparable accuracy to highly parameterized MLPs, but with three times fewer trainable parameters. Three times fewer. That's a huge efficiency gain. Massive. It means potentially solving complex physics problems far more efficiently, using less computational power, and maybe reaching solutions that were previously just out of reach for PIKANs. That could open up a lot of possibilities in computational science. Absolutely.
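As a hedged illustration of what "enforcing the physical laws at both fidelity levels" can look like, here is a sketch for a 1D Poisson problem, -u''(x) = f(x). The function names, weights, and the choice to apply the same residual at both levels are assumptions made for this sketch; a real physics-informed implementation would also use automatic differentiation rather than the finite differences used here for brevity.

```python
import numpy as np

def poisson_residual(u_fn, x, f_fn, h=1e-3):
    """Residual of -u''(x) = f(x); second derivative via finite differences
    purely for illustration (a real implementation would use autodiff)."""
    u_xx = (u_fn(x + h) - 2.0 * u_fn(x) + u_fn(x - h)) / h**2
    return -u_xx - f_fn(x)

def physics_informed_mf_loss(u_lf, u_hf, x_colloc, f_fn, x_data, y_data,
                             w_pde=1.0, w_data=1.0):
    """Sketch of a multifidelity physics-informed loss: the PDE residual is
    penalized for both the low- and high-fidelity predictors, while the few
    high-fidelity observations constrain only the high-fidelity output."""
    pde_lf = np.mean(poisson_residual(u_lf, x_colloc, f_fn) ** 2)
    pde_hf = np.mean(poisson_residual(u_hf, x_colloc, f_fn) ** 2)
    data_hf = np.mean((u_hf(x_data) - y_data) ** 2)
    return w_pde * (pde_lf + pde_hf) + w_data * data_hf

# Toy check with a stand-in "network" that happens to solve the PDE exactly:
f = lambda x: np.pi**2 * np.sin(np.pi * x)
u = lambda x: np.sin(np.pi * x)
x_c = np.linspace(0.1, 0.9, 32)
loss = physics_informed_mf_loss(u, u, x_c, f,
                                x_data=np.array([0.5]), y_data=np.array([1.0]))
```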
And finally, test seven addressed what many consider the holy grail in machine learning: extrapolation. Ah, yes. Predicting outside the training zone, the big challenge. The big one. Neural networks notoriously struggle with making predictions outside the range of their training data. Imagine trying to predict future time steps for a fluid flow simulation when your high-fidelity data only covers the early stages. Yeah, crucial for building predictive digital twins. Exactly. This is a major hurdle for creating those digital twins of dynamic systems, where you need to project behavior far into the unknown. It's like trying to predict the entire trajectory of a thrown ball just by seeing its first few inches of movement. Impossible for most models, right? They learn the seen data, not the underlying physics, usually. So, for a fluid flow simulation past a cylinder, the MFKAN was trained with high-fidelity data only up to a hundred time steps. They then asked it to predict all the way up to 200 time steps. Okay, doubling the prediction horizon into the unknown. Yep. The single-fidelity model's error jumped rapidly, and it completely failed to capture the intricate vortex shedding structure downstream. It just fell apart. Predictable failure, unfortunately. But in stark contrast, the MFKAN's error increased only slightly, and it remained accurate even far beyond its high-fidelity training range. Wow, it actually generalized in time. It really did. And get this: here, the low-fidelity input actually came directly from a numerical solver, bypassing a pre-trained KAN altogether. Whoa, okay, so you don't even need the low-fidelity block to be a KAN. You can feed in results from a cheaper simulation directly. Exactly, which further highlights the immense flexibility of the framework. You can mix and match components.

This raises an important question for you listening. What stands out most from these examples? For me, honestly, the ability to extrapolate with such accuracy, particularly in that fluid flow example, that is truly groundbreaking. Yeah, that one really caught my eye too. It suggests a clear path toward building AI models that can predict the future behavior of complex physical systems far more reliably, and with significantly less dependence on continuous, expensive, high-fidelity data across the entire domain of interest. Right. You don't need the gold-standard data for the whole time period. Exactly. This means potentially faster, cheaper scientific discovery and engineering, enabling simulations and predictions that were previously impossible or just impractical due to such constraints. It hints at a future where our models don't just mimic what they've seen, but genuinely understand enough of the underlying dynamics to venture into the unknown.

So, wrapping this up, what does this all mean for the broader world of scientific machine learning? It really feels like MFKANs offer a compelling alternative, addressing many of those practical limitations of deploying AI in real-world scientific contexts. I think it's fair to say. To recap, they dramatically reduce the need for those huge, expensive, high-fidelity data sets. That's point one. Big cost saving. Point two, they provide robust predictions even when data is noisy, which, let's face it, is common in experimental science. Mm-hmm. Real-world readiness. And critically, point three, they enhance the interpretability of models through KANs' unique spline-based functions, offering potential insights into the relationships, not just black-box predictions. Moving towards understanding, not just predicting.
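That plug-and-play role of the low-fidelity block is easy to picture in code: it only needs to be something you can evaluate wherever you want a prediction, including beyond the high-fidelity training window. The "solver" below is a made-up stand-in, not the paper's fluid flow setup.

```python
import numpy as np

def coarse_solver(t):
    """Stand-in for a cheap numerical model (e.g., a coarse-grid simulation)
    that can be evaluated at any time t, even past the HF training window."""
    t = np.atleast_1d(t).astype(float)
    return np.cos(2.0 * np.pi * t)        # toy surrogate, not a real solve

def correction_inputs(t):
    """Features fed to the linear/nonlinear correction blocks: time plus the
    low-fidelity prediction at that time."""
    t = np.atleast_1d(t).astype(float)
    return np.stack([t, coarse_solver(t)], axis=-1)

print(correction_inputs([0.5, 1.5]))      # works past t = 1.0 just as easily
```

Because the low-fidelity signal is available across the whole extended horizon, the correction blocks have something physically meaningful to lean on when they venture past the high-fidelity data.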
Plus, as we saw, their architecture is highly adaptable. You can swap different KAN variants, or even MLPs, into its different blocks for optimal performance, depending on the specific problem. That flexibility is a key engineering advantage. Looking to the future, the researchers seem incredibly optimistic about further developments, like adaptive grid refinement within KANs to make them even more efficient. Sharpening the splines where needed. Yeah. And the immense potential for MFKANs to be used for symbolic regression, that idea of directly learning the underlying mathematical equations governing complex relationships from the data itself. Yeah. That ability to automatically uncover fundamental laws, it could unlock scientific breakthroughs we can barely imagine right now. Absolutely. This could fundamentally change how scientists and engineers actually use AI tools day to day. Imagine not just predicting outcomes, but automatically uncovering the equations that describe physical phenomena directly from data, even when you have only limited high-fidelity observations. From prediction engine to discovery engine. Precisely. This method moves us significantly closer to an AI that not only solves problems, but also genuinely helps us understand the fundamental principles behind them. It fosters critical thinking, deeper scientific insights, truly enriching our understanding of the universe. It shifts AI from being just a predictor to becoming potentially a true scientific partner.

What a deep dive into MFKANs. It's really clear that this method is poised to make a significant impact on how we approach scientific modeling, making powerful AI tools more accessible, more reliable, and ultimately more insightful. This combination of interpretability and data efficiency truly feels like a leap forward. Indeed. You've seen how intelligently combining different levels of data fidelity with the unique architecture of Kolmogorov-Arnold networks can overcome some really significant hurdles, from the challenges of sparse and noisy data to that ever-present problem of extrapolation. It's a testament, I think, to thinking smarter about both the data we have and the models we build. So for you, our curious listener, maybe consider this: how might the ability of MFKANs to extrapolate accurately beyond known high-fidelity data transform fields like climate modeling, drug discovery, or materials design in the coming years?

Speaker 3:

What previously impossible predictions could now actually become within reach thanks to this innovative approach, allowing us to perhaps see further into the unknown than ever before?

Podcasts we love

Check out these other fine podcasts recommended by us, not an algorithm.