
Heliox: Where Evidence Meets Empathy 🇨🇦
Join our hosts as they break down complex data into understandable insights, providing you with the knowledge to navigate our rapidly changing world. Tune in for a thoughtful, evidence-based discussion that bridges expert analysis with real-world implications. An SCZoomers podcast.
Independent, moderated, timely, deep, gentle, clinical, global, and community conversations about things that matter. Breathe Easy, we go deep and lightly surface the big ideas.
Curated, independent, moderated, timely, deep, gentle, evidence-based, clinical & community information regarding COVID-19. Running since 2017, and focused on COVID-19 since February 2020, with multiple stories per day, it has built a sizeable searchable base of stories to date: more than 4,000 stories on COVID-19 alone, and hundreds of stories on climate change.
Zoomers of the Sunshine Coast is a news organization with the advantages of deeply rooted connections within our local community, combined with a provincial, national and global following and exposure. In written form, audio, and video, we provide evidence-based and referenced stories interspersed with curated commentary, satire and humour. We reference where our stories come from and who wrote, published, and even inspired them. Using a social media platform means we have a much higher degree of interaction with our readers than conventional media, and it provides a significant, positive amplification effect. We expect the same courtesy of other media referencing our stories.
Centaur AI: Becoming Human
The future is arriving faster than we can process its implications. Centaur isn't just a research breakthrough—it's a preview of coming attractions. And we're all the starring act in this particular show, whether we signed up for it or not.
The question isn't whether AI will learn to predict human behavior. It already has. The question is what we do with that knowledge, and whether we're prepared for a world where the mystery of the human mind is no longer quite so mysterious.
The mind machine is here. The only question is whether we're ready for what it reveals.
Source: A foundation model to predict and capture human cognition
This is Heliox: Where Evidence Meets Empathy
Thanks for listening today!
Four recurring narratives underlie every episode: boundary dissolution, adaptive complexity, embodied knowledge, and quantum-like uncertainty. These aren’t just philosophical musings but frameworks for understanding our modern world.
We hope you continue exploring our other podcasts, responding to the content, and checking out our related articles on the Heliox Podcast on Substack.
About SCZoomers:
https://www.facebook.com/groups/1632045180447285
https://x.com/SCZoomers
https://mstdn.ca/@SCZoomers
https://bsky.app/profile/safety.bsky.app
Spoken word, short and sweet, with rhythm and a catchy beat.
http://tinyurl.com/stonefolksongs
This is Heliox, where evidence meets empathy. Independent, moderated, timely, deep, gentle, clinical, global, and community conversations about things that matter. Breathe easy, we go deep and lightly surface the big ideas. Welcome to the deep dive. Today, we're plunging into something, well, pretty incredible, the human mind itself. I mean, think about it. One minute you're deciding what cereal to buy. The next you could be part of a team trying to cure a disease. Our ability to learn, to reason, adapt, it's just fundamental. But what if a computer model could actually predict that or even simulate the huge range of things we do? Yeah, that's exactly where we're headed today. We're doing a deep dive into Centaur. It's this really groundbreaking computational model from a recent Nature paper. And we're not just talking about an AI mimicking a few things. This feels like a genuine leap towards maybe a unified understanding of how we think. Right. So our mission today is to really unpack how Centaur was built, look at the, well, unprecedented data they used to train it, and explore the really surprising ways it seems to get human behavior, even down to how it aligns with our brain activity. We want to see how this could potentially revolutionize the way scientists actually study the mind. Okay, so let's start with the big challenge, this idea of a unified theory of cognition. We humans, we're so adaptable, right? We handle all sorts of things. But most computational models, whether you're looking at AI or cognitive science, they seem so focused. So domain specific. Why is that? It's a really common thing. You know, take AlphaGo, amazing at the game of Go, world class. But it can't make you breakfast or plan a holiday. It does one thing. Or in cognitive science, you have things like prospect theory, super insightful for financial decisions, maybe gambling choices. 
But it doesn't tell you anything about learning a language or, you know, navigating a tricky social situation. They're specialists. Okay, right. Let's unpack this because that intense focus really clashes with how we operate, doesn't it? Our minds are general purpose tools. And this idea, the need for a more unified, general approach, that's not new, is it? People in your field have been talking about this for a while. Absolutely. For decades. Pioneers in the field recognized this need way back. I think there was a quote from 1990 saying unified theories are like the only way to bring our wonderful increasing fund of knowledge under intellectual control. So the thinking is, if you can build a computational model that predicts and simulates human behavior across lots of different situations, maybe any situation, that's a huge first step towards that grand unified theory. It gives you a framework. Okay, but that sounds huge. Predicting all human behavior. How do you even start to build a model like that? Is it just throwing massive computing power at it, or is there something more clever going on? Well, that's where Centaur comes in. And yeah, it's a pretty clever approach. They call it a foundation model of human cognition. Think of it like a general-purpose starting point built on tons of data that you can then fine-tune for specific cognitive tasks. And the base for it, it's actually a state-of-the-art large language model, Meta AI's Llama 3.1 70B. They took that really powerful base and essentially taught it to think more like a human. Right. And the teaching part that involved this data set you mentioned, Psych 101. That sounds really interesting. What exactly is Psych 101? Why is it so different? It's unprecedented, really, in terms of scale for this kind of work. Psych 101 is a massive data set. It has trial-by-trial data. So every single choice from 160 different psychological experiments.
We're talking over 60,000 participants making more than 10 million decisions in total. Ten million decisions. Yeah. And the crucial part, the real innovation, was how they handled it. They transcribed every experiment into natural language. So even though the experiments were super different, testing gambling, memory, learning, decision making, they all had this common format. That was key. Okay. That common format makes sense for feeding it to a language model. Yeah. But how did they actually teach Llama using that data? Did it take like years of computation? Surprisingly, no. It was quite efficient. They used a technique called fine-tuning, specifically something called QLoRA, which is parameter-efficient. What that means is they only had to adjust a tiny fraction of the original Llama model's parameters, something like 0.15%, just a small tweak, relatively speaking. And the whole training process took about five days on a single powerful GPU, an A100. So, yeah, targeted and efficient. Okay, so they built this model, trained it on this incredible human behavior data set. The big question then is, does it work? Can it actually do what they hoped? Let's start with prediction. Can it predict what people will do, people it's never encountered before in the training data? Yes, absolutely. And remarkably well. Across almost all of the 160 experiments, Centaur predicted the choices of these held-out participants better than existing models, and not just slightly better. It significantly outperformed both the base Llama model it started from and also those traditional cognitive models that were specifically designed just for that one task. Okay, that's impressive prediction. But what about generating behavior? You know, not just predicting the next step based on what someone just did, but acting like a human from scratch. That seems like a much harder test, right? Sometimes called model falsification. Exactly.
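As a concrete illustration of the transcription idea just described, here is a minimal sketch of how a single bandit trial might be rendered as natural language. The prompt wording, function name, and trial fields are assumptions for illustration, not the paper's actual template.

```python
# Hypothetical sketch: render one two-armed-bandit trial as plain text,
# in the spirit of Psych 101's natural-language transcription.
# The exact prompt format here is assumed, not taken from the paper.

def transcribe_trial(options, choice, reward):
    """Render one bandit trial as a short natural-language passage."""
    lines = [f"You see two slot machines: {options[0]} and {options[1]}."]
    lines.append(f"You choose {choice}.")
    lines.append(f"You receive {reward} points.")
    return " ".join(lines)

# A short history of trials becomes one common-format text prompt.
history = [
    transcribe_trial(("machine F", "machine J"), "machine F", 7),
    transcribe_trial(("machine F", "machine J"), "machine J", 2),
]
prompt = "\n".join(history) + "\nYou choose"
print(prompt)
```

In the setup the hosts describe, a model fine-tuned on millions of such passages learns to complete the final "You choose" with a human-like decision; this sketch only shows the common-format idea.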
That's a much stronger test of whether the model truly understands the underlying processes. And yeah, in these open loop simulations where the model's own output becomes its next input, Centaur did really well. For instance, there's something called the horizon task, which looks at how people explore options. Centaur's performance was comparable to humans. And interestingly, it showed signs of uncertainty-guided directed exploration. That's exploring things specifically because you're not sure about them. It's a sophisticated human strategy that many other models just don't capture. And what's really fascinating there, you mentioned it doesn't just capture the average person. That's right. In another task, the two-step task, which is designed to tease apart different learning strategies. Well, humans show a lot of variety. Some people are purely model-free, just going for immediate rewards. Others are model-based, planning ahead. And many are a mix. Centaur actually generated the same diversity of learning trajectories. It wasn't just simulating one average response. It captured the whole distribution of human strategies. Which could have big implications down the line, maybe for things like personalized learning. Potentially, yes. understanding that individual variation is key. And there was another cool finding in a social prediction game. Centaur was pretty good at predicting what humans would do, about 64% accuracy. But when asked to predict what an artificial agent would do in the same game, its accuracy dropped way down to 35%. And that pattern, it perfectly mirrors how humans perform on that task. We're good at predicting other humans, less good at predicting simple artificial rules sometimes. Huh. So it even makes the same kind of mistakes we do. That does raise an interesting question, doesn't it? What does it mean for a model to be so human-like it shares our blind spots? It's a deep question. Okay. So it predicts, well, it generates diverse behavior. 
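The "uncertainty-guided directed exploration" mentioned above can be sketched with a generic UCB-style rule: prefer options whose value estimate is least certain. This is a textbook stand-in chosen for illustration, not Centaur's actual mechanism.

```python
import math

def ucb_choice(counts, means, t, bonus_weight=1.0):
    """Pick the option maximizing estimated value plus an uncertainty bonus.

    counts: how often each option has been tried; means: estimated rewards;
    t: current trial number. Untried options get an infinite bonus.
    """
    scores = []
    for n, m in zip(counts, means):
        if n == 0:
            scores.append(float("inf"))  # never tried: maximally uncertain
        else:
            scores.append(m + bonus_weight * math.sqrt(math.log(t) / n))
    return scores.index(max(scores))

# Option 1 has been sampled less, so its uncertainty bonus is larger;
# with equal mean estimates, the rule explores it.
print(ucb_choice(counts=[10, 2], means=[0.5, 0.5], t=12))  # → 1
```

The point of the sketch: exploration here is directed at the uncertain option specifically, rather than being random, which is the qualitative signature the hosts say Centaur reproduces.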
Yeah. But what about completely new situations, things it wasn't trained on at all? How does it handle out-of-distribution tests? That's the real test of generalization, isn't it? It is. And Centaur showed pretty impressive robustness here, too. They tried a few things. First, just changing the cover story. The training data had a task framed around spaceships. They tested Centaur on the exact same task structure, but described it using magic carpets. Centaur handled it fine, better than other models, suggesting it understood the core task, not just the surface story. Okay, so it's not just memorizing scenarios. What about changing the structure of the task itself? Right. They tested it on a task called Maggie's Farm, which is a three-armed bandit problem, you know, choosing between three slot machines. The key thing is the Psych 101 data set only had two-armed bandit tasks. So this was structurally novel. Centaur still managed to capture human behavior reasonably well, but a traditional specialized cognitive model designed for two-armed bandits? It completely failed to generalize to the three-armed version. It just hit a wall. So the specialization really limited the old model there. Exactly. And maybe the most striking test was entirely new domains. The researchers deliberately excluded any studies on logical reasoning from Psych 101. Yet, after fine-tuning on all the other behavioral data, Centaur actually showed an improvement in predicting human performance on logical reasoning tasks, like ones based on LSAT exam questions. Wow. So even though it never saw logic problems, learning about decision-making and memory somehow helped it understand logical reasoning better. It seems that way. The fine-tuning process seems to instill some more general cognitive abilities. So putting it all together, what's the takeaway on generalization?
The takeaway is that Centaur consistently captured human behavior, not just in the trained tasks, but across six additional out-of-distribution tests. These included things like moral decision-making, economic games, areas where simpler models really struggled. It shows a remarkable level of robust, generalized understanding of human cognition. Okay, so it predicts what we choose. But what about how quickly we choose? Our response times. That's another big part of human behavior, the timing of it all. Can Centaur predict that too? Yes, and this was another really strong result. They looked at Centaur's internal metrics, basically, how uncertain the model was about its own choice, which they call response entropy. And these internal signals were incredibly predictive of actual human response times. Centaur's entropy explained about 87% of the variance in how long people took to decide. 87%, that's huge. How does that compare? It's significantly better than the base Llama model, which was around 75%, and also better than the traditional cognitive models, which were around 77%. So Centaur isn't just predicting the choice, it's capturing something about the cognitive effort or processing time involved. Okay, and now for the part that, for me, feels like it bridges into understanding the brain itself. You mentioned neural alignment. Can Centaur's internal workings actually reflect what's happening in our brains, even though it was never trained on brain scan data? This is one of the most fascinating findings, I think. Yes. They took fMRI data from humans doing some of these tasks, like the two-step task and even just reading sentences, and they found that Centaur's internal activation patterns, the way its artificial neurons fired, were significantly better at predicting the patterns of activity in the human brain compared to the base Llama model, despite only being trained to match behavioral choices.
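The response-entropy measure described above is just the Shannon entropy of a model's choice distribution. A minimal sketch with made-up probabilities (nothing here queries Centaur's actual internals):

```python
import math

def response_entropy(probs):
    """Shannon entropy (in bits) of a model's choice distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A confident choice distribution has low entropy (intuitively, a fast
# decision); a torn, near-even one has high entropy (a slow decision).
easy = response_entropy([0.95, 0.05])
hard = response_entropy([0.5, 0.5])
print(round(easy, 3), round(hard, 3))
```

The hosts' 87%-of-variance figure refers to regressing human response times on exactly this kind of internal uncertainty signal; the example just shows the quantity being regressed.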
So, just by learning to predict what we do, accurately and across many tasks, its internal structure started to look more like how our brains do it. That seems to be the implication. The fine-tuning on this rich behavioral data set pushed the model's internal representations towards something more neurally plausible, more aligned with human brain processes. And importantly, those traditional specialized cognitive models, they actually performed substantially worse at predicting neural activity compared to Centaur. It really highlights a potential limitation of those older approaches if the goal is neural understanding. Okay, this is clearly more than just a prediction tool then. How can Centaur and this Psych 101 dataset actually help scientists, like day-to-day researchers trying to understand the mind? What are the practical uses? Right. It's positioned as a tool for scientific discovery, not just an end product. There's a great case study in the paper about multi-attribute decision-making. They used a different AI model, one good at reasoning, to generate a kind of verbal explanation, a theory, for why people made certain choices in an experiment. Okay, so they got an initial hypothesis from one AI. Exactly. Then they used Centaur almost like a benchmark, a reference model of human behavior. They used a technique called scientific regret minimization, basically, finding where the initial AI's explanation didn't quite match up with Centaur's predictions, which are known to be very accurate. This process helped them pinpoint the flaws in the initial explanation and refine it. They ended up designing a new computational model that was not only interpretable, you could understand its steps, but also just as predictive as the complex Centaur model itself. Wow. Okay. So that's like a blueprint, isn't it? Yeah. Using these powerful but maybe complex AI models to help us build better understandable scientific theories. Precisely.
It's a way to leverage the predictive power of these foundation models to guide the development of interpretable cognitive science. That's potentially huge. And another idea they propose is using Centaur for in silico prototyping. In silico. Yeah. In the computer. Yeah. So before you spend time and money running experiments on actual humans, you could use Centaur to simulate the experiment. You could test different experimental designs to see which ones are likely to give you the biggest effect sizes or figure out how many participants you might need or estimate the statistical power, all simulated within the model first. That could be a massive time and resource saver for researchers. Imagine refining your study design before you even recruit participant one. That's a potential game changer for planning research. It really could be. So this sounds amazing, but it's clearly just the beginning. What's next? Where do they see Centaur and Psych 101 going from here? How do they build on this? Well, developing Psych 101 is definitely an ongoing project. The plan is to keep expanding it, adding more cognitive domains. They mention wanting to include things like psycholinguistics, how we process language, and more social psychology experiments. And a really key goal for the future is to incorporate individual differences into the data set. Ah, so not just how people behave on average, but how different people behave differently. Exactly. Things like age, maybe personality traits, socioeconomic background. They want to capture that variability. They're very aware that the current data set, like a lot of psychology research, frankly, is biased towards what are called WEIRD populations. Yeah, WEIRD stands for Western, Educated, Industrialized, Rich, and Democratic. There's a known bias in the field, and they explicitly state a goal to broaden the demographics represented in Psych 101 to make it more globally representative.
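The in silico prototyping idea described above can be sketched as a Monte Carlo power analysis in which a simulated participant stands in for a behavioral model like Centaur. The effect size, decision rule, and function names here are all illustrative assumptions, not anything from the paper.

```python
import random
import statistics

random.seed(0)

def simulated_participant(condition):
    """Toy stand-in for a simulated subject: treatment shifts the mean by 0.5."""
    return random.gauss(0.5 if condition == "treatment" else 0.0, 1.0)

def estimate_power(n_per_group, n_sims=500, threshold=1.96):
    """Fraction of simulated experiments whose effect reaches significance."""
    hits = 0
    for _ in range(n_sims):
        a = [simulated_participant("control") for _ in range(n_per_group)]
        b = [simulated_participant("treatment") for _ in range(n_per_group)]
        # crude two-sample z-style statistic
        se = (statistics.variance(a) / n_per_group
              + statistics.variance(b) / n_per_group) ** 0.5
        if abs(statistics.mean(b) - statistics.mean(a)) / se > threshold:
            hits += 1
    return hits / n_sims

# Compare candidate sample sizes before recruiting anyone.
for n in (10, 40):
    print(n, estimate_power(n))
```

The design choice to compare sample sizes up front is the whole point of in silico prototyping: the larger design should show substantially higher estimated power, letting a researcher pick a sample size before recruiting participant one.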
And longer term, they're thinking about moving beyond just text towards a multimodal data format, maybe incorporating visual information or speech data to get an even richer picture of behavior. That makes sense. Now, the paper touches on something interesting, this historical worry in cognitive science about a big unified model being seen as sort of an intruder, maybe overshadowing specialized research. How does Centaur try to address that? Yeah, there's definitely been skepticism about grand unified theories in the past, partly because they can be hard to rigorously compare to specialized models. So years ago, this idea of a cognitive decathlon was proposed, like an Olympic decathlon, but for cognitive models. The idea was to test competing models across a whole battery of different experiments and see which one performed best overall, cumulatively. What the Centaur paper argues is that their evaluation, testing against established models across 160 experiments, is essentially like running 16 of these cognitive decathlons simultaneously. And Centaur basically won them all. Pretty much, yes. It consistently outperformed the specialized models across the board. They argue this provides really strong evidence that this data-driven foundation model approach is incredibly promising for discovering these domain-general cognitive models. And the ultimate goal, the next big step they suggest, is to take this powerful computational model and work towards translating it into a proper, fleshed-out, unified theory of human cognition. What an incredible arc. We've gone from the challenge of modeling the mind to this centaur model that predicts choices, generates diverse behaviors, generalizes to new situations, even mirrors our brain activity, and can actually be used as a tool to help scientists build better theories themselves. It really does feel like a significant moment, this convergence of large-scale AI and deep cognitive science questions. 
It might offer, maybe not a shortcut, but a powerful new path towards understanding ourselves. So, thinking about this, what does it all mean for you, listening to it right now? If a model like Centaur can learn the rules of our behavior so well, even down to predicting brain responses, what does that open up? What new ways might we understand our own decisions, how we learn, maybe even consciousness itself? What could a model like this reveal about how your own mind works? Maybe think about a complex decision you made recently. If Centaur had been watching, what might it have predicted? What underlying processes might it have pointed to? It's fascinating food for thought. Thanks for listening today. Four recurring narratives underlie every episode. Boundary dissolution, adaptive complexity, embodied knowledge, and quantum-like uncertainty. These aren't just philosophical musings, but frameworks for understanding our modern world. We hope you continue exploring our other podcasts, responding to the content, and checking out our related articles on the Heliox Podcast on Substack.