Data Science x Public Health
This podcast introduces the concepts of data science and public health, then explores the intersection of the two fields in greater detail.
Everyone Uses P-Values… But They Fail When the Question Is Causal
P-values are everywhere in research. They are treated as the standard for determining whether a result is real, meaningful, or worth acting on. But what if statistical significance is answering the wrong question?
In this episode, we break down why p-values often fail when the real goal is causal inference. You will learn what a p-value actually measures, why it cannot establish causality, and how study design, confounding, and bias matter far more than a single threshold like 0.05.
👉 Enjoyed the episode? Follow the show to get new episodes automatically.
If you found the content helpful, consider leaving a rating or review—it helps support the podcast.
For business and sponsorship inquiries, email us at:
📧 contact@bjanalytics.com
Youtube: https://www.youtube.com/@BJANALYTICS
Instagram: https://www.instagram.com/bjanalyticsconsulting/
Twitter/X: https://x.com/BJANALYTICS
SPEAKER_00: We've, uh, we've all seen those headlines declaring a new scientific breakthrough to be, you know, statistically significant. It sounds definitive, right? Like a certified stamp of absolute truth. But when you look under the hood of scientific research, that comforting certainty gets remarkably murky. So welcome to today's deep dive into excerpts from Beyond Significance: Rethinking Causality in Scientific Research. Our mission today is to help you decode why the ultimate stamp of scientific validity, the famous p-value, often completely fails to answer the most important question of all.
SPEAKER_01: Right, which is: did an intervention actually cause the outcome?
SPEAKER_00: Exactly. Did it actually cause it?
SPEAKER_01: Before we look at how science is moving forward to fix this, I mean, we really need to understand how we got stuck in the first place.
SPEAKER_00: Okay, let's unpack this, because almost every scientific paper you read has a p-value. I, uh, I like to think of it kind of like a metal detector. It confidently beeps to tell you that something is there under the sand, but it absolutely cannot tell you if you just found a priceless gold coin or, you know, a rusty nail.
SPEAKER_01: What's fascinating here is the massive disconnect between what people assume that beep means and the underlying math. Because a p-value is not the probability that your hypothesis is true.
SPEAKER_00: Wait, really? Because I feel like that's how everyone interprets it.
SPEAKER_01: Oh, exactly. Everyone thinks that. But it is not proof that an intervention works. All a p-value tells you is the probability of observing data this extreme, assuming the null hypothesis is completely true.
SPEAKER_00: So assuming the baseline scenario where your intervention had zero actual effect.
SPEAKER_01: Yes, exactly. It's just measuring how surprising the data is if there were no real effect at all.
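(A quick illustration of what the hosts mean here, not from the episode itself: the following minimal Python sketch estimates a p-value by simulating a world where the null hypothesis is exactly true and asking how often "no effect" data looks at least as extreme as an observed difference. All the numbers — group size, noise level, observed difference — are made up for illustration.)

```python
# Minimal sketch (hypothetical numbers): a p-value answers one narrow question --
# how often would data at least this extreme appear if there were no real effect?
import numpy as np

rng = np.random.default_rng(0)

observed_diff = 1.2   # hypothetical observed difference between two groups
n_per_group = 30      # hypothetical group size
null_sd = 3.0         # hypothetical noise level

# Simulate many experiments in a world where the true effect is exactly zero.
null_diffs = (rng.normal(0, null_sd, (100_000, n_per_group)).mean(axis=1)
              - rng.normal(0, null_sd, (100_000, n_per_group)).mean(axis=1))

# Two-sided p-value: fraction of "no effect" worlds that look at least this extreme.
p_value = np.mean(np.abs(null_diffs) >= abs(observed_diff))
print(f"simulated p-value: {p_value:.3f}")
```

Note that nothing in this calculation says anything about whether the intervention caused the difference; it only measures surprise under the "no effect" assumption.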
SPEAKER_00: Wow. Okay, so if a p-value only measures data extremeness, how did it become the ultimate threshold for scientific truth?
SPEAKER_01: Well, over time it kind of devolved into a cultural shortcut. A single numerical threshold, you know, usually 0.05, started functioning like a rigid boundary line.
SPEAKER_00: Like a pass-fail grade.
SPEAKER_01: Right. Getting a p-value under 0.05 meant your results were significant and therefore publishable. Over that line meant your research was ignored.
SPEAKER_00: I mean, that simple binary completely masks the much harder work of actual causal reasoning. But, uh, I have to push back a little here. Doesn't a really, really tiny p-value at least guarantee that you've found a meaningful effect? Like, if the metal detector is absolutely screaming, there has to be something substantial down there.
SPEAKER_01: You'd think so, but no. A massive sample size can make an entirely trivial, meaningless effect look statistically significant just through sheer volume of data alone.
SPEAKER_00: Oh, I see.
SPEAKER_01: And conversely, a highly meaningful effect can fail to reach that 0.05 significance if the study is noisy or, you know, just doesn't have enough participants.
SPEAKER_00: So it's heavily dependent on the sample size.
SPEAKER_01: Exactly. And you can even generate a beautiful, tiny p-value from an observational study that is fundamentally biased by outside variables.
SPEAKER_00: Right. So the math works out perfectly, but the study design is just answering the wrong question entirely.
SPEAKER_01: Precisely.
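(A small, hypothetical Python sketch of the sample-size point, with illustrative numbers not taken from the episode: a practically trivial effect paired with an enormous sample can produce a tiny p-value, while a larger, more meaningful effect in a small, noisy study can easily miss the 0.05 cutoff.)

```python
# Minimal sketch (illustrative numbers): sample size drives "significance"
# at least as much as the size of the effect does.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Trivial effect (0.01 standard deviations) with an enormous sample:
# tends to come out "statistically significant" anyway.
huge_a = rng.normal(0.00, 1.0, 500_000)
huge_b = rng.normal(0.01, 1.0, 500_000)
print("trivial effect, huge n:     p =", stats.ttest_ind(huge_a, huge_b).pvalue)

# Meaningful effect (0.4 standard deviations) in a small, noisy study:
# often fails to reach the 0.05 threshold.
small_a = rng.normal(0.0, 5.0, 15)
small_b = rng.normal(2.0, 5.0, 15)
print("real effect, small noisy n: p =", stats.ttest_ind(small_a, small_b).pvalue)
```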
SPEAKER_00: So what does this all mean? Because this illusion of causation isn't just trapped in academic papers, right? It's bleeding into the algorithms running our hospitals right now.
SPEAKER_01: Oh, absolutely. It's a huge issue in data science.
SPEAKER_00: Data scientists are out there optimizing predictive models, like, say, an AI that predicts hospital readmission risks, and they're mistakenly treating those models as causal solutions.
SPEAKER_01: If we connect this to the bigger picture, a model might brilliantly predict who will be readmitted or where an infection hotspot will flare up, but that prediction does not tell a doctor what treatment will actually change those outcomes.
SPEAKER_00: Because predicting an event isn't the same as knowing how to stop it.
SPEAKER_01: Exactly. Prediction and causation are related, but treating a predictive AI like a causal map is a fast track to building elegant models that fail completely in practice. Fixing this requires moving past the p-value obsession.
SPEAKER_00: And focusing on the mechanics of the study design itself.
SPEAKER_01: Right. Our sources point to explicit causal assumptions, specifically tools like directed acyclic graphs, or DAGs.
SPEAKER_00: Okay, so rather than just dumping data into a computer and hoping for a good p-value, a DAG forces researchers to literally draw a visual map of how they think different variables interact.
SPEAKER_01: Yes, it forces you to map out the how and the why. Drawing a DAG exposes hidden factors, say, for example, how a patient's income level might secretly be affecting both their diet and their health outcomes.
SPEAKER_00: Oh, so you have to explicitly account for those hidden biases before you ever even calculate a p-value.
SPEAKER_01: Exactly. It forces researchers to show their structural work up front.
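(For readers who want to see what "showing the structural work" can look like, here is a minimal sketch of the income example as a DAG, written in Python with networkx — our choice of tool, not something named in the episode. The variables are hypothetical: income points at both diet and the health outcome, so it is a confounder that has to be measured and adjusted for before any diet-outcome p-value means much.)

```python
# Minimal sketch (hypothetical variables): writing the causal assumptions down
# as a directed acyclic graph before any p-value is computed.
import networkx as nx

dag = nx.DiGraph()
dag.add_edges_from([
    ("income", "diet"),     # income shapes what people can afford to eat
    ("income", "outcome"),  # income also affects health through other routes
    ("diet", "outcome"),    # the causal effect we actually want to estimate
])

# A DAG must have no cycles; this check fails if the assumptions are circular.
assert nx.is_directed_acyclic_graph(dag)

# The back-door path diet <- income -> outcome shows that "income" must be
# measured and controlled, otherwise the diet-outcome association is partly
# the income effect in disguise.
print(sorted(dag.predecessors("diet")))     # ['income']
print(sorted(dag.predecessors("outcome")))  # ['diet', 'income']
```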
SPEAKER_00: That's huge. So for you listening, the next time you see a headline screaming about a statistically significant breakthrough, you really need to look past that label.
SPEAKER_01: You have to ask yourself: does the study design actually prove causation, or did they just find a loud beep?
SPEAKER_00: Because a tiny p-value can never compensate for a weak design.
SPEAKER_01: It really can't.
SPEAKER_00: Which leaves us with a rather provocative thought from our sources today. We know predictive models can identify risks. If our health agencies are increasingly relying on dashboards that merely predict outcomes, like overdose risks or hospital readmissions, how much of our public health funding is currently being spent treating mere correlations instead of the actual causes?
SPEAKER_01: Yeah. Are we just blindly digging up rusty nails because the detector beeped?