Data Science x Public Health
This podcast introduces the concepts of data science and public health, then explores the intersection of the two fields in greater detail.
Everyone Uses P-Values… But They Fail When the Question Is Causal
P-values are everywhere in research. They are treated as the standard for determining whether a result is real, meaningful, or worth acting on. But what if statistical significance is answering the wrong question?
In this episode, we break down why p-values often fail when the real goal is causal inference. You will learn what a p-value actually measures, why it cannot establish causality, and how study design, confounding, and bias matter far more than a single threshold like 0.05.
👉 Enjoyed the episode? Follow the show to get new episodes automatically.
If you found the content helpful, consider leaving a rating or review—it helps support the podcast.
For business and sponsorship inquiries, email us at:
📧 contact@bjanalytics.com
Youtube: https://www.youtube.com/@BJANALYTICS
Instagram: https://www.instagram.com/bjanalyticsconsulting/
Twitter/X: https://x.com/BJANALYTICS
SPEAKER_00: We've, uh, we've all seen those headlines declaring a new scientific breakthrough to be, you know, statistically significant. It sounds definitive, right? Like a certified stamp of absolute truth. But when you look under the hood of scientific research, that comforting certainty gets remarkably murky. So welcome to today's deep dive into excerpts from Beyond Significance: Rethinking Causality in Scientific Research. Our mission today is to help you decode why the ultimate stamp of scientific validity, the famous p-value, often completely fails to answer the most important question of all.
SPEAKER_01: Right, which is: did an intervention actually cause the outcome?
SPEAKER_00: Exactly. Did it actually cause it?
SPEAKER_01: Before we look at how science is moving forward to fix this, I mean, we really need to understand how we got stuck in the first place.
SPEAKER_00: Okay, let's unpack this, because almost every scientific paper you read has a p-value. I, uh, I like to think of it kind of like a metal detector. It confidently beeps to tell you that something is there under the sand, but it absolutely cannot tell you if you just found a priceless gold coin or, you know, a rusty nail.
SPEAKER_01: What's fascinating here is the massive disconnect between what people assume that beep means and the underlying math. Because a p-value is not the probability that your hypothesis is true.
SPEAKER_00: Wait, really? Because I feel like that's how everyone interprets it.
SPEAKER_01: Oh, exactly. Everyone thinks that. But it is not proof that an intervention works. All a p-value tells you is the probability of observing data this extreme, assuming the null hypothesis is completely true.
SPEAKER_00: So assuming the baseline scenario where your intervention had zero actual effect.
SPEAKER_01: Yes, exactly. It's just measuring how surprising the data is if there were no real effect at all.
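(A quick illustration of what the hosts mean here, not from the episode itself: the following minimal Python sketch estimates a p-value by simulating a world where the null hypothesis is exactly true and asking how often "no effect" data looks at least as extreme as an observed difference. All the numbers — group size, noise level, observed difference — are made up for illustration.)

```python
# Minimal sketch (hypothetical numbers): a p-value answers one narrow question --
# how often would data at least this extreme appear if there were no real effect?
import numpy as np

rng = np.random.default_rng(0)

observed_diff = 1.2   # hypothetical observed difference between two groups
n_per_group = 30      # hypothetical group size
null_sd = 3.0         # hypothetical noise level

# Simulate many experiments in a world where the true effect is exactly zero.
null_diffs = (rng.normal(0, null_sd, (100_000, n_per_group)).mean(axis=1)
              - rng.normal(0, null_sd, (100_000, n_per_group)).mean(axis=1))

# Two-sided p-value: fraction of "no effect" worlds that look at least this extreme.
p_value = np.mean(np.abs(null_diffs) >= abs(observed_diff))
print(f"simulated p-value: {p_value:.3f}")
```

Note that nothing in this calculation says anything about whether the intervention caused the difference; it only measures surprise under the "no effect" assumption.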
SPEAKER_00: Wow. Okay, so if a p-value only measures data extremeness, how did it become the ultimate threshold for scientific truth?
SPEAKER_01: Well, over time it kind of devolved into a cultural shortcut. A single numerical threshold, you know, usually 0.05, started functioning like a rigid boundary line.
SPEAKER_00: Like a pass-fail grade.
SPEAKER_01: Right. Getting a p-value under 0.05 meant your results were significant and therefore publishable. Over that line meant your research was ignored.
SPEAKER_00: I mean, that simple binary completely masks the much harder work of actual causal reasoning. But, uh, I have to push back a little here. Doesn't a really, really tiny p-value at least guarantee that you've found a meaningful effect? Like, if the metal detector is absolutely screaming, there has to be something substantial down there.
SPEAKER_01: You'd think so, but no. A massive sample size can make an entirely trivial, meaningless effect look statistically significant just through sheer volume of data alone.
SPEAKER_00: Oh, I see.
SPEAKER_01: And conversely, a highly meaningful effect can fail to reach that 0.05 significance if the study is noisy or, you know, just doesn't have enough participants.
SPEAKER_00: So it's heavily dependent on the sample size.
SPEAKER_01: Exactly. And you can even generate a beautiful, tiny p-value from an observational study that is fundamentally biased by outside variables.
SPEAKER_00: Right. So the math works out perfectly, but the study design is just answering the wrong question entirely.
SPEAKER_01: Precisely.
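(A small, hypothetical Python sketch of the sample-size point, with illustrative numbers not taken from the episode: a practically trivial effect paired with an enormous sample can produce a tiny p-value, while a larger, more meaningful effect in a small, noisy study can easily miss the 0.05 cutoff.)

```python
# Minimal sketch (illustrative numbers): sample size drives "significance"
# at least as much as the size of the effect does.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Trivial effect (0.01 standard deviations) with an enormous sample:
# tends to come out "statistically significant" anyway.
huge_a = rng.normal(0.00, 1.0, 500_000)
huge_b = rng.normal(0.01, 1.0, 500_000)
print("trivial effect, huge n:     p =", stats.ttest_ind(huge_a, huge_b).pvalue)

# Meaningful effect (0.4 standard deviations) in a small, noisy study:
# often fails to reach the 0.05 threshold.
small_a = rng.normal(0.0, 5.0, 15)
small_b = rng.normal(2.0, 5.0, 15)
print("real effect, small noisy n: p =", stats.ttest_ind(small_a, small_b).pvalue)
```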
SPEAKER_00: So what does this all mean? Because this illusion of causation isn't just trapped in academic papers, right? It's bleeding into the algorithms running our hospitals right now.
SPEAKER_01: Oh, absolutely. It's a huge issue in data science.
SPEAKER_00: Data scientists are out there optimizing predictive models, like, say, an AI that predicts hospital readmission risks, and they're mistakenly treating those models as causal solutions.
SPEAKER_01: If we connect this to the bigger picture, a model might brilliantly predict who will be readmitted or where an infection hotspot will flare up, but that prediction does not tell a doctor what treatment will actually change those outcomes.
SPEAKER_00: Because predicting an event isn't the same as knowing how to stop it.
SPEAKER_01: Exactly. Prediction and causation are related, but treating a predictive AI like a causal map is a fast track to building elegant models that fail completely in practice. Fixing this requires moving past the p-value obsession.
SPEAKER_00: And focusing on the mechanics of the study design itself.
SPEAKER_01: Right. Our sources point to explicit causal assumptions, specifically tools like directed acyclic graphs, or DAGs.
SPEAKER_00: Okay, so rather than just dumping data into a computer and hoping for a good p-value, a DAG forces researchers to literally draw a visual map of how they think different variables interact.
SPEAKER_01: Yes, it forces you to map out the how and the why. Drawing a DAG exposes hidden factors, say, for example, how a patient's income level might secretly be affecting both their diet and their health outcomes.
SPEAKER_00: Oh, so you have to explicitly account for those hidden biases before you ever even calculate a p-value.
SPEAKER_01: Exactly. It forces researchers to show their structural work up front.
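(For readers who want to see what "showing the structural work" can look like, here is a minimal sketch of the income example as a DAG, written in Python with networkx — our choice of tool, not something named in the episode. The variables are hypothetical: income points at both diet and the health outcome, so it is a confounder that has to be measured and adjusted for before any diet-outcome p-value means much.)

```python
# Minimal sketch (hypothetical variables): writing the causal assumptions down
# as a directed acyclic graph before any p-value is computed.
import networkx as nx

dag = nx.DiGraph()
dag.add_edges_from([
    ("income", "diet"),     # income shapes what people can afford to eat
    ("income", "outcome"),  # income also affects health through other routes
    ("diet", "outcome"),    # the causal effect we actually want to estimate
])

# A DAG must have no cycles; this check fails if the assumptions are circular.
assert nx.is_directed_acyclic_graph(dag)

# The back-door path diet <- income -> outcome shows that "income" must be
# measured and controlled, otherwise the diet-outcome association is partly
# the income effect in disguise.
print(sorted(dag.predecessors("diet")))     # ['income']
print(sorted(dag.predecessors("outcome")))  # ['diet', 'income']
```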
SPEAKER_00: That's huge. So for you listening, the next time you see a headline screaming about a statistically significant breakthrough, you really need to look past that label.
SPEAKER_01: You have to ask yourself: does the study design actually prove causation, or did they just find a loud beep?
SPEAKER_00: Because a tiny p-value can never compensate for a weak design.
SPEAKER_01: It really can't.
SPEAKER_00: Which leaves us with a rather provocative thought from our sources today. We know predictive models can identify risks. If our health agencies are increasingly relying on dashboards that merely predict outcomes, like overdose risks or hospital readmissions, how much of our public health funding is currently being spent treating mere correlations instead of the actual causes?
SPEAKER_01: Yeah. Are we just blindly digging up rusty nails because the detector beeped?